Michael J. Bommarito II of Computational Legal Studies has posted "Building a Better Legal Search Engine, Part 1: Searching the U.S. Code." Here is an excerpt:
[…] The first part in my blog series leading up to this talk will focus on indexing and searching the U.S. Code with structured, public domain data and open source software. […] In this part of the post series, I’m going to build an index of the text of the Code from the 2009 and 2010 LRC snapshots. To do this, we’ll use the excellent Apache Lucene library for Java. […] Stay tuned next week for the next part in the series. I’ll be using Apache Mahout to build an intelligent recommender system and cluster the sections of the Code.