
“A Tapestry of Data”: Open Legislation with The State Decoded

The State Decoded is a proposed open government data platform, currently in development, aimed at providing free online access in interoperable formats to U.S. state codes, and, where possible, at connecting such codes to pending legislation and court decisions. On June 22, a Knight News Challenge grant was awarded for The State Decoded to Waldo Jaquith, founder and developer of the project, and Web Developer at the Miller Center at the University of Virginia. (Click here for a video in which Mr. Jaquith describes the project in connection with the award.) In early July I spoke with Mr. Jaquith about the technology and policy considerations that inform The State Decoded. The following is an account of our interview.

Overview, Core Functions, and User-Friendliness

The State Decoded is an open source software platform designed to make U.S. state legislative codes publicly available in open, interoperable digital formats, and, where possible, to connect those codes with proposed legislation and judicial decisions. Mr. Jaquith said that functionality to connect state codes with regulations is not included in the initial version of the platform, but might be included in subsequent versions.

The State Decoded has three principal functions: (1) display legislation so as to improve access to and understanding of the law by citizens, journalists, and researchers; (2) make data contained in state legislative codes available to automated systems through an open application programming interface (API); and (3) make that data available in bulk for download, so that researchers and developers can combine that data with other data, and so that developers can create new systems around that data.

Mr. Jaquith said that the State Decoded user interface would incorporate three features intended to make the law easier for citizens to understand: automatically displaying official definitions of statutory terms; where possible, integrating “plain language” definitions and explanations — obtained from “law for the layperson” materials provided by public interest legal organizations — of legal language with the display of primary legal materials; and displaying the text of legislation in “beautiful typography.”

“A Tapestry of Data”

Two attributes in particular characterize the relationship of The State Decoded platform to the wider information environment: aggregation of data from multiple sources, and open-endedness. According to Mr. Jaquith, data for The State Decoded would come from several sources. State codes would come either from the current online publishers of those codes (whether state legislatures or commercial publishers) or from open government data sources such as OregonLaws.org. Pending legislation and related data, such as information on legislators’ voting records, would come from the Sunlight Foundation’s Open States Project and GovKit. Court decisions may come from the state reporters of decisions, or from open sources such as Public.Resource.Org’s RECOP service.

In addition to aggregating information from multiple sources, The State Decoded is in at least two respects an “open ended” project, Mr. Jaquith explained. First, when The State Decoded platform is operating in a state, its API and bulk access functions will pass data through the platform to a variety of actors, including “journalists, researchers,” developers, and automated systems. Second, The State Decoded will be offered as an open source platform for use by diverse individuals and organizations, who will be free to modify the platform, experiment with its functionality, and mash up its data with a variety of other systems and information in ways that can’t be predicted. “Part of what’s interesting in this,” Mr. Jaquith observed, “is that I don’t know what people are going to do with” the system or the data made available through it. In light of the variety of data sources and the open-ended nature of the project, Mr. Jaquith uses the metaphor of a “tapestry of data” to describe the open government data environment to which he hopes The State Decoded will contribute. The platform will form “part of an open government tapestry, that we’re all weaving together,” he said.

Motivations

The open-ended nature of The State Decoded is echoed in Mr. Jaquith’s account of his motivations for pursuing this project. Asked about his motivations, Mr. Jaquith replied, “I’m doing this because I want this to exist. […] I want to use systems like this. […] I want this kind of data to exist. […] Then I want others to step up; and then I can pass this along to them.”

Adopters

What kinds of individuals and organizations are likely to volunteer to set up a State Decoded platform for their state? Mr. Jaquith reported that, so far, he’s received expressions of interest in hosting the platform from three types of individuals or organizations (located in “at least half a dozen” different states): newspapers, which seek to expand the provision of primary legal data via their Websites or mobile platforms; organizations that advocate for open government; and individual “open government enthusiasts” who want to contribute to the availability of government information by publishing law online.

Technology

In describing The State Decoded’s technology, Mr. Jaquith said that some of his approach derived from his experience in building the Richmond Sunlight open legislative platform. Mr. Jaquith explained that the architecture of The State Decoded consists entirely of open source components. The code is written in PHP, because it’s “widely understood and used” and Mr. Jaquith has many years of experience with it. Data are stored in a MySQL database, the Apache Web server is used, and everything runs under the Linux operating system.

The platform will be licensed under the GNU General Public License (GPL), and made available on GitHub.

According to Mr. Jaquith, data stored natively in the State Decoded MySQL database are not encoded in XML or any other structural metadata format. The State Decoded API, a RESTful API, outputs data in JSON, XML, and the EPUB open ebook format. (Mr. Jaquith said that he was inspired to offer EPUB functionality after seeing CALI’s Free Law Reporter, which outputs EPUB versions of court decisions.) Respecting content negotiation, Mr. Jaquith said that the platform’s “architecture supported it,” and that he was considering enabling output of data in “PDF, plain text, and EPUB” formats, as well as in HTML.
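
To make the idea of multiple output formats concrete, here is a minimal sketch, written in Python rather than the platform’s PHP, of how a service might choose between JSON and XML from an HTTP Accept header and then serialize one code section. The function names, the field names, and the sample section are hypothetical and are not taken from The State Decoded. The sketch also ignores q-value weighting, which a complete content negotiation implementation would honor.

```python
import json
import xml.etree.ElementTree as ET

# Media types this sketch can emit (an assumption, not the platform's actual list).
SUPPORTED = {"application/json": "json", "application/xml": "xml"}


def choose_format(accept_header, default="json"):
    """Return the first supported format named in an Accept header.

    Simplification: q-values are ignored and header order decides.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default


def render(section, fmt):
    """Serialize a flat dict describing one code section as JSON or XML."""
    if fmt == "json":
        return json.dumps(section, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("section")
        for key in sorted(section):
            ET.SubElement(root, key).text = str(section[key])
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: " + fmt)


# Hypothetical section record, for illustration only.
sample = {"number": "1-100", "catch_line": "Definitions"}
fmt = choose_format("text/html, application/xml;q=0.9")
print(render(sample, fmt))
```

In this sketch a client that lists application/xml in its Accept header receives XML, and a client that names no supported type falls back to JSON. The fallback is a design choice of the sketch, not a documented behavior of the platform.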

Respecting Uniform Resource Identifiers (URIs), Mr. Jaquith said, “I’m a huge fan of URI as command line.” He said that legislative URIs in The State Decoded would have a human- and machine-readable structure modeled on legal citations, in which the name of the legislative source would be followed by title and section numbers, separated by slashes. In some states in which the citation format for legislation omits the chapter number, chapter numbers would be omitted from the URIs. Mr. Jaquith said that the purpose of this URI format was to “maximize discoverability and improve ranking.” This approach to URIs seems similar to Rick Jelliffe’s PRESTO approach, which influenced the structure of URIs adopted by John Sheridan and Jeni Tennison in the Legislation.gov.uk system.
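
The citation-shaped URI structure described above can be sketched in a few lines. This is an illustration in Python, not the platform’s PHP code. The function name, the ordering of the parts, and the sample values are assumptions made for the example; Mr. Jaquith described only the general pattern of source name, title, and section separated by slashes, with the chapter left out where a state’s citation format omits it.

```python
from urllib.parse import quote


def citation_path(source, title, section, chapter=None):
    """Build a slash-separated URI path from the parts of a citation.

    The chapter is included only when the state's citation format uses one.
    """
    parts = [source, title]
    if chapter is not None:
        parts.append(chapter)
    parts.append(section)
    # Percent-encode each part so stray characters cannot break the path.
    return "/" + "/".join(quote(str(part), safe="") for part in parts)


# Hypothetical citations, for illustration only.
print(citation_path("code", "18.2", "32"))
print(citation_path("code", "12", "7", chapter="3"))
```

With these hypothetical inputs, the first call yields /code/18.2/32 and the second yields /code/12/3/7. Because each path mirrors the citation, a reader who knows the citation can guess the address, which is the discoverability benefit Mr. Jaquith described.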

Mr. Jaquith said he planned for The State Decoded to enable bulk downloading of data, but that he was uncertain about the optimal format for bulk data, and would welcome suggestions. I noted that Robinson et al., in their influential article, Government Data and the Invisible Hand, recommend encoding bulk data in XML, because XML is an open standard that enables interoperability, XML is widely used among developers who work with government or legal data, and many programmers know how to create new systems around XML data.

Mr. Jaquith said that, although open publication of state codes is the primary objective of The State Decoded, the system will enable the “interfac[ing of] every one of these state codes with state legislative and court decision data, where available.” The system will accomplish this through “a simple screenscraper-based system,” as well as by receiving data through “others’ legislative and court APIs.”

Respecting updating of legislation, Mr. Jaquith said that The State Decoded’s “import function” for proposed and newly enacted legislation invokes parsers that detect code provisions that would be affected by such legislation. This enables automatic updating of codified statutes.
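
As a rough illustration of what such a parser might do, the Python sketch below scans the text of a bill for section citations and returns the distinct code sections it mentions. The citation pattern (a section sign followed by a number such as 1-100 or 2.1-205.3), the function name, and the sample bill text are all assumptions made for the example. Real state citation formats vary widely, and this is not the platform’s own parser.

```python
import re

# Hypothetical citation pattern: a section sign, then "title-section",
# where either part may carry a decimal suffix (e.g. "2.1-205.3").
SECTION_REF = re.compile(r"§+\s*(\d+(?:\.\d+)?-\d+(?:\.\d+)?)")


def affected_sections(bill_text):
    """Return distinct section numbers cited in bill_text, in order of first appearance."""
    seen = []
    for match in SECTION_REF.finditer(bill_text):
        number = match.group(1)
        if number not in seen:
            seen.append(number)
    return seen


# Invented bill text, for illustration only.
bill = (
    "A bill to amend and reenact § 1-100 and § 2.1-205.3 of the Code, "
    "and to repeal § 1-100."
)
print(affected_sections(bill))
```

A list like this could then be matched against the sections stored in the database to flag the provisions that a pending bill would change. Note that the pattern only catches citations written with their own section sign, so a range such as “§§ 1-100 and 1-101” would need additional handling.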

Asked whether The State Decoded included point-in-time functionality for legislation, Mr. Jaquith said that it did not. He noted, however, that his Richmond Sunlight system has point-in-time functionality for Virginia statutes, and said that adding point-in-time functionality for every state was an aspirational goal for The State Decoded. He observed that the substantial differences in structure among the various state codes were an obstacle to building a system that enabled point-in-time functionality for the legislation of every state.

I asked Mr. Jaquith whether The State Decoded enabled authentication of legislative or judicial data. He said that authentication functionality could be included in the platform, but was not included at this time, because few or no states currently authenticate their online legal data. We discussed the U.S. Government Printing Office’s (GPO’s) infrastructure for authenticating federal primary legal materials as a possible model for the states. We also discussed the Uniform Electronic Legal Material Act (UELMA) — scheduled to be considered for approval by the National Conference of Commissioners on Uniform State Laws (NCCUSL) shortly after our interview — which would require enacting states to authenticate legal information they publish online.

Principles: Open Government Data and the FCC’s “Information Needs of Communities” Report

Mr. Jaquith said that his approach in developing The State Decoded was generally consistent with the principles of the Open Government Data movement. He said that the design of The State Decoded as a readily usable platform for civil society information providers also resembled one goal of the Law.gov legal open government data movement: that of establishing specifications for an open source “turnkey platform” with which governments could publish free legal information online. Respecting the U.S. Federal Communications Commission’s recent report on news media, “The Information Needs of Communities” (which identifies a recent decline in U.S. news coverage of state and local government affairs, and calls on nonprofit and new media organizations to fill this information gap), Mr. Jaquith agreed that The State Decoded sought in part to fulfill one dimension of this information need: the public’s need for information about the laws that govern them.

“It’s not very sexy, putting state codes online,” he admitted. But making state legislation and court decisions more accessible via the Web enhances the public information environment by “mak[ing] it possible for individuals to refer” directly to primary law, and “allow[ing] journalists to check the laws themselves,” rather than having to rely on lawyers or information intermediaries. In particular, Mr. Jaquith emphasized the potential value of The State Decoded for scholars and the press. The platform, he said, “will allow journalists and researchers, to take that data [and] to magnify it” in ways that serve the public interest.

Contacts and Staying Current

Mr. Jaquith welcomes input on The State Decoded; he can be contacted here. To stay up-to-date on the progress of The State Decoded, follow the project on Twitter at @StateDecoded, or sign up for the project mailing list at the project’s Website.
