Publishing Government Data That Developers Will Actually Use

Despite increasing public support, as well as a number of executive mandates, publishing public data in a machine-readable format is not as simple as pressing the “publish” button. Why? Fostering a vibrant developer ecosystem around the information is as important as exposing the information itself. By making the publishing agency, not the public, responsible for making information immediately useful, government can lower the barriers to consuming its data and introduce additional citizen services at little to no cost to the agency.

1. Garbage in, garbage out. Good, clean data may be surprisingly difficult to come by, especially when working with government systems that have been coupled together over decades. Data standards and conventions change, mechanisms of data collection evolve, and the data itself may be interpreted differently as new policies are introduced. As a result, consistent practices, such as naming conventions or data formats, often go overlooked. Where practical, normalize the data prior to release rather than pushing that work onto each application, where it will be repeated inefficiently.
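As a rough illustration of normalizing before release, the sketch below renames inconsistent field names and converts mixed date layouts to a single ISO 8601 form. The field names, the alias table, and the list of date layouts are all hypothetical; a real agency dataset would need its own mapping.

```python
from datetime import datetime

# Hypothetical mapping from legacy column names to one canonical name.
FIELD_ALIASES = {
    "zip": "zip_code",
    "zipcode": "zip_code",
    "zip code": "zip_code",
    "date_filed": "filed_on",
    "filing date": "filed_on",
}

# Date layouts assumed (for illustration) to appear in the legacy systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_date(value):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(record):
    """Rename aliased fields, trim whitespace, and standardize dates."""
    clean = {}
    for key, value in record.items():
        name = key.strip().lower()
        name = FIELD_ALIASES.get(name, name.replace(" ", "_"))
        if isinstance(value, str):
            value = value.strip()
        clean[name] = value
    if isinstance(clean.get("filed_on"), str):
        clean["filed_on"] = normalize_date(clean["filed_on"])
    return clean
```

Run once by the publisher, a step like this spares every downstream application from writing its own version of the same cleanup.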

2. Eat your own dog food. When organizations consume the products they create, they empirically deliver better, more reliable, and more innovative products. You’d never seek to buy a car from a dealer that’s never driven one, yet we often expect the public to build applications based on APIs (Application Programming Interfaces – how computers talk to one another) published by organizations that have never had to consume their own data. Rather than solving the same problem twice, start by exposing all relevant data through public APIs and then work backward to build internal applications that rely on those externally facing data feeds.

3. Data as a citizen service. It is tempting to try to meet open data benchmarks, at least on their face, by publishing snapshots of large datasets. Yet multi-gigabyte database exports do little to encourage external development, especially when such data dumps are delayed and infrequent. Imagine the usefulness of a Facebook feed that showed your friends’ activity from last month. Datasets should be directly exposed so that the public has access to live, real-time data, either in its entirety or through proper access controls. This not only allows agencies to deliver more useful information, but also reduces the need to store the same data in multiple formats and in multiple locations.

4. Curate discrete pieces of data. APIs are most useful when they do the heavy lifting for those consuming them, especially in terms of sifting through large amounts of data. In practical terms, that means returning data at the most discrete level possible, be it a single row or even a single cell, rather than merely returning a subset of the dataset. It seems obvious but is often overlooked: a query for the broadband speeds at a given address, for example, should return only the data relating to that address, not the entire city or even state-wide dataset. Allowing developers to query the data directly means less development time on their end, and thus a higher likelihood that an application will be built.
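The broadband example can be sketched in a few lines. This is only an illustration under stated assumptions: the sample rows, the field names, and the `lookup` function are invented here, and a real service would query a database rather than an in-memory list.

```python
# Illustrative in-memory dataset; the rows and field names are made up.
BROADBAND = [
    {"address": "100 Main St", "city": "Springfield",
     "download_mbps": 25.0, "upload_mbps": 5.0},
    {"address": "200 Oak Ave", "city": "Springfield",
     "download_mbps": 100.0, "upload_mbps": 20.0},
    {"address": "300 Elm Rd", "city": "Shelbyville",
     "download_mbps": 10.0, "upload_mbps": 1.0},
]

# Index once so each request is a direct lookup rather than a full scan.
BY_ADDRESS = {row["address"].lower(): row for row in BROADBAND}


def lookup(address, field=None):
    """Return one row for the address, or a single cell if field is given.

    Returns None when the address or the field is not in the dataset.
    """
    row = BY_ADDRESS.get(address.strip().lower())
    if row is None:
        return None
    if field is None:
        return dict(row)  # copy so callers cannot mutate the dataset
    return row.get(field)
```

A caller who only needs the download speed at one address can ask for exactly that, for example `lookup("200 Oak Ave", "download_mbps")`, instead of downloading and filtering the whole city's records.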

5. Serve data in multiple formats. When providing a service, whether you are a waiter or a CIO, “the customer is always right.” In the context of APIs, that means you need to return the information in the developers’ native tongue, not the server’s. For some languages, heavyweight formats like XML may make sense; for others, especially mobile applications, JSON or JSONP may be preferred. Be prepared to return data in multiple formats, even as those formats continue to evolve.
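One way to picture this is a single record rendered three ways from the same source. The sketch below uses only Python's standard library; the `render` and `to_xml` helpers and the record's field names are assumptions made for illustration, and the XML output assumes flat records whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(record, root_tag="record"):
    """Serialize a flat dict as an XML element with one child per key."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def render(record, fmt="json", callback="callback"):
    """Return (content_type, body) for the requested format."""
    if fmt == "json":
        return "application/json", json.dumps(record)
    if fmt == "jsonp":
        # JSONP wraps the JSON payload in a call to a client-named function.
        # Reject anything that is not a plain identifier to avoid injection.
        if not callback.isidentifier():
            raise ValueError(f"invalid callback name: {callback!r}")
        return "application/javascript", f"{callback}({json.dumps(record)});"
    if fmt == "xml":
        return "application/xml", to_xml(record)
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping the data in one internal shape and converting only at the edge means a new format can be added later as another branch, without touching how the data is stored.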

Continue reading Publishing Government Data That Developers Will Actually Use by Benjamin J. Balter

Great post – I also think it’s important to look at where there is demand. I heard an interesting presentation from the FixYourStreets founder, who looks at Google search trends as he builds citizen apps to see where the real pain point is for citizens. For example, for our jobs.govloop.com we are building on USAJOBS data; government jobs data is really popular with folks. It’s just like building an API: a company would spend a lot of time figuring out what fields people really want.

Henry Brown

I would offer that point 2, “Eat your own dog food,” will in fact drive the others toward the point where the data becomes meaningful to the “customer.”

Chris Cairns

That’s some good stuff. Getting to “open data” the right (and useful) way isn’t going to be easy. I like your use of the keyword “dogfooding,” by the way.