Open data is data that delivers results

I struck a nerve around open data, as I mentioned in my earlier article, when I stated that “XML is simply a markup language, a container for data. Is it one of the most preferred containers? Absolutely. However, open government data is not synonymous with XML. Open government data is simply government-owned data that can be mined in order to create useful information. It can be in XML, PDF, text files, print outs, etc… The key point is that the data is being released for others to use to create value from it, not the format that it is released in”.

Initial comments on Twitter argued that open data had to be XML, then opened up to being any open, non-proprietary format. For developers I would absolutely agree that this makes sense. It’s much easier for developers to work with open formats like CSV and XML than with proprietary formats like PDF. Developers, however, are not the leaders of open government.

Lovisa Williams said it well in her comment when she stated:

“With the issuance of the Open Government Directive, http://www.whitehouse.gov/sites/default/files/omb/assets/memoranda_2010/m10-06.pdf, agencies are being asked to provide more of the data we collect or process. The format is not specified except to say we should use an open format. The interesting thing is when you read the footnote about what an open format is, OMB says ‘it is any communication or representation of knowledge such as facts, data or opinions presented in any medium or format’. This could mean, as John has said, the use of PDF files and others. It can also mean (although OMB prefers we don’t do this) we could publish data using more traditional methods such as print publications. The concept is for Government to show citizens, in a transparent way, how their government works and what they are doing with tax payer funds. Most citizens I know aren’t tech or code suave. In fact, I’m sure most of them would just prefer we’d put our information online in a searchable document library”.

From a US-centric view, the only thing that matters is that agencies must share more of their data in, as noted by OMB, any medium or format. The goal of opening up data is not to convert government into a data platform; the goal is to make government more transparent and to reduce internal operating costs. We should be applauding any effort to open up government data, in any format.

Government data can be defined in various ways, of course, but I tend to think about it in this way:

In other words:

  • Data that changes frequently and is of high value in terms of operational savings or potential market opportunities needs to be high priority. This data should be made available in open data formats like XML, CSV, etc.
  • Data that changes frequently but is of low value should be a low priority. If this is only available through FOIA or paper print-outs, so be it. While not ideal, we MUST invest our precious resources where the payback is largest.
  • Data that never changes, or changes infrequently, falls into the same bucket as low value data above. While it would be nice to spend time putting it into open data formats it is not a priority.
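
For readers who think in code, the three rules above can be sketched as a small decision function. This is only an illustration: the names `Priority` and `release_priority` and the two boolean inputs are assumptions made for the example, not part of any agency process.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"  # worth publishing in open formats such as XML or CSV
    LOW = "low"    # existing channels (FOIA, print-outs) are acceptable


def release_priority(changes_frequently: bool, high_value: bool) -> Priority:
    """Hypothetical helper that mirrors the three bullets above."""
    # Only data that is both frequently changing and high value is high priority.
    if changes_frequently and high_value:
        return Priority.HIGH
    # Frequent but low-value data, and data that rarely or never changes,
    # land in the same low-priority bucket.
    return Priority.LOW
```

Calling `release_priority(True, True)` returns `Priority.HIGH`; every other combination returns `Priority.LOW`, which matches the idea of spending resources where the payback is largest.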

Remember also that automating the generation of XML data feeds is difficult: simple for developers, but hard for everyone else. On the other hand, generating data in Word documents, PDF, or other proprietary formats is trivial for nearly any office worker. The plus side of proprietary formats like PDF is that anyone can read these documents; in fact, they can read them more easily than XML documents. XML data is primarily consumed by application types such as:

The power of XML and other open formats is in the applications that can be created, the markets formed around new businesses, and the jobs created. If we could afford to convert every piece of data we own into this format, without giving up work on other priorities, I would absolutely support it. However, in the world we live in, we are resource constrained and choices must be made.

XML is the right choice for open data in some cases. PDF is the right choice for open data in other cases. In the eyes of the recipient, they have received the data they requested: mission accomplished.


Originally posted on Government in Action.

Matthew Micene

The distinction needs to be raised between transport and delivery. Delivery of data to end users should never accept XML as an end result; it simply isn’t human consumable. Published data should be in some static (if only until refreshed in a web browser) format like PDF, HTML or print. But the underlying communication of data to the publishing point needs to be in an open, non-proprietary and easily consumable format like XML. The problem is conflating the process and the product.
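
As a rough sketch of that split, the snippet below treats XML as the transport format and renders it into an HTML table for delivery. The feed layout (a root element containing `record` elements) and the function name `render_html_table` are assumptions made up for this example, not a real agency feed or API.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_html_table(xml_text: str) -> str:
    """Turn a hypothetical XML feed (transport) into an HTML table (delivery)."""
    root = ET.fromstring(xml_text)
    rows = []
    # Assumed layout: each <record> child of the root is one row,
    # and each of its child elements is one cell.
    for record in root.findall("record"):
        cells = "".join(f"<td>{escape(field.text or '')}</td>" for field in record)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


# Made-up sample data, for illustration only.
feed = (
    "<records>"
    "<record><agency>Example Agency</agency><amount>1200</amount></record>"
    "</records>"
)
page_fragment = render_html_table(feed)
```

The same feed could just as easily be fed into a PDF or print pipeline; the point is only that the published product and the underlying exchange format are separate choices.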

And while developers are not the leaders of open government, it should be their job to educate the leaders on the technology. A “searchable document library” does not presuppose any design paradigms or data formats.

Apologies to Marshall McLuhan, but the medium is most definitely not the message here.