Alex Howard shared a good story today about BrightScope, a San Diego-based company that “quantitatively rates 401k plans and gives participants, plan sponsors, asset managers, and advisors tools to make their plans better”. BrightScope has used government data to create a profitable business. Now, as Alex notes:
“Here’s the key point about the founders’ fascinating story of extracting public data from the Department of Labor (DOL): it came from more than 50 Freedom of Information Act requests, made at significant cost over many months. They extracted boxes and boxes of paper records. Finally, after a lobbying campaign that took months, they were able to get the data in electronic form.”
Yes, the cost of requesting and extracting this data was high, but the business is PROFITABLE. This business only exists because of this open government data. Could it have been built more cheaply, and more profitably, if the data had been available through APIs of some sort? Absolutely, which is part of the reason that states such as California are opening up this data in formats that developers can extract more easily and at lower cost. These reduced costs lower the barrier to entry and make it inviting for startups to get involved.
Now, Alex was kind enough to school me on his view of open data in this Tweet:
“@JohnFMoore #Opendata != XML? OK. You’re on a different wavelength on than @cjoh @noneck or @TimBerners_Lee & others I spoke to at #IOGDC.”
When I disagreed he followed it up with:
“FYI: @TimBerners_Lee (& many of the world’s experts) prefer open linked data, not just XML: http://bit.ly/cKmpHn #IOGDC #opengov”
It was an honor to be given a lesson in front of Sir Tim Berners-Lee.
However, as a former CTO and developer, I happen to know a little bit about technology. XML is simply a markup language, a container for data. Is it one of the most preferred containers? Absolutely. However, open government data is not synonymous with XML. Open government data is simply government-owned data that can be mined in order to create useful information. It can be in XML, PDF, text files, printouts, and so on. The key point is that the data is being released so that others can create value from it, not the format it is released in.
Let's move past the foolish debates on definitions and focus on what matters most: making government more efficient and more open, and leveraging the data that government owns to help power a new marketplace that generates jobs, lots of jobs.
Originally posted on Government in Action.
I agree with you and humbly disagree with Howard. A large amount of government data still exists on paper, in Access files, in Excel spreadsheets, and in word processing formats that you can barely emulate anymore. XML is great for structured data, but what about the vital knowledge that exists in unstructured documents? Yes, it would be simpler for entrepreneurs if all government data were in easy-to-access formats.
But it isn’t, and that is just the reality. Even so, the company that invents a way to mine that unstructured data gets a double bonus: the data in a usable format AND a data-mining process it can sell to other companies with similar needs. As you say, release the data now and let the entrepreneurs figure out how to extract the value from it.
Thanks, Bill, I agree with you completely. If possible, I would love to have you copy/paste the comment at the original location too:
It's definitely okay if you don’t, however.