Raw data isn’t always the same as government transparency, said speakers and audience members during a May 25 session at the Gov 2.0 Expo in Washington, D.C.
As one audience member put it, “There are lies, damned lies, statistics and government data.”
John Sheridan, head of e-Services and Strategy at the United Kingdom’s Office of Public Sector Information and co-chair of the W3C’s eGovernment Interest Group, said it’s important that agencies consider whether or not they are publishing data responsibly.
“If I put a CSV file with some statistics on the web, that’s great. But often there’s a piece of context–maybe we changed the way we calculated the figure between two sets, 2002 and 2003. Maybe there’s a particular piece of provenance associated with the data in terms of how it was collected or which agency collected it,” said Sheridan. That information should be incorporated into the data in some way, he said.
In addition to context, proper formatting is often overlooked. Office 2007 can only support 65,000 rows of data, but most CSV files on data.gov are bigger than that, said Clay Johnson, director of the Sunlight Foundation’s Sunlight Labs.
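The row-limit problem Johnson describes can be checked before a file is published. The sketch below is a minimal illustration, not anything presented at the session: it counts the records in a CSV file and compares the total with 65,536, the worksheet row ceiling of the legacy .xls format, which is roughly the figure cited above. The function names and the sample data are assumptions made for the example.

```python
import csv
import io

# Row ceiling of the legacy .xls worksheet format (assumed here to be the
# limit behind the "65,000 rows" figure quoted in the article).
XLS_ROW_LIMIT = 65_536


def count_csv_rows(handle):
    """Count the records in an open CSV file, header row included."""
    return sum(1 for _ in csv.reader(handle))


def exceeds_row_limit(handle, limit=XLS_ROW_LIMIT):
    """Return True if the CSV has more records than the given limit."""
    return count_csv_rows(handle) > limit


# Hypothetical three-line file standing in for a published dataset.
sample = io.StringIO("year,value\n2002,10\n2003,12\n")
print(exceeds_row_limit(sample))
```

Using `csv.reader` rather than counting newlines means a quoted field that spans several physical lines is still counted as one record.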
Of course, if an agency cleans up the data too much it may appear that it’s been massaged. A truly open agency, according to Johnson, looks like a pyramid. The base of the pyramid is raw, bulk data in whatever format it comes in. The next level is data scrubbed to the point where developers can deal with it. This often means adding an API. The final layer is the creation of a website so that ordinary citizens can get to the data.
“If you’ve managed to do all of those things–the wholesaler experience, the distributor experience and the retailer experience–then you’ve got the full stack,” said Johnson.
Creating usable data may be harder than just posting a CSV file, but it lets developers build applications from the data, and those applications can be more timely and factually accurate. Johnson said it’s a bad idea for an organization to make a positive or negative assessment of data, but there may be a need for a data rating service that determines whether data is updated on a timely basis, is generally bug-free or would pass an XML validator.
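Two of the checks Johnson mentions, timeliness and passing an XML validator, are simple to automate. The following is a rough sketch of what one piece of such a rating service might look like, under stated assumptions: the 30-day freshness window, the function names and the sample values are all hypothetical, and the XML check only tests well-formedness, which is weaker than validating against a schema.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if the text parses as XML (well-formedness only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_fresh(last_updated, today, max_age_days=30):
    """Return True if the dataset was updated within max_age_days."""
    return (today - last_updated) <= timedelta(days=max_age_days)


def rate_dataset(xml_text, last_updated, today, max_age_days=30):
    """Collect both checks into one report (hypothetical rating format)."""
    return {
        "well_formed": is_well_formed_xml(xml_text),
        "fresh": is_fresh(last_updated, today, max_age_days),
    }


# Hypothetical dataset: a tiny XML payload last updated on May 1, 2010.
report = rate_dataset("<data><row/></data>", date(2010, 5, 1), date(2010, 5, 25))
print(report)
```

A report like this describes the condition of the data rather than passing judgment on what it says, which fits Johnson's caution about positive or negative assessments.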
USASPENDING.GOV needs some serious corrections
Fed CIO Roadmaps – many agencies don’t have one. Those that have been published are incomplete and wrong.
Federal IT Dashboard tells incomplete story. 50% of the IT assets are classified under “minor” investment and they do not get exposed into the dashboard.