From open data to useful data

At BarCamp Canberra on Saturday I led a discussion asking how we can help governments take the step from open data (releasing raw datasets – not always in an easily reusable format) towards usable and useful data (releasing raw datasets in easily reusable formats plus tools that can be used to visualise it).

To frame this discussion I like to think of open data as a form of online community, one that largely involves numbers rather than words.

Organisations that establish a word-based community using a forum, blog, wiki, Facebook page or similar online channel, but fail to provide context as to how and why people should engage, or to feed and participate in the discussion, are likely either to receive little engagement or to have their engagement spin out of control.

Equally, I believe that raw data released without context as to how and why people should engage, and without data visualisation tools to aid participation in a data discussion, is likely to experience the same fate.

With no context and no leadership from the data providers, others will fill the informational gap – sometimes maliciously. There are also fewer opportunities for the data providers to use the data to tell good stories – how crime has decreased, how vaccination reduces fatalities, how the government’s expenditure on social services is delivering good outcomes.

Certainly there will always be some people with the technical experience and commitment to take raw open data, transform it into a usable form and then build a visualisation or mash-up around it to tell a story.

However these people represent a tiny minority in the community. They need a combination of skill, interest and time. I estimate they make up less than 5% of society, possibly well under 1%.

To attract the interest and involvement of others, the barriers to participation must be extremely low (the lesson taught by Facebook and Twitter) and the ability to get a useful outcome with minimal personal effort must be very high (the lesson taught by Google).

The discussion on the weekend seemed to crystallise into two groups. One felt that governments needed to do more to ‘raise the bar’ on the data they released – expending additional effort to ensure it was more usable and useful for the public.

The other felt that governments have fulfilled their transparency and accountability goals by releasing data to the community, that further work on the data redirects government funds from vital services and activities, and that there is little or no evidence of value in doing further work on open data (beyond releasing it in whatever form the government holds it).

I think there’s some truth in both views – but also some major perceptual holes.

I don’t think it necessarily needs to be government expending the additional effort. With appropriate philanthropic funding, a not-for-profit organisation could help bridge the gap between open and usable data, taking what the government releases and reprocessing it into outputs that tell stories.

However I also don’t accept the view that there is no evidence of value in doing further work on open data to make datasets more usable.

In fact it could be that doing this work adds immense value in certain cases. Without sufficient research and evidence to deny this, the claim that there is no value is an opinion, not a fact. The evidence I’ve seen from the ABS through the census program (here’s my personal infographic by the way) suggests that they achieved enormous awareness and increased understanding by doing more than releasing tables of numbers – using visualisations to make the numbers come alive.

Indeed there is other evidence that taking raw data and doing more work on it is worthwhile in a number of situations. Train and bus timetables are an example. Why does government not simply release these as raw data and have commercial entities produce the timetables at a profit? Clearly there must be sufficient value in their production to justify governments producing slick and visual timetables and route maps.

Some may argue that this is service delivery, not open data (as someone did in the discussion). I personally cannot see the difference. Whenever government chooses to add value to data it is doing so to deliver some form of service – whatever the data happens to be.

Is there greater service delivery utility in producing timetables (where commercial entities would step in if government did not) or in providing a visual guide to government budgets (where commercial interests would not step in)?

Either way the goal is to make the data more useful and usable to people. If anything the government should focus its funds on data where commercial interests are not prepared to do the job.

However this is still talking around the nub of the matter – open data is not helping many people because open doesn’t mean useful or usable.

I believe we need either a government agency or a not-for-profit organisation to short-circuit the debate and provide evidence of how data can be made meaningful with context and visualisations.

Now, who would like to help me put together a not-for-profit to do this?

