Originally posted on “osrin.net”, on 9th November 2009
The current trend of governments releasing massive and diverse datasets will demand something different from internet search tools in the future, something that we might consider a little extraordinary today.
Today most of us search in a single dimension: we tap a term into our favourite web search tool and get back a list of links to pages that are currently published somewhere on the net. Most of us are not planning to do any analysis of that information; we are just trying to find something, so the list of links is enough for us.
Governments are starting a new trend, though: publishing massive amounts of machine-readable data that we can use to draw our own conclusions about complex questions concerning our environment or our society.
In his now famous TED presentation, “Let my dataset change your mindset”, Hans Rosling gave us a preview of the way that many of us will be using these government datasets in years to come, along with similar datasets that we will eventually see commercial organizations publishing in the same way.
Using available data, developers will continue to build new applications that could never have been funded by government; citizens and businesses will be able to offer complex and well-thought-out advice to policy makers; economists will be able to build empirical models that demonstrate societal trends; and eventually historians will reconstruct the environment that we leave behind.
For all of this to work, internet search has to evolve; a list of links won’t meet our needs. Here are three examples:
First of all, and this is a piece that we’re close to today, we need to be able to search by geography. When we begin to break down massive datasets, the geography becomes important; any piece of data has a special meaning when we can tie it to a country, a county, a town or a particular street. Most of the government policy makers I meet have long understood the role of geographic data in government process, but few tools exist to publish that data externally in a way that is useful.
Secondly, we need to be able to search by timeframe. Future analysis of data, whether by an economist constructing trends over a limited number of years or by a historian carrying out a long-term review, will require us to find a way to roll datasets back to a point in time that is relevant to the user’s analysis.
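To make "rolling a dataset back" a little more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a publisher keeps every revision of a figure together with the date it was published; the `Revision` type, the `as_of` function, the series name and all the numbers are hypothetical placeholders, not real data or an existing API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One published version of a figure (hypothetical structure)."""
    series: str       # which figure this is, e.g. "road_cost_per_km"
    value: float      # the figure as published
    published: date   # when this version was released


def as_of(revisions, series, when) -> Optional[Revision]:
    """Return the latest revision of `series` published on or before `when`."""
    candidates = [
        r for r in revisions
        if r.series == series and r.published <= when
    ]
    # default=None covers the case where nothing had been published yet.
    return max(candidates, key=lambda r: r.published, default=None)


# Placeholder revisions, invented for the example.
REVISIONS = [
    Revision("road_cost_per_km", 100.0, date(2007, 1, 1)),
    Revision("road_cost_per_km", 110.0, date(2008, 1, 1)),
    Revision("road_cost_per_km", 125.0, date(2009, 1, 1)),
]

snapshot = as_of(REVISIONS, "road_cost_per_km", date(2008, 6, 30))
```

Here `snapshot` holds the revision published at the start of 2008, which is what an analyst working in mid-2008 would have seen. Asking for a date before the first publication returns `None` rather than guessing.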
Finally, it is not enough for a single country to solve this; international standards need to evolve to support this type of search.
Very quickly we will find ourselves at a point where it will not be enough for us to look at an issue in the context of a single country. In the short term, policy advice to a given government could be enhanced by the ability to cross-analyze that advice with data from similar nations. For example, to lower the cost of building a kilometre of road in New Zealand, I might also want to look at the costs in the UK, Canada and Australia. In centuries to come, historians will need a way to show how global society evolved.
We are not far from the point where software, rather than a person, will need to initiate a search for data relating to a particular issue, in a certain place and during a given timeframe.
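As a rough illustration of what such a software-initiated search might look like, the Python sketch below filters a small collection of records on all three dimensions at once: issue, place and timeframe. Everything in it is an assumption made for the example: the `Record` fields, the `search` function, the use of two-letter country codes, and the values, which are invented placeholders rather than real costs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One published data point (hypothetical structure)."""
    topic: str          # the issue, e.g. "road_cost_per_km"
    country: str        # the place, here a two-letter country code
    period_start: date  # first day the figure covers
    period_end: date    # last day the figure covers
    value: float        # placeholder number, not a real statistic


def search(records, topic, countries, start, end):
    """Return records on `topic`, in any of `countries`,
    whose period overlaps the window from `start` to `end`."""
    wanted = set(countries)
    return [
        r for r in records
        if r.topic == topic
        and r.country in wanted
        # Two date ranges overlap when each starts before the other ends.
        and r.period_start <= end
        and r.period_end >= start
    ]


# Placeholder records, invented for the example.
RECORDS = [
    Record("road_cost_per_km", "NZ", date(2008, 1, 1), date(2008, 12, 31), 1.0),
    Record("road_cost_per_km", "GB", date(2008, 1, 1), date(2008, 12, 31), 2.0),
    Record("road_cost_per_km", "AU", date(2003, 1, 1), date(2003, 12, 31), 3.0),
    Record("school_enrolment", "NZ", date(2008, 1, 1), date(2008, 12, 31), 4.0),
]

matches = search(
    RECORDS,
    topic="road_cost_per_km",
    countries=["NZ", "GB", "AU"],
    start=date(2007, 1, 1),
    end=date(2009, 12, 31),
)
```

With this placeholder data, `matches` contains the New Zealand and UK road records; the Australian one falls outside the timeframe and the enrolment one is a different issue. The sketch only works because every record describes topic, place and period in the same way, which is exactly what shared international standards would have to provide.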
It is then that we will really begin to experience the power that published data gives us.