Where’s Australia’s back-up for governments shutting down access to open data?

On a regular basis, around the world, governments rise and fall.

We see this most commonly at the local level, with councils merging and demerging. It also occurs, more slowly, at state and national levels, with new nations regularly created out of the ruins of older states.

I’ve been thinking a great deal about this over the last week, ever since the US Government, the richest and most powerful state in history, told 800,000 staff – about a third of its public service – to stay at home until further notice.

The result of this shutdown hasn’t been limited to the shuttering of national parks and monuments, or a reduction in services to the public.

Significant online data sources, including Census.gov, have also been shut down, which can have a major flow-on impact on businesses and the public.

In Australia, where it has been difficult for a hostile opposition to block the Australian Government’s budget supply since the events of 1975, we’re not really familiar with the notion of governments abruptly shutting down. We do, however, see frequent mergers and demergers at council level, and agencies regularly appear and disappear at state and federal levels (we lost at least four Australian Government agencies following the last election).

Some of these decisions are taken very quickly, and can have major impacts on businesses reliant on government programs or data.

As the open data revolution progresses, more and more companies will come to rely on government data to power their activities with the public. At the same time, the public will come to rely on this data, and on the hackers and companies that make use of it, for the services they use in their everyday lives.

So where’s the back-up to government if it suddenly shuts down access to its data?

This concern appears to be shared by the Sunlight Foundation, whose Eric Mill recently wrote a great post on the topic, Government APIs Aren’t A Backup Plan.

In the US, not-for-profit civic groups are beginning to replicate data released by government as a risk-mitigation step. One example is this great list of non-government sources of government data compiled by Code for America: http://forever.codeforamerica.org/Census-API/shutdown-2013.html

In Australia this hasn’t happened as yet – but it could, relatively easily.

All it would require is a couple of separate cloud-based data storage environments (for redundancy), a good front-end data catalogue, appropriate crawlers, and volunteers who source and update data as it is released.
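To make the redundancy idea a little more concrete, here is a minimal sketch of copying one dataset file into two independent storage locations and recording a SHA-256 checksum so each copy can be verified later. It rests on some assumptions of mine: plain local directories stand in for the two cloud storage environments, and the function names (`mirror_dataset`, `verify_copies`) are hypothetical rather than part of any existing tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_dataset(source: Path, stores: list[Path]) -> str:
    """Copy `source` into every store and return its checksum.

    Each store is a plain directory here; in practice it might be a
    bucket with a different cloud provider (an assumption of this sketch).
    """
    checksum = sha256_of(source)
    for store in stores:
        store.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, store / source.name)
    return checksum


def verify_copies(name: str, checksum: str, stores: list[Path]) -> dict[str, bool]:
    """Report, per store, whether its copy exists and matches the checksum."""
    results = {}
    for store in stores:
        copy = store / name
        results[str(store)] = copy.is_file() and sha256_of(copy) == checksum
    return results
```

Keeping the checksum alongside the catalogue entry would let volunteers spot a damaged or missing copy in one store and restore it from the other.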

We’re already part-way there with the creation of GovPond during the last GovHack. Developed in Perth, originally as a way to locate open data for state-level GovHack participants (from the dark and dusty corners of the internet), GovPond has become a fantastic resource for finding data across the plethora of Australian government data catalogues, without the incredibly messy business of checking each site.

GovPond provides the front-end data catalogue for Australian governments, without all the messy politics between and within jurisdictions that each feel the need to have their own ‘central’ data catalogue and then undermine it by storing open data on agency sites and not listing it centrally.

The second part, cloud-based storage, is already cheaply available and already used by some government open data sites. For example, Data.gov.au took the sensible step of storing data on Amazon’s system, overcoming the security concerns with the simple fact that the data is designed to be publicly accessible.

Other agencies and states have employed a range of approaches, with much of their data still stored on servers they pay significant amounts of money to own (a real waste of government funds when the data is supposed to be publicly available). However, the ability to access low-cost, high-resilience cloud storage is definitely there.

The final step is the tough one – coordinating the volunteers and designing the scrapers that find, copy, file and maintain government data from the thousands of government websites across Australia.
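As an illustration of the "find" part of that job only, here is a minimal sketch of a scraper step that pulls likely data file links out of a web page. It is a sketch under assumptions: the list of file extensions is my own guess at what counts as data, the names `DataLinkFinder` and `find_data_links` are hypothetical, and a real scraper would also need to fetch pages politely, follow links and handle each site's quirks.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of extensions that suggest a downloadable data file.
DATA_EXTENSIONS = (".csv", ".json", ".xml", ".xls", ".xlsx", ".zip")


class DataLinkFinder(HTMLParser):
    """Collect links on a page that look like downloadable data files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(DATA_EXTENSIONS):
            self.links.append(absolute)


def find_data_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of data-like links found in `html`."""
    finder = DataLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

The copying, filing and maintaining steps would sit on top of a list like this one, which is where most of the volunteer effort would go.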

Some of this work has been done. Volunteers compiled GovPond, and adding tools that check how current each dataset is would be very possible within the context of the site. Many government open data sites have moved to standard platforms like CKAN, which simplify copying and maintenance of data (although the vast bulk of available government data still sits outside these platforms).
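To show why a standard platform helps, here is a rough sketch of working with CKAN's Action API, which exposes actions such as `package_list` and `package_show` under `/api/3/action/`. The helper names are hypothetical, the portal address in the usage note is a placeholder, and I am assuming the portal reports `metadata_modified` as an ISO timestamp without a timezone, so the comparison uses naive datetimes.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen


def ckan_action_url(portal: str, action: str, **params: str) -> str:
    """Build a CKAN Action API URL, e.g. .../api/3/action/package_show?id=..."""
    url = f"{portal.rstrip('/')}/api/3/action/{action}"
    return f"{url}?{urlencode(params)}" if params else url


def call_ckan(portal: str, action: str, **params: str) -> dict:
    """Call the portal over HTTP and return the decoded JSON response."""
    with urlopen(ckan_action_url(portal, action, **params), timeout=30) as response:
        return json.load(response)


def resources_to_copy(package_show_response: dict) -> list[dict]:
    """List the name, URL and format of each resource in a dataset."""
    if not package_show_response.get("success"):
        return []
    dataset = package_show_response["result"]
    return [
        {"name": r.get("name"), "url": r["url"], "format": r.get("format")}
        for r in dataset.get("resources", [])
        if r.get("url")
    ]


def is_stale(package_show_response: dict, mirrored_at: datetime) -> bool:
    """True if the dataset changed on the portal after we last mirrored it.

    Assumes `metadata_modified` and `mirrored_at` are both naive datetimes.
    """
    modified = datetime.fromisoformat(
        package_show_response["result"]["metadata_modified"]
    )
    return modified > mirrored_at
```

A mirroring job might call `call_ckan("https://data.example.gov.au", "package_list")` to get dataset names, then `package_show` for each one, copying only the datasets that `is_stale` flags.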

Much remains to be done. There needs to be some structure or organisation that commits itself to recruiting, supporting and empowering these volunteers, and to sourcing the funds necessary to pay for data storage and for some technical tools to maintain the data.

There needs to be leadership from within the open data community – beyond the leadership that already exists (and is largely committed to other goals).

Finally, there needs to be the interest and willingness within the broader Australian public and business community to support this approach. This interest will grow as government data becomes more mission-critical for certain businesses and for the public, making it logical for them to invest in ensuring that the data remains available to them when they need it.

When it comes to open data, the public, companies and even government agencies need access to the data – they don’t need the data to necessarily be held in government hands.

As we move through the process of releasing more data and it becomes more valuable to the community, the ability of a single public servant, politician or party to suddenly cut off access to a dataset, series or service becomes more of a risk for the community.

As a result, there will be rising interest in an Australian back-up to government as the holder of open data – possibly many back-ups, stored redundantly across many servers in a peer-based approach to prevent the data’s destruction or loss of access.
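One way such a peer-based approach might keep itself honest is for peers to compare checksums of the copies they hold. The sketch below is only an illustration under my own assumptions: the peer names in the usage note are made up, `audit_replicas` is a hypothetical function, and it simply trusts whichever version the most peers agree on.

```python
import hashlib
from collections import Counter
from typing import Optional


def audit_replicas(replicas: dict[str, Optional[bytes]]) -> dict:
    """Compare the copies held by each peer and flag the ones to repair.

    `replicas` maps a peer name to the bytes it holds, or None if the
    peer has lost the file or cannot be reached.
    """
    digests = {
        peer: hashlib.sha256(data).hexdigest()
        for peer, data in replicas.items()
        if data is not None
    }
    if not digests:
        return {
            "trusted_digest": None,
            "healthy": [],
            "needs_repair": sorted(replicas),
            "has_majority": False,
        }

    # Treat the digest held by the most peers as the trusted version.
    trusted, votes = Counter(digests.values()).most_common(1)[0]
    healthy = sorted(p for p, d in digests.items() if d == trusted)
    needs_repair = sorted(p for p in replicas if p not in healthy)
    return {
        "trusted_digest": trusted,
        "healthy": healthy,
        "needs_repair": needs_repair,
        # Without a strict majority the "trusted" copy deserves a human check.
        "has_majority": votes > len(replicas) / 2,
    }
```

Calling it with copies from, say, `"perth"`, `"sydney"` and `"hobart"` would list the peers that agree under `healthy` and any peer with a missing or differing copy under `needs_repair`, ready to be refreshed from a healthy one.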

In the US they’re there now – seeking to build alternatives to government data storage, as governments are no longer stable and reliable custodians of data. In Australia it’s unlikely to be far away.