
A manifesto for liberating data

My book, “Data Dynamite: how liberating information will transform our world,” is in print!

Because I argue in the book that liberating data can be as transformative as Martin Luther’s translation of the Bible into vernacular German and its printing, rather than hand-copying, I ended the book with my variation on Luther’s 95 Theses: a 13-point Liberating Data Manifesto, as it were. Its principles should remain valid even if the underlying technologies change dramatically, as they surely will.

Demanding these principles as the basis of any data liberation initiative should speed the revolution. I’d love your feedback on any or all of them!

1. Structure data immediately as it is entered. Attaching metadata transforms words and numbers into valuable information. It also means that data doesn’t have to be constantly re-entered.

This is the single most important step to ensure that the benefits of data sharing are fully realized. When data is structured, that is, when metadata that explains it is attached or mapped to it, the data gains context and is transformed from mere words and numbers into valuable information that any program or machine that recognizes that metadata can access automatically.

That means the data doesn’t need to be re-entered, with the resulting risk of error, higher costs, and delays. Whenever possible, the goal should be a system in which data needs to be entered only once and is automatically propagated everywhere. Benefits will include streamlined, coordinated, and automated operations driven by real-time, machine-readable data.
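
To make the idea of structuring data at the moment of entry more concrete, here is a minimal Python sketch. It is purely illustrative: the `Field` and `EntryHub` names, the field names, and the subscriber mechanism are hypothetical assumptions, not part of any particular product or standard. Each value carries metadata (its name and unit) from the moment it is entered, and that single entry is pushed to every downstream system that needs it, so nobody has to re-key it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Field:
    """A value plus the metadata that explains it (hypothetical schema)."""
    name: str      # what the value represents, e.g. "unit_price"
    unit: str      # unit of measure; "" if not applicable
    value: object  # the raw word or number as entered


Record = dict[str, Field]


class EntryHub:
    """Hypothetical 'enter once, propagate everywhere' hub."""

    def __init__(self) -> None:
        self._consumers: list[Callable[[Record], None]] = []

    def subscribe(self, consumer: Callable[[Record], None]) -> None:
        self._consumers.append(consumer)

    def enter(self, *fields: Field) -> Record:
        # Structure the data at the moment of entry...
        record: Record = {f.name: f for f in fields}
        # ...then push the same record to every consumer, so nobody re-enters it.
        for consumer in self._consumers:
            consumer(record)
        return record


# Usage: two downstream systems receive one entry without re-entering it.
billing: list[Record] = []
inventory: list[Record] = []
hub = EntryHub()
hub.subscribe(billing.append)
hub.subscribe(inventory.append)
hub.enter(Field("order_quantity", "units", 12), Field("unit_price", "USD", 4.5))
```

Because the unit and meaning travel with each value, both the billing and inventory consumers receive the same self-describing record rather than bare numbers they would have to interpret or retype.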

2. Make data freely available unless there are substantive security and privacy concerns.

The most fundamental change needed to liberate data is to make sharing the default for government and corporate data: it should be available to those whose roles depend on access to it, rather than locked in databases with access restricted.
Of course there will be legitimate security and privacy reasons for keeping some data private, but such objections must be genuine, not simply “We’ve always done it that way.”

3. Unless there are compelling reasons to the contrary, make data available on a real-time, data-in-data-out basis.

Historical data is, of course, important, and should be readily available. 

However, data is most valuable when provided on a real-time basis. That is when it can most improve decision-making and analysis by grounding us in current reality. Now that relatively low-cost analysis dashboards are making ubiquitous business intelligence possible, a growing number of companies and government agencies are likely to begin providing real-time data streams to their entire workforces, with the level of detail and access to confidential data depending on each individual’s role and security clearance.

4. Don’t charge for data unless there is a compelling reason to do so.

Data wants to be free, literally and figuratively. In the case of government data, taxes or fees have already paid for its collection, and the data is frequently about individuals’ own lives. In the past, when entering and retrieving data was costly and time-consuming, charging for access was justifiable. Today it is not.

In the UK and other countries, the sale of data, especially geospatial data, is a significant source of government income. That should not justify continuing to charge for it: the potential revenue loss to government is more than offset by potential tax revenue from new services created using the data.

One positive sign: after the US Open Government Directive was announced, a valuable Department of Health and Human Services Medicare data file that previously cost $100 and was available only on CD-ROM became available free of charge on the Web.

5. Make organizations data-centric, with data at the core of their operations and strategy, and with access for all employees determined by their roles.

To fully capitalize on the benefits of access to data, an organization must regard data as the heart of everything it does. That requires restructuring and managing operations so that data is central rather than peripheral: a data-centric organization revolves around its data.

6. Provide tools to make data understandable and meaningful.

Data is truly valuable only when users can easily work and experiment with it: trying a variety of visualization styles to see which is most illuminating, and sharing the results with others using Web 2.0 tools.

The “Tools” section of the U.S. Data.gov site, which combines data extraction tools with a variety of widgets, is a good start. However, the best example so far is ManyEyes.com, because it combines a variety of data visualization tools with social media features, such as threaded discussions, that encourage collaborative analysis. Government and corporate sites may be reluctant to host freewheeling discussions of data, so this role may remain with independent sites such as Many Eyes.

7. Protect security and privacy with unified data: protection is simpler than with multiple, fragmentary files.

Experts say that unifying data and emphasizing structured data that needs to be entered only once makes it simpler to protect security and privacy than with multiple, fragmentary records. Granting permission-based access to parts of these records, according to an individual’s role and security clearance, is much easier to regulate and limit than supervising a wide range of fragmentary databases with varying levels of security.
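
As a rough illustration of role-based access to parts of a single unified record, here is a minimal Python sketch. The roles, field names, and the `ROLE_PERMISSIONS` table are hypothetical assumptions chosen for the example; real systems would use an established access-control mechanism rather than a hand-written dictionary.

```python
# Hypothetical roles and field names, for illustration only.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "colleague": frozenset({"name", "department"}),
    "manager": frozenset({"name", "department", "performance_review"}),
    "payroll": frozenset({"name", "salary"}),
}


def view_for(record: dict[str, object], role: str) -> dict[str, object]:
    """Return only the fields of a unified record that `role` is permitted to see."""
    allowed = ROLE_PERMISSIONS.get(role, frozenset())  # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}


employee = {
    "name": "J. Doe",
    "department": "Logistics",
    "salary": 52000,
    "performance_review": "exceeds expectations",
}
print(view_for(employee, "colleague"))  # {'name': 'J. Doe', 'department': 'Logistics'}
```

The point of the sketch is that one permission table governs one record: changing who may see a field means editing a single rule, rather than auditing every fragmentary copy of the data spread across separate databases.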

8. Make the public and employees trusted partners in using and generating data.

For too long, employees and the public have been given only crumbs of data rather than full access, and have been discouraged from becoming actively involved in data use and collection. When employees and the public are elevated to the level of full partners in analyzing and using this data, organizations and society will gain the benefit of their individual interests and life experiences.
All workers, not just a few élites, should be treated as “knowledge workers” and given access to valuable, real-time data that will help them do their jobs more efficiently.

9. Build a “data culture,” beginning in schools, in which the general public and workforces will be comfortable working with data and have the skills needed to do so.

If people have access to data but don’t know how to use it, the data will be wasted. Beginning in the early grades, children should use new data visualization tools and data feeds as part of the curriculum. In the workplace, provide tutorials and opportunities for workers to experiment with data to build their confidence and ability to use it.

10. Encourage data-centric public policy debate with active involvement from all political perspectives. Encourage discussion of the data, not just its interpretation.

In politics, widespread use of, and debate about, data might reduce rancor and make the process more fact-based and realistic. People and parties on all sides of an issue should be encouraged to monitor data and incorporate it in their deliberations.
Debating not just the interpretation of the data but also factors such as assumptions that might color the data itself will improve data quality over time.

11. Encourage global adoption of data-centric governmental reporting processes to simplify reporting and reduce businesses’ costs while facilitating interagency review, improved regulation, and public protection.

Open reporting formats such as XBRL (eXtensible Business Reporting Language) are free to use and have been adopted as global standards. Multinational companies that must already report using these standards in one or more countries will be able to amortize their costs if all nations and local government entities adopt the same standards.

Allowing companies to file a single structured data file instead of the multitude of traditional reports submitted to individual agencies will let them save money while simultaneously improving the quality of regulation, because several agencies can coordinate their reviews for the first time. For regulatory reports that are released to the public and to corporations, this form of reporting will also allow users to make better comparisons, because the reporting standards will be identical for every company.

12. Encourage rapid spread of liberating data’s benefits through open-source and crowdsourced applications.

Liberating data has already spread rapidly in part because pioneers have released a variety of apps and widgets created with open source tools, which lower the cost of creating them. Their solutions are themselves open source, explicitly inviting others to imitate and improve on them. In this way, global adoption of data liberation will accelerate, and poorer governments and their citizens, as well as small businesses and individuals, will be able to enjoy the benefits too.

13. Streamline, coordinate, and automate operations by providing real-time, machine-readable data.

Data in machine-readable, structured formats can automate and drive machinery and devices. This can be a particular boon to businesses, which will be better able to streamline and optimize their operations, and it will also lead to more devices that, like GPS units, update automatically based on real-time information.
