The euphoria in the open government community has been palpable since the White House released a new Executive Order titled "Making Open and Machine Readable the New Default for Government Information" and an accompanying new Open Data Policy, "Managing Information as an Asset." Just reading the recent tweets on the Open Data Policy will renew your sense of optimism about Government. Combined, these documents change the default position for Government Agencies to creating and managing data (structured information) in open, machine-readable formats. Overall, the policy is really well thought out and, better yet, integrates into existing enforcement mechanisms set up by GPRA and other statutes rather than creating yet another reporting approach. There is a ton to love about this policy, and those working on it should be commended for putting out a comprehensive and innovative approach to moving the Federal Government toward releasing open data.
One of the cornerstones of this policy involves creating an enterprise data inventory and then a public data listing of that inventory. It states:
a. Create and maintain an enterprise data inventory- Agencies must update their inventory
of agency information resources (as required by OMB Circular A-130) to include an
enterprise data inventory, if it does not already exist, that accounts for datasets used in the
agency's information systems. The inventory will be built out over time, with the ultimate
goal of including all agency datasets, to the extent practicable. The inventory will indicate, as
appropriate, if the agency has determined that the individual datasets may be made publicly
available (i.e., release is permitted by law, subject to all privacy, confidentiality, security, and
other valid requirements) and whether they are currently available to the public. The Senior
Agency Official for Records Management should be consulted on integration with the records
management process. Agencies should use the Data Reference Model from the Federal
Enterprise Architecture to help create and maintain their inventory. Agencies must describe
datasets within the inventory using the common core and extensible metadata (see part III,
b. Create and maintain a public data listing- Any datasets in the agency's enterprise data
inventory that can be made publicly available must be listed at www.[agency].gov/data in a
human- and machine-readable format that enables automatic aggregation by Data.gov and
other services (known as "harvestable files"), to the extent practicable. This should include
datasets that can be made publicly available but have not yet been released. This public data
listing should also include, to the extent permitted by law and existing terms and conditions,
datasets that were produced through agency-funded grants, contracts, and cooperative
agreements (excluding any data submitted primarily for the purpose of contract monitoring
and administration), and, where feasible, be accompanied by standard citation information,
preferably in the form of a persistent identifier. The public data listing will be built out over
time, with the ultimate goal of including all agency datasets that can be made publicly
available. See Project Open Data for best practices, tools, and schema to implement the
public data listing and harvestable files.
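As a rough illustration of how the two quoted items relate, here is a minimal Python sketch that derives a machine-readable public listing from an enterprise inventory. The `Dataset` fields, the `build_public_listing` helper, and the sample entries are all hypothetical and invented for this example; they are not the Project Open Data schema, which should be consulted for the real field names.

```python
import json
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical fields for illustration; not the Project Open Data schema.
    title: str
    description: str
    can_be_public: bool  # agency has determined release is permitted
    is_released: bool    # currently available to the public


def build_public_listing(inventory):
    """Return the inventory entries that belong in the public listing.

    Following the quoted policy text, this keeps datasets that can be
    made public even if they have not yet been released.
    """
    return [
        {
            "title": d.title,
            "description": d.description,
            "released": d.is_released,
        }
        for d in inventory
        if d.can_be_public
    ]


# Made-up sample inventory, for demonstration only.
inventory = [
    Dataset("Facility locations", "Addresses of field offices", True, True),
    Dataset("Processing times", "Monthly average processing times", True, False),
    Dataset("Personnel records", "Contains personal information", False, False),
]

listing = build_public_listing(inventory)
listing_json = json.dumps(listing, indent=2)  # the "harvestable" form
print(listing_json)
```

The point of the sketch is that the public listing is a filtered view of the inventory, so the hard part is building the inventory and making the release determinations, not producing the file.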
The goal here is superb: the idea that, by providing a public listing of possible datasets, the government and interested citizen groups can engage in constructive dialogue to determine priorities and start releasing the most critical datasets earlier rather than later. I couldn't agree more with the goal, and in fact agree that the end state listed would be the coming of the "Age of Data Nirvana". Apple pie, Chevrolet and Open Data sets become the foundation of Americana. Who doesn't love that?
Just realize that creating and maintaining an enterprise data inventory is a massive undertaking. This is not a minor addition to an agency's information resources. (And for the love of all that is good, please don't use the FEA DRM as your method for finding the data sets! That becomes code for giving expensive contractors lots of money for little result. Use your portfolio of IT systems as a starting point instead.) For the larger agencies, this assumes they already have a robust portfolio management process in place (which, if true, means we've already uncovered all redundant IT expenditures and are well on our way toward eliminating them), that they know the functions and interconnections of their IT systems, and that they know their expected outputs. (Note to readers: there's still work to be done before all this is accomplished.) After this miracle occurs, it's just a matter of translating these outputs into viable datasets, and then going through the process of getting the relevant parts of the Agency to agree to release the information. To effectively execute this policy, an Agency must devote real resources to create an enterprise data repository, start the public data listing, create a process for engaging with customers, develop communication strategies, and engage with entrepreneurs and innovators in the private and nonprofit sectors, all while ensuring privacy and national security interests are protected.
Unfortunately, as we live in the age of Sequester, chances are Federal IT shops will be more interested in fixing existing operational problems than in seeking out and exposing open data. For an extreme example, is it more important for the VA to fix the problem of sending veterans their proper benefits, or to release datasets exposing in clear detail the current delay in providing the benefits? As no new dollars are provided for this effort, it wouldn't be a stretch to envision a time in the not too distant future when folks like Sunlight are chastising the Administration for falling behind on its commitment to open data. A solid public data listing of even half of the potential data sets per agency may be a long way off.
If This Takes A Long Time, What Can We Do Until Then?
If in fact it will take a long time to get to decent quality public data listings, what can we do in the meantime? It may be more effective in the short term to start the citizen engagement process first, well before any of an agency's public data sets are online. What if the Agency Data officers started engaging with the public now on the sets of priorities and needs they see today? Would they be able to take this information and ferret out the information systems which relate to those needs? If so, they might be able to start the process of opening up a valuable data set right away, rather than waiting potentially years for a more complete set before starting a robust conversation. Taking a citizen engagement approach such as this would mean that Project Open Data should also start listing online and offline strategies for facilitating conversations with interest groups to identify key concerns which might lead to relevant data sets. Starting the citizen engagement process now would also maintain visibility for this effort after the #opengov euphoria has faded.
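To make the "engage first, then find the systems" idea a little more concrete, here is a small Python sketch of one way an agency might rank its IT systems by public demand. Everything in it is an assumption for illustration: the request topics, the system names, the topic mapping, and the `rank_systems` helper are invented, and a real effort would need a far richer way of matching needs to systems.

```python
from collections import Counter

# Hypothetical topics raised by the public (e.g. at a listening session).
requests = [
    "benefits delays", "facility hours", "benefits delays",
    "grant awards", "benefits delays", "grant awards",
]

# Hypothetical mapping from IT systems to the topics their outputs cover.
system_topics = {
    "claims-tracking": {"benefits delays"},
    "grants-management": {"grant awards"},
    "hr-payroll": set(),
}


def rank_systems(requests, system_topics):
    """Rank systems by how many public requests their topics would address."""
    demand = Counter(requests)  # missing topics count as zero
    scores = {
        system: sum(demand[topic] for topic in topics)
        for system, topics in system_topics.items()
    }
    # Highest demand first, ties broken by name; drop systems nobody asked about.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(system, count) for system, count in ranked if count > 0]


ranking = rank_systems(requests, system_topics)
print(ranking)
```

Note that "facility hours" maps to no system in this made-up example. Requests like that are useful too, since they point at needs the agency cannot yet trace to a data source.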
The concern here, of course, is that some Agencies may decide this is too hard to do, or won't have the resources to even begin the task. The strategy for getting around this would be easy: an Agency could just stick an earnest person with the task of finding and listing a few potential datasets they know about that would be relatively easy to release, and then ask the public which of those they would like to see. The Agency could claim with justification that it is making a good faith effort with limited resources to meet the goals of the policy, and that it sees improvements in this process arriving in the not too distant future, a future which never quite seems to arrive in the present ("We're making progress!").
Again, my goal isn't to rain on the parade of happiness breaking out. The policy is terrific. I would just point out that the operational details of implementing this may be harder than some anticipate. I'm sure there are other strategies to get started earlier; perhaps people will list some in the comments section.
DISCLAIMER: This post is solely my opinion as an independent consultant who has worked in multiple Federal Agencies, and is not related to any specific Agency or task I have worked on or am working on.