Research Data’s Collective Action Problem

The open secret is that data is the key to the next wave of innovation for all sectors of society. This argument was made very convincingly in a recent post here on GovLoop.

This is especially true when it comes to research data, which is not only important for scientific discovery, but also provides direct benefits to the public in the form of disease prevention and physical infrastructure. The vast majority of research data is funded by government, generated by academia, and then applied by public or private institutions. There are seemingly endless possibilities for application in the coming years, from healthcare to education.

However, when it comes to the management of this data, the question of “who pays?” poses a serious risk to the long-term sustainability of this valuable resource.

In essence, what we have is a new kind of collective action problem, in which there are many stakeholders from a variety of sectors. Each entity has a specific role and function, but none “owns” the responsibility for paying for the long-term storage and maintenance of the resulting data. Without a clear owner, the risk is that these data will simply be lost, preventing anyone from using them in the future.

One of the leading champions of this issue is Francine Berman, the Edward P. Hamilton Distinguished Professor in Computer Science at Rensselaer Polytechnic Institute and chair of the Research Data Alliance (U.S.).

In a recent panel on today’s data economy, Dr. Berman explained that we don’t have this kind of problem with data produced in the private sector because those who generate, use, house, and pay for the data often reside under the same organizational umbrella. This is not the case for federally funded research data.

Establishing a Coordinated Division of Labor

Given that we’re dealing with a multitude of different parties, Dr. Berman argues that we need to establish a coordinated division of labor so that the onus is not on one sector. This will then provide options to best distribute the responsibilities and rewards of supporting this data among the various stakeholders involved.

In a 2013 paper published in Science magazine, Dr. Berman offers a number of suggestions to facilitate the process of coordination. I will highlight two that provide some of the best opportunities for public officials and policymakers to lead the effort:

Facilitate private-sector stewardship of public access to research data.

This means incentives. Through tax credits and other methods, the government can encourage the private sector to store, host, and provide access to research data. According to Dr. Berman, a great example is the private sector’s support for other public goods such as the arts and public space.

Create and clarify public-sector stewardship commitments for public access to research data.

There are certain areas where the public sector has taken the lead in storing and managing research data. However, the sheer volume of data produced cannot be reasonably managed, and paid for, solely by public agencies. Therefore, Dr. Berman recommends that the public sector clearly communicate its priorities, time commitments, and resources so that all stakeholders better understand which data are protected and which are at risk. The ultimate goal here is to encourage other parties to step up and provide stewardship, especially if they have a stake in the data’s preservation.

In light of what we know about the importance of open access to big data resources, especially research data, the public sector is in a prime position to take the lead in helping bridge the coordination gap. The future of innovation depends on our ability to protect this vital resource.


For the full text reprint of Francine Berman and Vint Cerf’s paper, click here.
