“It would be false to say that people are happily sharing their data with us.” That’s how the Department of Defense’s Col. Bob Saxon explained his agency’s efforts to open up government data. For all of its potential benefits, open data also presents security risks to those providing the information. From the loss of agency confidentiality to the exposure of personally identifiable information, releasing data inherently puts data providers at risk. But openness and security don’t have to be mutually exclusive.
In GovLoop’s recent event “Data Done Smart: Your Guide to Data Analytics in Government,” Saxon and the National Center for Health Statistics’ Peter Meyer discussed how both of their organizations are simultaneously opening up key datasets and protecting their data providers from security risks.
Why are government agencies and other public sector organizations hesitant to open up their growing piles of data?
“Data is power,” Saxon explained. With aggregated sets of data, “we’re able to tell a story that you can’t tell by looking at 2 or 3 dozen systems separately,” he said.
“Most of our data providers are not concerned with us leaking their data,” Saxon said. “There’s an assumption that we’re going to protect that data.” However, many are concerned about how their data will be used. Not wanting to be painted in a negative light, data providers are often concerned that people will spin stories in a way that turns their organizational data against them.
To mitigate this fear, Saxon explained that the DoD made decisions about what data they needed, when and how they would share it, and to whom they would grant access. Then they shared specifics about how they would use data provided to them. With this approach, the DoD was able to pool data across Army sources into an environment that allows them to share that key data across their agency while controlling access. The more data they pooled and were able to use wisely, the more data providers began to trust them with their data.
As an employee of an organization that is required to collect and share government data, Meyer asserted that maintaining confidentiality is crucial both to gaining data providers’ trust and to securing open datasets. “We have a strong set of confidentiality agreements,” Meyer said. “There has been a zero-tolerance for disclosure in the federal government since we began collecting data.” Simple legal protections like these increase both security and trust among the involved parties.
The government collects terabytes of data every year, and as technology advances, that number will only increase. There are simply not enough people in the federal government today to analyze that information in a useful way. That is why, according to Meyer, it is imperative for the government to open up its data to the public. Opening up data allows the government to crowdsource essential analyses from the public.
While the DoD has successfully opened data for secure sharing within the agency, publishing data to the public presents even greater risks. “How do you protect data while making it available to the public?” Meyer asked.
The National Center for Health Statistics’ Research Data Center developed physical, legal and social barriers to secure publicly available data. According to Meyer, four percent of government data is high-value information to researchers, but also high-risk for data providers. This portion of data typically includes individuals’ personally identifiable information, or other sorts of information that put confidentiality at risk.
To protect confidentiality, Meyer’s organization removes any information that researchers do not need and that might make data providers easy to identify. “We allow access, but we control it,” he said. Saxon added that past security breaches have led the DoD and others to shore up their IT defenses and monitor who has access to government data.
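For readers who want to picture what removing identifying details might look like in practice, here is a minimal Python sketch. It is not the Research Data Center’s actual process; the field names and the `strip_identifiers` helper are hypothetical and chosen only for illustration.

```python
from typing import Iterable

# Hypothetical list of directly identifying fields. A real release would
# follow the agency's own disclosure review, not this illustrative set.
DIRECT_IDENTIFIERS = {"name", "ssn", "street_address", "phone"}


def strip_identifiers(record: dict, identifiers: Iterable[str] = DIRECT_IDENTIFIERS) -> dict:
    """Return a copy of the record without the listed identifying fields."""
    blocked = set(identifiers)
    return {key: value for key, value in record.items() if key not in blocked}


# Made-up example record; none of these values come from a real dataset.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "age_group": "40-49",
    "state": "OH",
    "condition": "asthma",
}

public_record = strip_identifiers(record)
print(public_record)
```

The original record is left untouched, so the full version can stay behind controlled access while only the reduced copy is shared. Dropping direct identifiers is just one layer; the legal agreements and access controls Meyer describes would still apply.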
“We want [the public] to have our data,” Meyer said. He contended that for all the physical and legal barriers, the most effective security measures are often social ones. “We really depend on a symbiotic relationship between the people from whom we collect data and researchers,” he said. Simply reminding researchers that they are partners in a collaborative effort often dissuades them from misusing the data.
Meyer concluded that government agencies looking to open their data to the public should not hesitate to do so, but they should implement legal, social and technological protections to secure their data.
Combing through data, extracting what’s important, and implementing various security measures all take considerable effort. “This is very labor intensive,” Meyer said. Continually hiring more and more people to handle data is not a sustainable practice. “Big Data is like mining for gold, you’ve got to comb through dirt to get to a nugget,” Saxon said. The government simply does not have enough resources to find those nuggets on its own.
To handle the onslaught of information, agencies must open their data and enlist the public for essential analyses. Advanced IT, social pressures and legal measures can help ensure that this data remains secure.