The following is an interview with Mark A. Johnson, Director, Public Sector Big Data Program at Oracle. To learn more about how your agency can excel with big data, be sure to check out our guide: The Big Data Playbook for Government.
Although big data has the potential to radically redefine government operations, organizations still face significant obstacles, such as training staff, integrating disparate data sources, and keeping costs low. As with any IT initiative, success depends on integrating big data into the standards and procedures the agency already uses.
“Organizations really need to simplify the process and use the skillsets that people have,” advised Mark A. Johnson, Director of the Public Sector Big Data Program at Oracle. “Training and hiring people is difficult and expensive, but adopting interfaces that you have and leveraging existing skillsets is relatively inexpensive. There may be some upfront costs at times and new IT tools, but in the end, the tools are going to allow data to be leveraged much faster. The idea is that you don’t want to train staff on new IT – you want them to use existing tools to leverage big data.”
Users should have easy access to data and information anywhere in their agency. But the challenge then becomes knowing where to store data, since different data should be stored in different ways. Additionally, to improve data utility, agencies should strive to minimize data movement between different data stores and locations.
“Relational data still belongs in the relational store, but unstructured data may belong in a Hadoop Distributed File System (HDFS), or in a NoSQL database,” said Johnson. However, no matter where the data is stored, analytics should work across all data, and provide an analyst access to any information that is needed.
“Different analysts may have very different needs as to the kind of analysis they want to conduct,” explained Johnson. “So agencies need to be agnostic at the data store level of who’s getting data and how they want to receive access. People want to analyze data, check it out, look at it and use it in different ways. And we should have the ability to transparently and securely use different stores with any kind of analytics.”
“Sometimes you can’t move data or aggregate all the data that may be useful to your particular agency,” Johnson added. “There are several examples where an agency said ‘I’d like this from this other agency, but for a variety of reasons, sometimes laws, privacy restrictions like HIPAA, they can’t give it to us.’ Many agencies would like to have ways of reaching across different datasets and federating queries.” With a properly designed big data analytics architecture, agencies can securely give an analyst access to the right information at the right time.
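To make the idea of federating queries concrete, here is a minimal sketch in Python. It is an illustration only, not Oracle's product or any agency's actual system: two in-memory SQLite databases stand in for two agencies' stores, and the table name `cases`, its columns, and the helper names `make_store` and `federated_totals` are all hypothetical. Each store runs the aggregate query locally, and only the summary rows are combined, so the underlying records never leave their home store.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory database standing in for one agency's data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (region TEXT, n_cases INTEGER)")
    conn.executemany("INSERT INTO cases VALUES (?, ?)", rows)
    return conn


def federated_totals(stores):
    """Run the same aggregate query in each store and merge only the summaries."""
    query = "SELECT region, SUM(n_cases) FROM cases GROUP BY region"
    totals = {}
    for conn in stores:
        # Each store computes its own subtotals; raw rows are never copied out.
        for region, subtotal in conn.execute(query):
            totals[region] = totals.get(region, 0) + subtotal
    return totals


# Hypothetical sample data for two separate agencies.
agency_a = make_store([("north", 3), ("south", 5)])
agency_b = make_store([("north", 4), ("east", 1)])

print(federated_totals([agency_a, agency_b]))
```

A real deployment would also need authentication, access controls, and a check that even the aggregated results comply with restrictions such as HIPAA. The sketch shows only the query pattern.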
Another challenge agencies face? Keeping costs low by using open source products to power big data programs while avoiding high integration and consulting costs. The key for agencies is to build a big data infrastructure that balances open source and proprietary solutions to simplify setup and use.
“There’s this idea that open source can do everything,” said Johnson. “But there are better ways to set up, maintain and secure Big Data architectures. The great thing about open source is you can use it to drive down your costs, but you should only use it where it makes sense, and determining where that is is what we help our government partners with.”
One example of how big data can make government more effective for citizens comes from a project at the National Cancer Institute. The institute wanted to match 17,000 genes known to be related to certain cancers against the nearly 20 million research articles in the PubMed medical library. The results would then be used to create customized treatment plans based on an individual’s genome, across a population of 900 million citizens.
“Because there’s 20 million text abstracts, you’re searching basically unstructured data,” explained Johnson. “You do have some structured datasets in the gene types, the known cancer associations, but matching all that different data up was something they simply couldn’t solve, until we brought in this idea of using big data. And a key part of it was open source.”
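The matching problem Johnson describes, a structured list of gene names checked against unstructured abstract text, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method the National Cancer Institute or Oracle used: the gene symbols, abstract texts, and the function name `match_genes` are invented for the example, and a real run over 20 million abstracts would be distributed across a cluster rather than held in memory.

```python
import re
from collections import defaultdict

# Hypothetical sample data; the project described here involved
# about 17,000 genes and nearly 20 million abstracts.
GENES = {"BRCA1", "TP53", "EGFR"}

ABSTRACTS = {
    "a1": "Mutations in BRCA1 and TP53 are frequent in hereditary breast cancer.",
    "a2": "EGFR inhibitors improve outcomes in some lung cancers.",
    "a3": "Dietary fiber intake and colorectal cancer risk.",
}

TOKEN = re.compile(r"[A-Za-z0-9]+")


def match_genes(genes, abstracts):
    """Return {gene: sorted list of ids of abstracts that mention it}."""
    hits = defaultdict(list)
    for abstract_id, text in abstracts.items():
        # Tokenize each abstract once, then intersect with the gene set.
        tokens = {t.upper() for t in TOKEN.findall(text)}
        for gene in tokens & genes:
            hits[gene].append(abstract_id)
    return {gene: sorted(ids) for gene, ids in hits.items()}


matches = match_genes(GENES, ABSTRACTS)
for gene in sorted(matches):
    print(gene, matches[gene])
```

Exact token matching is the simplest possible rule. Gene names that double as ordinary words, synonyms, and spelling variants would all need more careful handling in practice.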
Johnson said a critical component was making sure they were using the right open source tools alongside other products to make the process work. “Rather than try to string together a whole cluster and download software and install it, we could bring in our tools that made it very quick to set up. We started the process on a Friday, and on Monday morning the National Cancer Institute had their answer,” said Johnson.
The National Cancer Institute is one of many examples of how Oracle is helping agencies capitalize on their data. With Oracle, public sector leaders can obtain a complete, open and secure suite of big data technologies, servers, and storage solutions engineered to work together, optimizing every aspect of government operations.