This blog post is an excerpt from GovLoop’s recent guide Embracing Data Analytics: Common Challenges & How to Overcome Them.
If getting started with data analytics weren’t hard enough, now you have the added task of figuring out which tools you need and whether to build or buy them.
Feedback from our data analytics survey shows that several agencies are developing capabilities in-house, while others are using cloud-based, open source and custom-built tools. So what’s the best option for your agency?
Jeff Chen, Chief Data Scientist at the Commerce Department, encouraged agencies to consider these questions before settling on a specific tool:
- Who will be using the tool?
- What specific needs do they have?
- What are the implementation requirements for the tool, and can your agency support those requirements?
- What skills are required to use the tool?
For more on how to choose the right tools and techniques, we spoke with Robin Thottungal, Chief Data Scientist and Director of Analytics at the Environmental Protection Agency, and Dawn Brown, a Procurement Analyst at EPA. They shared these words of wisdom:
Focus on outcomes.
Thottungal and his team are less focused on buying specific tools from vendors than on developing a minimum viable product, or MVP. Think of an MVP as an early version of a new product with just enough functionality to let the team work out kinks and gather valuable feedback from customers.
The focus for EPA is on identifying the problems that need solving and then developing an MVP.
“From there, we start thinking about what appropriate tools and techniques [there are] to solve the problem,” Thottungal said. “We are not saying, ‘Let’s go and buy the tool from company X and try to fit that back to the entire enterprise needs.’”
At an agency with needs as varied as EPA’s, no single tool is likely to fit every office’s requirements. “You need to have a visualization tool, [and] you need to have a tool that is capable of doing compute,” along with tools for storing data, Thottungal said. “What we are trying to identify is: How can we enable our internal staff to be proficient in using some of these tools?”
When adopting open source technologies, keep hidden costs in mind.
“If we can identify an enterprise open source tool, we will go with that because we know that will reduce the cost of licensing,” Thottungal said, adding that EPA is using several open source tools as part of its big data ecosystem.
For agencies considering open source, Thottungal noted that although there may be no upfront licensing costs, there are hidden costs.
“The hidden cost is basically identifying the appropriate talent within the organization who can actually maintain the toolset,” he said. “The question is, how comfortable is your team maintaining that infrastructure and the whole ecosystem?”
In other words, keep in mind that it takes manpower and resources to maintain the open source tools after acquiring them.
Let the mission dictate the tools.
Predicting harmful algal blooms across the U.S. is serious business at EPA, because the blooms can sicken or kill people and animals, create dead zones in the water and raise treatment costs for drinking water.
“We came to realize that this is a problem that requires a tremendous volume of data-crunching [and] a very powerful computing infrastructure,” Thottungal said. “We realized that this is an opportunity for my team to work with [others in the department] to identify an appropriate solution.”
The team needed to store and use large-volume datasets, run high-performance computing jobs and analyze near real-time data from sensors deployed across various bodies of water. The sensors measure water temperature and the levels of nitrogen and phosphorus that wash into the water. These requirements dictated which tools EPA used.
Another EPA undertaking involved developing spending visualization tools to better track operational costs and contracting data. Initially, the data was presented in spreadsheets and later in static images, Brown said. The end result is an interactive dashboard. “We did this in phases over the course of two years,” she said.
When it comes to identifying projects that are ripe for analytics, the agency relies on a community of practice that includes about 200 people. At a minimum, they meet biweekly to discuss analytics work and opportunities across the department, and suggestions for analytics projects come from all levels of the agency.