At our recent GovLoop event on The Big Data Playbook for Government, we heard from three experts about what it takes to build an effective big data team. Our panelists were:
- Professor Kirk Borne – Data Scientist and Professor of Astrophysics and Computational Science, George Mason University
- Lori Walsh – Chief, Center for Risk and Quantitative Analytics, Division of Enforcement, U.S. Securities and Exchange Commission
- Ann Amrhein – Deputy Associate Commissioner, Office of Earnings, Enumeration & Administrative Systems (OEEAS), Social Security Administration
Borne, Walsh, and Amrhein have divergent experiences with big data in government. Yet all three agreed that four basic steps are necessary to ensure your team is ready to take on the challenges of government big data.
1. Ensure you have the right infrastructure.
Data analytics requires an advanced IT infrastructure that can provide the storage and power to query large amounts of information. So, before you assign staff to your project, it’s imperative to make sure you have the right systems in place to support your goals.
Amrhein explained how SSA’s audit draft system was overtaxing its previous architecture because it produced so much data. “We were trying to query that data and make use of that data but it was becoming increasingly problematic,” she said. “It would take up to ten hours to run a query, making the data more difficult to handle and less useful.” Her team went in search of a new architecture that would better accommodate their needs. Once they found Hadoop, she said, “It opened up a whole new world of analytics.”
There is no one-size-fits-all architecture for big data, however. Walsh explained that to determine the best solution set, data managers should work with the IT office “to get the right tools, and then get your information in a useable form, in an accessible place.”
2. Secure a project management team.
Once your IT environment is prepared to handle big data, project managers should be assigned. Walsh said this team is necessary to provide discipline to the people and projects that want to use analytics for problem-solving. This role goes beyond managing timelines and budgets. Most importantly, project managers are in charge of determining what problem they’ll solve with big data. Otherwise, you risk investing significant resources into projects that have no clear goal.
“You have to ask, ‘What are the questions we want to answer with this data?’” Walsh said. “This is critical. Just saying data analytics – that means nothing. You have to have a specific question.”
Amrhein agreed. She also said, “Once you determine the analytics [that you want to achieve], that will help you figure out the people you don’t have.” In other words, project managers are necessary to determine both the problem to be solved and the team who will do the investigating.
3. Gain leadership buy-in.
Nevertheless, procuring those people may not be easy. As with any other project in government, acquiring the necessary resources, support, and staff for a big data initiative is dependent upon gaining buy-in from management. Yet big data can be an especially challenging project, given the time it takes to produce measurable results and the fact that many non-technical employees are not familiar with the big data process or its benefits.
Walsh explained how to overcome the former issue by resetting timeline expectations with leadership. She said that almost two years into setting up her group, they are just starting to make real progress. Yet three months into her project, she had a director knocking on her door asking what they had accomplished. In order to fulfill her mission, she had to create a clear timeline for her managers. “You have to tell them, ‘It’s not going to be next week, or next month. It might not even be next year,’” Walsh said.
But while you work on that extended timeline of creating a complex analysis program, Professor Borne recommended targeting the “low-hanging fruit” when you start your big data journey. “So much attention [is] given to the complexity of big data and the systems that surround it,” he said. “But sometimes the fastest home run you can hit is the simplest thing.” By achieving easy wins in the short term, big data teams can highlight the potential of their analysis and ensure ongoing support for long-term goals.
As your program grows in sophistication, the panelists also advised making it accessible to as many people as possible in order to secure greater buy-in across the organization. At the SEC, Walsh explained, big data projects are separated into three tiers. The first tier takes problems and solves them within the data team, the middle tier provides tools and data to sophisticated users so they can independently work on projects, and the last tier provides templated solutions that require users to put in minimal data. This range of options allows any user in the organization to leverage big data, regardless of their skillset or data science knowledge. “It empowers staff and makes them feel part of the process,” said Walsh.
4. Hire people with the necessary data skills.
Finally, once you have secured IT solutions, project leaders, and management buy-in, it’s time to hire a team of data scientists who can tackle your projects. Yet Professor Borne said that while technical skills are important, they are only truly valuable when combined with an understanding of data and problem-solving. “I always tell my students to understand data as a literacy,” he said. “Literacy is knowing how to use data, when to use it, what types there are, and what types of questions it can answer.”
This literacy transcends specific programming skills. “What’s important is that they aren’t one type of thing like a Java programmer or a Hadoop programmer. Instead, my students need to be able to define a problem and then apply [a] solution to that problem,” Professor Borne explained.
In conclusion, our panelists agreed: big data projects are about much more than applying new technology to sort information. They are about solving problems and building a team that can support that ambition.