The Data Science Skill Shortage in the Public Sector


Big data has changed the corporate world in a pervasive and global way.  We’re very familiar with what has occurred in social media, with the explosion in the use of search engines, Facebook and Twitter.  Those data have subsequently been used to analyze health (flu trends) and behavior (emotional contagion), and for marketing (just look at your favorite search page).  Most of us are not that familiar with what happens behind the scenes to process the petabytes of data collected through sensors, cell phones, images, search engines and financial transactions, but it’s safe to say that the analysis of these data makes our current information-based economy run.  The need to analyze these data has created the role of the data scientist and drawn mathematicians, physicists, engineers, economists, and other quantitatively skilled individuals into data science alongside computer scientists.

While there is some attention to better data use in the state and local public sector, it’s probably safe to say that there really isn’t a good parallel to the collection or use of data described above when it comes to state and local government, particularly in efforts to improve the well-being of the population: better educated children, safer neighborhoods, better long-term care for the disabled and elderly, cleaner air and water. However, there has been an increasing amount of analysis using the administrative data that government generates through the information systems that help manage service provision, compile student data, investigate and prosecute crime and child maltreatment, and track and collect income taxes.  These datasets are relatively small, but working with them requires both subject matter expertise and strong computational skills.

Data use is not transforming government the way it did big business because government can’t hire enough qualified individuals with the substantive and technical expertise to produce the type of information needed to do what we all want it to do: improve the well-being of all citizens.  Big business and start-ups all over the world are hiring anyone with even the most basic programming and analytic skills, at salaries and with working conditions that government can’t match.  Government then has to hire those companies to provide the services that it can’t provide itself, namely the collection and analysis of its own data to drive decision-making.  Government, at all levels, is seeing its fiscal resources dwindle and can’t purchase all that it needs to be smarter about making progress on the things that matter most to us: our children’s education, our health, and our safety.

It’s hard to imagine a solution for this problem as the demand for data professionals increases in all sectors.  Is open-source software that government analysts can modify for their own purposes a beginning?  Maybe, and here’s one example that facilitates the production of open data, actually produced by the Chicago city government!  However, for another city to use it, that city also needs a data professional who can implement the open-source code.

More ideas on how to address this severe skill shortage in the public sector next week!

Robert Goerge is part of the GovLoop Featured Blogger program, where we feature blog posts by government voices from all across the country (and world!).


Mark Hammer

I think the bottleneck is that the folks with the skills to work productively and creatively with the data are simply not the same folks who enter public admin programs and think about organizational issues. The data itself is meaningless, and requires folks who understand what it potentially holds with respect to addressing policy issues in order to *acquire* meaning.
The schism between the quantitative/data-management types and the public policy types is quite wide. What is needed to bridge it is appropriate training for the public admin folks in what data has to offer them, and training for the data-management folks in how to start thinking about data as a basis for public policy.
When I was a grad student in psychology, we received training in how to do hardware interfacing and machine-language programming for real-time control and data-acquisition of laboratory experiments. The instructor told us at the outset that he did not expect us to build or program anything. What he wanted was for us to understand the equipment, and realities, well enough that we could then turn around and tell the techs exactly what we needed, what we needed it to do and why, and *they* could build/program it right, the first time. We need to find a way to create that sort of informed link between the policy folks and the data folks.