Data is the engine that drives progress throughout government. When data is up to date and accessible, it can be harnessed to improve responsiveness, insights and decision-making. But getting to the point where all data is discoverable and accessible to everyone who needs it can be difficult, often due to the way the data landscape has grown over the years.
For many agencies, the ultimate goal is to better serve citizens and government workers by using the most relevant data to provide the best possible response “in the moment.” This requires being able to access a variety of data sets from a wide range of environments at any point in time. That includes transactional and biographic data from legacy databases, streaming data from IoT devices and other sources, and web click data, to name a few. The result: citizens who are more satisfied with government, and decision-makers who have deeper insights and are better able to collaborate and share data for greater positive government impact.
One of the most common issues agencies face is dealing with the sheer number of data repositories that have developed over time. Data in these standalone repositories, disconnected from other data sources and applications, is often difficult or impossible to access.
When it is possible to access data from some of these silos, negotiating access rights and then building the right connectors takes time and money. And when data is housed in older, proprietary systems, it can be difficult to access without negatively impacting the systems themselves.
The standard data lake approach, which essentially combines all data into one resource accessible by all parties, doesn’t go far enough. Data lakes are passive by nature and better suited for finding data or specific records. Instead, agencies need to ensure that all data is discoverable and accessible to all parties in the moment it is needed. Quite often, citizens and government workers need a blend of historical and real-time data if they are to respond in a manner optimized for the moment.
The Solution: A Universal Data Pipeline
An alternative approach is what’s known as event streaming. Unlike data lakes, which essentially are snapshots of data sets at a particular point in time, event streams combine data coming from various resources into a single stream — making it possible to process, store, analyze and act on data as it’s generated in real time. This approach makes it easier to combine both historical and real-time data from multiple sources into any application, providing an easier path to analysis and productivity.
“The story over time tells you something that each snapshot itself won’t tell you,” said Will LaForest, Public Sector CTO at Confluent, which offers an event streaming platform. “The idea is simple: Every time there is a change to a source database, that change is distributed to everyone who cares. In other words, it is actively pushing data to the people who need to solve problems instead of treating data as a passive asset.”
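The pattern LaForest describes — every change to a source system pushed to everyone who has registered interest, with the full history retained for later replay — is essentially publish/subscribe over an append-only log. A minimal in-memory sketch of the idea, assuming nothing about any real platform's API (the `EventStream` class and its method names are illustrative, not Confluent's):

```python
from collections import defaultdict
from typing import Any, Callable

class EventStream:
    """Toy event stream: producers append changes; subscribers are pushed each one.

    Illustrative only -- a real event-streaming platform adds durability,
    partitioning, and delivery guarantees that this sketch omits.
    """
    def __init__(self) -> None:
        self.log: list[tuple[str, Any]] = []        # ordered history of all events
        self.subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        # A new consumer first replays the historical events, then gets live ones,
        # so it sees the same blend of historical and real-time data.
        for t, event in self.log:
            if t == topic:
                handler(event)
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        self.log.append((topic, event))             # retain the change for replay
        for handler in self.subscribers[topic]:     # push it to everyone who cares
            handler(event)

# A source-system change is published before anyone is listening...
stream = EventStream()
stream.publish("address-changes", {"id": 1, "city": "Austin"})

# ...yet a late subscriber still receives it, followed by live updates.
seen: list[Any] = []
stream.subscribe("address-changes", seen.append)
stream.publish("address-changes", {"id": 1, "city": "Dallas"})
```

The key point is that the producer never treats data as a passive asset to be queried later: each change is actively distributed the moment it happens, while the log preserves “the story over time.”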
This approach decouples the data producers from data consumers. Historically, users or developers who needed a specific data set would have to locate the source system and negotiate how to get the data. Because the data is now independent of its original owners, it is much easier to find and access.
Once a data source is connected to an event-streaming platform, that data is always available and up to date. From that point on, any application can access it via the event-streaming platform. Now easier to find and access, the data can be used to greater advantage. For example, agencies could build a 360-degree contextualized picture of a single constituent applying for citizenship or health care benefits, guiding and responding based on who that person is and what they need.

An event-streaming platform also can improve both inter- and intra-agency collaboration. “If I’m the custodian of a data set, I simply publish it, and I don’t have to worry about who wants it,” said Jason Schick, Confluent’s General Manager for the U.S. Public Sector. “You remove the institutional friction of sharing data: negotiating with other organizations, building interface connections and worrying about compatibility.”
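Schick's point — publish once, without tracking who consumes — can be sketched as an append-only topic where each reader keeps its own position in the log. The `Topic` and `Reader` names below are hypothetical, chosen only to show the decoupling; the producer holds no reference to any consumer, and each agency reads at its own pace:

```python
from typing import Any

class Topic:
    """Toy append-only topic: the custodian publishes and is done."""
    def __init__(self) -> None:
        self.log: list[Any] = []

    def publish(self, record: Any) -> None:
        self.log.append(record)    # no knowledge of who will read it, or when

class Reader:
    """Each consumer tracks its own offset, independent of the producer
    and of every other reader."""
    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offset = 0

    def poll(self) -> list[Any]:
        records = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return records

benefits = Topic()
benefits.publish({"case": 101, "status": "received"})

# Two agencies connect independently -- no negotiation with the publisher.
agency_a, agency_b = Reader(benefits), Reader(benefits)
first = agency_a.poll()            # agency A catches up on the full history
benefits.publish({"case": 101, "status": "approved"})
second = agency_a.poll()           # A sees only what is new to it
everything = agency_b.poll()       # B, polling later, still gets both records
```

Because access rights and offsets live on the consumer side, adding a new agency to the data flow requires no interface work or coordination from the custodian — the institutional friction Schick describes.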
This article is an excerpt from GovLoop’s recent report, “Connecting the Dots: Getting Maximum Value From Data.” Download the full report here.