Agencies wanting to become more data-centric might look to the Defense Department (DoD) for ideas.
While the federal government has its own initiative in the Federal Data Strategy and many states also have developed plans and made progress, DoD is further along than many. It’s putting words into action, using the guidance in its recently published DoD Data Strategy. The document outlines the department’s essential capabilities, focus areas, goals, guiding principles and vision necessary to transform DoD into a data-centric enterprise.
DoD is committed to managing its data as a critical part of its overall mission. By treating data as part of that mission rather than as a separate commodity, the department expects to make faster, better-informed decisions. A recent survey from MarkLogic and GovLoop found that most federal and state agencies have similar goals, with nearly all agreeing that they must use data in a way that brings both immediate and lasting advantage to their respective agencies’ missions (see Figure 1).
In reworking the department’s data priorities, the DoD Data Strategy emphasizes the value of collective data stewardship, which assigns data stewards, custodians or even Chief Data Officers (CDOs) to be accountable for data throughout its life cycle. Data stewards are responsible for overseeing datasets. They manage the policies related to those datasets, which systems have access to the data, and how the data is tracked and accounted for.
This is a growing but still somewhat untapped area for other government agencies; although most consider it a priority, about 25% aren’t focused on it today (see Figure 2). There are exceptions, however: CDOs are more common at federal agencies than at local ones, and the occasional state agency has one as well.
Putting Data First
Over the years, agencies have tried to put data first, with varying degrees of success. For example, many have invested in technology such as data lakes to centralize data and improve access. Although data lakes have some benefits, the data in them is often raw and stored in its native format, neither categorized nor ranked by priority. This creates significant challenges when it comes time to extract value from the data.
One way to become more data-driven is by adopting modern approaches and technology for integrating and managing data. A platform that can ingest data from any source, master and enrich the data, and index it for query and search is a good start. That system can also store metadata — data about the data — alongside the data itself.
“With the metadata stored as a first-class citizen with the data itself, you’ll have a lot of important information — where it is, where it came from, who has touched it, etc. — to give you the context you need,” explained Kim Kok, Vice President of Sales for Public Sector at MarkLogic.
Data context and flexibility are critical. Different users or departments might need to look at data in different ways, and those needs will evolve. For example, users might need to incorporate data from back-office and mission functions, which they historically have treated very differently. This approach also gives credentialed users the flexibility to change data without shutting down or redoing an entire schema. Credentialed users can change the data in place, the type of metadata being collected on the data or even the data policies.
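To make the idea of storing metadata alongside the data more concrete, here is a minimal Python sketch. It is an illustration only, not MarkLogic's API or any agency's implementation: the `Envelope` class, its field names and the sample values are all hypothetical. It shows a record that carries its source and change history with it, and that can take on a new kind of metadata without a schema being redefined.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Envelope:
    """Hypothetical record that carries its metadata alongside the data."""
    data: dict[str, Any]
    source: str  # where the record came from
    metadata: dict[str, Any] = field(default_factory=dict)
    history: list[dict[str, str]] = field(default_factory=list)

    def update(self, user: str, **changes: Any) -> None:
        """Change the data in place and log who touched it and when."""
        self.data.update(changes)
        self.history.append({
            "user": user,
            "fields": ", ".join(sorted(changes)),
            "at": datetime.now(timezone.utc).isoformat(),
        })


# Made-up sample values, for illustration only.
record = Envelope(
    data={"unit": "logistics", "budget": 1200},
    source="back-office-finance",
)

# A new kind of metadata is attached later without redefining a schema.
record.metadata["classification"] = "internal"

# A credentialed user changes one field and adds another in place.
record.update("analyst_a", budget=1500, status="approved")
```

After these calls, `record.history` holds one entry naming the user and the fields that changed, which is the kind of context the quote above describes.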
Case in Point
With flexibility and effectiveness in mind, the Centers for Medicare & Medicaid Services (CMS) chose this approach for its HealthCare.gov site. CMS, a part of the Health and Human Services Department, understood that it needed to put data front and center to build a secure, effective technology platform to help enroll millions of Americans in new health care plans.
The data was voluminous and complex, and it came from multiple sources, such as insurance companies, Internal Revenue Service records and state-based legacy systems. The system had to be accurate, fast, scalable and secure. The project, which is the largest personal data integration project in the government’s history, met its goals by using an enterprise data hub. The data hub ingests data as-is, while also accommodating any changes or additions to data as they are made, in addition to changes in policies or regulations. Today, it supports 160,000 concurrent users with 99.9% availability.
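As a rough illustration of what "ingesting data as-is" can mean, the Python sketch below stores records from different sources unchanged and builds a simple word index over their values for search. It is a toy under stated assumptions, not the system CMS built: the `DataHub` class, its methods and the sample records are hypothetical, and a real hub would add security, scaling and far richer indexing.

```python
from collections import defaultdict
from typing import Any


class DataHub:
    """Toy hub: stores records unchanged and indexes their values for search."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []
        self.index: dict[str, set[int]] = defaultdict(set)

    def ingest(self, source: str, payload: dict[str, Any]) -> int:
        """Store the payload as-is, tagged with its source, and index it."""
        record_id = len(self.records)
        self.records.append({"source": source, "payload": payload})
        for value in payload.values():
            for token in str(value).lower().split():
                self.index[token].add(record_id)
        return record_id

    def search(self, term: str) -> list[dict[str, Any]]:
        """Return every record whose values contain the term as a word."""
        ids = sorted(self.index.get(term.lower(), set()))
        return [self.records[i] for i in ids]


# Made-up records with different shapes; neither is reshaped on the way in.
hub = DataHub()
hub.ingest("insurer", {"plan_name": "Silver Plan", "premium": 310})
hub.ingest("state_legacy", {"NAME": "Jane Doe", "PLAN": "silver"})

matches = hub.search("silver")
```

Here `matches` contains both records even though their field names differ, because the index is built over values rather than over a fixed schema.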
“Data is very complicated today, and it definitely helps to have a data hub where all data, along with contextual and historical data, all lives alongside each other,” Kok said. “There is no better way to get a 360-degree view of your data and know that it’s secure, available and usable.”