Sometimes, agencies don’t know how their data has moved and evolved through the enterprise, and they cannot verify the data’s accuracy, quality and reliability. This lack of insight is more than an annoyance: It makes decision-making and data governance far more challenging.
How It Happens:
Traditional challenges (e.g., legacy technology, manual processes, inconsistent standards, data silos, regulatory requirements and inadequate training) make it tough to confirm data integrity. The sheer volume, variety and locations of internal and external data sources add to the complexity. Think of the public health ecosystem, for example. A single dataset may combine hospital records, lab reports, state and federal registries, provider details, and other information. That’s a lot of fragmentation to overcome.
Solution:
You need good data lineage to chart a dataset’s life cycle — consider it data’s version of a family tree. Certain tactics will help you make those connections. They include:
- Implementing automated lineage tools that capture and visualize real-time data flows, in lieu of manual efforts that often reinforce data silos
- Standardizing metadata management by creating a central repository of data about your data — such as, its author, file size, and dates accessed and modified — using consistent naming conventions and formatting
- Integrating lineage into your data governance framework by establishing data stewards who manage the lineage records and by weaving lineage tracking into your compliance and audit processes, among other efforts
- Training staff and building awareness about why lineage matters for compliance, transparency and decision-making
- Connecting disparate systems using application programming interfaces that allow for seamless communication and data exchange
Whereas data lineage tracks where information originates, winds up, and what happens in between, data traceability audits and validates specific data points, which makes it vital for regulatory compliance. By tracing data, for example, you can find cases of unauthorized access that threaten data integrity and identify when someone accidentally changed a sensitive customer record when processing data in large batches. The same tactics for data lineage — such as metadata management — also enhance traceability.

A version of this article appeared in our guide Better Data Strategy for the AI Age. Download the guide for more insights into how agencies can adopt more coherent, effective ways of managing their data.



Leave a Reply
You must be logged in to post a comment.