Everyone knows, or should, that government has an obligation to protect sensitive data, including details such as Social Security numbers and bank account information. But sensitive information is still sometimes transmitted or exposed, usually by mistake, and ends up with unauthorized or unintended recipients inside or outside an agency. Such data leaks affect all levels of government and can lead to costly data breaches.
How It Happens:
Human error is responsible for more than 90% of data leaks. An employee may mistakenly upload a confidential file to a public server, respond to a phishing email, leave a laptop in a public location or abandon sensitive documents on a printer. Weak infrastructure, including improperly configured or improperly updated networks and poor security controls, can also be the culprit. And third-party vendors, applications and system errors pose risks, too.
Solution:
Several approaches can help prevent and catch data leaks. Here are a few:
- Validate your cloud storage security. Ensure that your cloud storage is configured correctly when you first use it and check the settings periodically.
- Automate your process controls. Automation enforces the policies protecting your cloud storage more efficiently, consistently and accurately than people can.
- Monitor third-party risk. Outside entities may need to access your systems, so vigilantly look for data leak indicators, such as unusual network or user activity.
- Adopt data loss prevention tools. These help agencies monitor and control sensitive data moving across networks, endpoints and cloud ecosystems.
- Encrypt critical information. Encryption protects data both in transit and at rest and often is a compliance requirement.
- Educate employees. Train them to identify and protect sensitive information.
- Classify and audit sensitive data. Categorize your data by security level to identify and protect the most valuable assets. Audit regularly, looking for permission and access issues.
- Implement and enforce role-based access control. Give employees access only to the data they need for their job functions.
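
To make the data loss prevention idea more concrete, here is a minimal Python sketch of the kind of check such a tool might run before a message leaves the agency. It is an illustration only, not how any particular product works: the pattern assumes Social Security numbers written as 123-45-6789, and the function names (find_ssns, redact_ssns, is_safe_to_send) are hypothetical. Real tools apply much broader rules and validation.

```python
import re

# Assumption: SSNs appear in the common 123-45-6789 format.
# Production DLP tools use far broader detection rules than this.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text):
    """Return every substring of text that looks like an SSN."""
    return SSN_PATTERN.findall(text)


def redact_ssns(text):
    """Replace anything that looks like an SSN with a placeholder."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)


def is_safe_to_send(text):
    """Return True only if no SSN-like pattern is present."""
    return SSN_PATTERN.search(text) is None


message = "Please update the record for SSN 123-45-6789 before Friday."
print(find_ssns(message))
print(redact_ssns(message))
print(is_safe_to_send(message))
```

A check like this could run automatically on outgoing email or file uploads, which is one way the "automate" and "adopt data loss prevention tools" recommendations can reinforce each other.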

⇒ Data Leakage in Machine Learning
AI has highlighted the need for data security in general, but data leakage has a distinct meaning in the machine learning world. It happens when an ML model is trained using data that should be excluded from the process, such as test data. Because the model has learned from patterns it wasn’t supposed to see, it can look accurate during evaluation yet generate misleading, unreliable outputs when you give it real-world data. If your model’s performance seems too good to be true, data leakage may be the cause.
The best practice for preventing ML data leakage: Keep your training and test data separate.
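
As a rough illustration of that separation, the Python sketch below splits a toy dataset into two non-overlapping sets and computes scaling statistics from the training set only, then reuses them on the test set. The helper names (split_train_test, fit_scaler, scale) and the toy data are assumptions made for this example; ML libraries offer their own utilities for the same job.

```python
import random
from statistics import mean, pstdev


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into disjoint train and test lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Compute scaling statistics from the training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid dividing by zero
    return mu, sigma


def scale(values, mu, sigma):
    """Apply previously fitted statistics to any set of values."""
    return [(v - mu) / sigma for v in values]


# Toy data, invented for this example.
values = [float(v) for v in range(100)]

# Split first, so nothing computed later can see the test set.
train, test = split_train_test(values, test_fraction=0.2, seed=42)

# Fit on the training data only, then reuse those statistics on the test data.
mu, sigma = fit_scaler(train)
train_scaled = scale(train, mu, sigma)
test_scaled = scale(test, mu, sigma)
```

The ordering is the point: computing the mean and standard deviation on all 100 values before splitting would let information about the test set seep into training, which is a subtle form of the leakage described above.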
A version of this article appeared in our guide Better Data Strategy for the AI Age. Download the guide for more insights into how agencies can adopt more coherent, effective ways of managing their data.


