‘Data Mining’ – Do’s and Don’ts


We are inundated with tales of analysts who strike ‘gold’ when they work with the vast amount of information at their disposal.
Well, at the risk of bursting someone’s bubble, I want to share some cautionary tales in the hopes that you learn to avoid the various pitfalls of buying into what is tantamount to ‘fool’s gold’.

Lesson #1: Don’t make assumptions. Closely examine your data with an open mind.
Years ago, when I was the ‘data guru’ for job training programs, I remember a meeting where the topic revolved around ways to increase a youth program’s positive outcomes. A ‘touchdown’ was scored if the teen either earned a high school diploma/GED or landed a paying job. Someone noticed that this program had a fair number of pregnant girls and concluded that success depended upon enrolling fewer girls. The reasoning, which others agreed with, was that pregnant girls would want to stay home with the child and therefore would not actively seek employment or a diploma. The problem was that a closer look at the data showed the opposite: when you looked at outcomes before or shortly after giving birth, pregnant girls had a higher entered employment rate and GED rate than the boys who were enrolled in the program. My presumption, which was borne out, was that these girls often did not have support from the child’s father and, even if they still lived at home, they wanted to improve their own life and their child’s. So, they were highly motivated to succeed. The girls only appeared to lack positive outcomes if they were judged solely by what they did during the initial month after giving birth. Logical, though, right?!
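For readers who like to see the check spelled out, here is a minimal sketch of comparing outcome rates by group instead of assuming them. The function name, the group labels and the sample records are all made up for illustration; they are not the program's actual data.

```python
from collections import defaultdict


def positive_outcome_rates(records):
    """Return {group: share of participants with a positive outcome}.

    `records` is an iterable of (group, had_positive_outcome) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, had_positive_outcome in records:
        totals[group] += 1
        if had_positive_outcome:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical records, invented purely to show the mechanics.
sample = [
    ("pregnant girls", True), ("pregnant girls", True), ("pregnant girls", False),
    ("boys", True), ("boys", False), ("boys", False), ("boys", False),
]

rates = positive_outcome_rates(sample)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%}")
```

The point is simply to let the numbers speak before the meeting decides what they say. The same tally could be repeated with a different time window (for example, only the first month after giving birth) to see how the picture changes.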

Lesson #2: When comparing studies’ data outcomes, be aware of differences in how questions were worded. Don’t assume that phrases had common definitions.
Recently, the Washington Post covered what appeared to be discrepancies between studies which purported to reveal the rate of women who were raped or faced an attempted rape.

Each study, on the surface, appeared to be based on sound survey principles. But a closer look at the wording used in each survey’s questions showed why the outcome data were so different. One study counted sex that was consented to only because the woman had been lied to or misled. (For example, being falsely told that marriage was in their future, so that the sex seemed to be more than a matter of horniness.) The other study asked about the more ‘traditional’ definitions of rape and attempted rape.

Lesson #3: Don’t assume that data was entered correctly. Mistakes happen! That’s why they put erasers on pencils.
Another time, I conducted a data integrity audit of sorts, as required by a Federal agency. I discovered numerous data entry errors when comparing clients’ paper records with the data in the case management software. There were enough errors that the program’s reported outcomes were in question.
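An audit like this can be partly automated once the paper records have been keyed in a second time. What follows is only a sketch under that assumption; the field names, client IDs and values are hypothetical, and it is not the procedure the Federal agency prescribed.

```python
def find_mismatches(paper, software):
    """Compare two {client_id: {field: value}} mappings.

    Returns {client_id: {field: (paper_value, software_value)}} for every
    field that differs, and flags clients missing from the software data.
    """
    mismatches = {}
    for client_id, paper_record in paper.items():
        software_record = software.get(client_id)
        if software_record is None:
            mismatches[client_id] = {"<missing in software>": (paper_record, None)}
            continue
        diffs = {
            field: (value, software_record.get(field))
            for field, value in paper_record.items()
            if software_record.get(field) != value
        }
        if diffs:
            mismatches[client_id] = diffs
    return mismatches


# Hypothetical records, invented for illustration only.
paper = {
    "1001": {"dob": "1990-04-12", "outcome": "employed"},
    "1002": {"dob": "1988-11-03", "outcome": "GED"},
    "1003": {"dob": "1991-07-30", "outcome": "none"},
}
software = {
    "1001": {"dob": "1990-04-21", "outcome": "employed"},  # transposed digits
    "1002": {"dob": "1988-11-03", "outcome": "GED"},
}

mismatches = find_mismatches(paper, software)
error_rate = len(mismatches) / len(paper)
print(mismatches)
print(f"Records with at least one discrepancy: {error_rate:.0%}")
```

The share of records with discrepancies gives a rough sense of whether the reported outcomes can be trusted or need to be recomputed.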

Lesson #4: Don’t assume that your data sources ‘played by the rules’. They might have felt that the ends (as in continued program funding) justified the means.
I occasionally discovered instances where an entity did not include negative data in their dataset. Discovering this can take time and ‘detective’ skills, but it is well worth the effort, if only to verify the accuracy and completeness of your data.
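One simple piece of that detective work, assuming you can get an independent enrollment roster to check against, is to look for participants who were enrolled but never show up in the submitted outcomes. The sketch below uses made-up client IDs and a hypothetical helper name.

```python
def unreported_clients(enrolled_ids, reported_ids):
    """Return enrolled client IDs that have no row in the submitted dataset."""
    return sorted(set(enrolled_ids) - set(reported_ids))


# Hypothetical IDs, invented for illustration only.
enrolled = ["C-01", "C-02", "C-03", "C-04", "C-05"]
reported = ["C-01", "C-03", "C-04"]

missing = unreported_clients(enrolled, reported)
print(missing)
```

A non-empty result does not prove anyone hid bad news, but it tells you exactly which records to ask about.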

Lesson #5: Don’t be afraid to seek out more data in order to make your case.
Another time, I was reporting job training program outcomes that were ‘okay’, but not as great as hoped for, and I wanted to justify them. I examined the participants’ addresses, car ownership and access to public transportation. I was able to make the case that the lack of public transportation, coupled with the number of clients who lived in rural areas or whose age made walking great distances to a bus highly unlikely, reduced the likelihood of many participants obtaining gainful employment.

Bottom line: Data mining can be a powerful, useful tool in your toolbox. Just use it wisely, staying aware of possible pitfalls and that darned ‘fool’s gold’!

’Nuff said.

Russell A. Irving is part of the GovLoop Featured Blogger program, where we feature blog posts by government voices from all across the country (and world!).
