We have all run into the office cliché “if all you use is a hammer, everything looks like a nail.” Another way of saying this is that if all you see are nails, anything looks like a hammer. In both cases, the misunderstanding comes from overlooking the particulars of nails and hammers.
The goal of doing something with a hammer is to do something with nails. Understanding the context of hammers and nails is critical. For example:
- A rubber mallet looks like a hammer but only bounces off a nail; it neither drives a nail into anything nor straightens a crooked nail.
- The tool needed to straighten out a nail may not be a hammer, but needle-nose pliers. With a hammer, one is more likely to pound the fingers that are holding the nail.
- What is needed to pull a nail out of a board may not be a hammer but a “nail bar.”
- The handle of a screwdriver can be used like a hammer, but it is not very effective.
- The tool used to stick a nail into bread is not a hammer.
A “tool” may be an “app,” software, a web application, or a system that is thought to fill a gap in capabilities or fix a problem. When problems arise, people sometimes respond as if it were best to vaguely identify a tool first and worry later about whether it can actually address the problems, especially data problems. This response seems most common when the cost of buying a tool is reasonably well known, but the benefit of fixing the problem is not.
Data is not a “tool,” but it falls victim to almost any tool. The thinking goes that for any nail, i.e. “data,” one tool is roughly as good as another. The tool is the thing. Data is just an afterthought.
The “we’ll fix it later” approach kicks the “data” can down the road for so long that the costs and benefits of this pattern of buying tools instead of solving data problems become unknowable. For example, if it costs $500K to buy a tool and $100K per year to maintain it, that expense may go on year after year unless the data is cleaned and quality control procedures are instituted. Alternatively, it might cost $1 million in labor for one year, and then $50K annually, to improve the data quality. The right measure is the long-term costs and benefits of the whole system, not the “total cost of ownership” of a single IT tool.
I recently heard a data architect from a major corporation say repeatedly that even a typo in code can result in big problems. There is no doubt, too, that a “tool” can discover some types of data errors quickly. However, corrections to those errors may be hard-coded in the tool rather than made in the source systems, which makes the errors more difficult to correct. I was recently asked to estimate what investment in “tools” was needed for BI, and the person asking seemed astonished when I said that we had the necessary tools but needed to organize the data.
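To make the $500K tool versus $1 million cleanup example concrete, here is a minimal sketch that adds up the running cost of each route year by year. It rests on assumptions that are mine, not established facts: the tool's maintenance is paid in every year including the first, the data cleanup labor is paid entirely in year 1 with the $50K upkeep starting in year 2, and the function names are hypothetical. It deliberately ignores benefits, which is exactly the part that usually goes uncalculated.

```python
def cumulative_tool_cost(years, purchase=500_000, maintenance=100_000):
    """Tool route: one-time purchase plus maintenance in every year (assumed)."""
    return purchase + maintenance * years


def cumulative_data_cost(years, cleanup=1_000_000, upkeep=50_000):
    """Data route: cleanup labor in year 1, then upkeep in each later year (assumed)."""
    if years < 1:
        return 0
    return cleanup + upkeep * (years - 1)


def first_year_data_is_cheaper(max_years=50):
    """Return the first year in which the data route's running total is lower, or None."""
    for year in range(1, max_years + 1):
        if cumulative_data_cost(year) < cumulative_tool_cost(year):
            return year
    return None


if __name__ == "__main__":
    for year in (1, 5, 10):
        print(year, cumulative_tool_cost(year), cumulative_data_cost(year))
    print("data route becomes cheaper in year:", first_year_data_is_cheaper())
```

Under these assumptions the two running totals meet in year 9, and the data route is cheaper from year 10 onward. Change the timing assumptions and the crossover moves, which is part of why the comparison is so rarely made.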
One contradiction in the “tool” vs. “data” problem is that, after a past poor judgement to buy a tool, an organization may be stuck with inadequate technology that it cannot replace, and so it is left trying to manage bad data in a difficult way. In that case, every problem is compounded: people struggle with inadequate technology to inadequately manage data while investments in both decrease. Another contradiction is that, in the public sector, cost savings are often calculated from reduced expenditures or reuse of technology (thought of as a “tool” in general), while the benefits of better data (or reduced fraud) are not calculated or are not calculable. Furthermore, deciding not to buy a tool is not counted as a saving or a cost reduction. Meanwhile, for “data people,” data is so tangible that a lack of discipline in controlling it is seen as costly.
In conclusion, reaching for a “tool” is foolish when sound data is enough, as is assuming that any collection of tools will solve problems. Even “data science” or “predictive analytics” can be labeled as tools when no consideration is given to their specific purpose. There are many software combinations or suites to address data creation and management. A person can discover sources of mistaken data by reading the SQL code, schema, or ETL code. However, nothing replaces people’s inquiry into meaning, which is what identifies the concepts and their measures in the first place.
This drives me crazy. Why isn’t there a tool to cure that?