Where does your organization stand?
This is the second post in a four-part series detailing the components necessary for AI success. You can read my earlier post about cultural willingness, which must be prioritized ahead of the other three components: data and infrastructure readiness (this post), workforce skilling, and plans for ethics, risk, and compliance.
Combining the computational power of artificial intelligence (AI) with the critical thinking ability of humans is the ideal solution for organizations looking to accelerate the discovery of actionable insights from their data assets. Even with a human expert in the loop, AI relies on large volumes of historical data and sophisticated mathematics to generate insights that are valid and as free of bias as possible. Consequently, an organization must achieve a certain level of data and infrastructure readiness before it can successfully execute an AI project.
Data readiness goes beyond engineering data systems to handle large volumes of data. After all, every organization has lots of data; volume alone is neither a differentiator nor a value generator. The key to success is deriving insights and value from all that data. The success of an AI project therefore depends on other critical facets of the data used to train and inform the algorithms: data availability, accessibility, usability, format, quality, provenance, semantics, and security.
For AI to generate the most valuable insights, data must also be enriched. Enrichment may come from humans, from other (potentially external) data sources, and from algorithms, including machine learning algorithms that discover patterns in the data and tag the data accordingly. Enriched data sets are labeled, aggregated, validated, complete, accurate, and indexed for ease of discovery and access. Too often, organizations are sold on the benefits of AI and are eager to jump into predictive, prescriptive, and cognitive analytics before they have made the foundational investment needed to ready their data.
In assessing where your organization’s data readiness stands, consider some of the most common challenges:
- Unlabeled, unvalidated, and incomplete (biased) data that can’t be used for training AI solutions;
- Siloed, disparate data sources that require time-consuming manual cleaning, cross-indexing, integration, and organizing for analysis; and
- Lack of technological infrastructure to support the movement and computation of the huge amounts of data that are required for training AI solutions.
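To make the first challenge in that list more concrete, here is a minimal Python sketch of the kind of check a data readiness self-assessment might begin with. It assumes a tabular data set held as a list of records; the function `assess_readiness`, the field names, and the idea of reading a skewed label tally as an early warning sign of bias are illustrative assumptions, not a standard method or any particular product's API.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical report structure for a quick data readiness check (illustrative only).
@dataclass
class ReadinessReport:
    total_rows: int
    unlabeled_rows: int
    incomplete_rows: int
    label_counts: Counter = field(default_factory=Counter)

    def fraction_labeled(self) -> float:
        """Share of rows that carry a usable training label."""
        if self.total_rows == 0:
            return 0.0
        return (self.total_rows - self.unlabeled_rows) / self.total_rows

    def fraction_complete(self) -> float:
        """Share of rows with every required field populated."""
        if self.total_rows == 0:
            return 0.0
        return (self.total_rows - self.incomplete_rows) / self.total_rows


def is_missing(value) -> bool:
    # Assumption: only None and empty strings count as missing; real data needs richer rules.
    return value is None or value == ""


def assess_readiness(records, required_fields, label_field):
    """Count unlabeled and incomplete rows, and tally the labels that are present."""
    unlabeled = 0
    incomplete = 0
    label_counts = Counter()
    for row in records:
        label = row.get(label_field)
        if is_missing(label):
            unlabeled += 1
        else:
            label_counts[label] += 1
        if any(is_missing(row.get(name)) for name in required_fields):
            incomplete += 1
    return ReadinessReport(len(records), unlabeled, incomplete, label_counts)


if __name__ == "__main__":
    # Hypothetical HR-style records; the field names are invented for illustration.
    records = [
        {"years_experience": 5, "training_hours": 40, "meets_goal": "yes"},
        {"years_experience": 2, "training_hours": None, "meets_goal": "no"},
        {"years_experience": 8, "training_hours": 12, "meets_goal": None},
        {"years_experience": None, "training_hours": 30, "meets_goal": ""},
    ]
    report = assess_readiness(records, ["years_experience", "training_hours"], "meets_goal")
    print(f"labeled: {report.fraction_labeled():.0%}, complete: {report.fraction_complete():.0%}")
    # A heavily skewed label tally is an early hint that the data may be biased.
    print(dict(report.label_counts))
```

Run on the sample records, this prints `labeled: 50%, complete: 50%`, meaning half of the rows could not be used for supervised training as they stand. What counts as "ready enough" is a judgment call that depends on the questions the organization needs the data to answer.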
I think about building data readiness in stages, or layers, like Monica Rogati's Data Science Hierarchy of Needs. Mirroring Maslow's Hierarchy of Needs, it places the basics at the bottom, with the layers growing more complex (and more valuable) as the applications rise to the top.
Surprisingly, a good starting point for assessing your data readiness is not the data. Instead, it’s identifying the questions that need to be answered. For example, an organization’s HR data can tell us what experience and training a team has, but leadership will want to know: does the team have the right experience and the right training to meet our goals? That 30,000-foot thinking can bring clarity to prioritizing data readiness. If nothing else, it helps to focus your attention on the forest, and not on individual trees.
Hopefully, I have demonstrated that data readiness is less about the data and more about readiness. Readiness spans multiple layers of operational maturity, including:
- Standardized methods for labeling, validating, cleaning, and organizing (indexing) data across an enterprise;
- A data strategy that establishes guidance for effective data management and data usage;
- Data governance that spans compliance, risk, and regulation related to data (including privacy, security, and access controls);
- Data democratization policies that specify access rights, ‘right to see’ authorizations, ethical principles, and acceptable applications for data usage across the organization;
- An open data platform that aggregates data and enables automated data ingestion, processing, storage, and workflow orchestration;
- Organizational assessment of technological infrastructure needs; and
- Investment in the infrastructure (e.g., cloud, GPUs) to support AI solutions.
A recent survey by Forbes showed that only 12 percent of responding organizations have an enterprise-wide data strategy. Even more eye-opening, 80 percent have less than half of their data available across all teams. The bottom line from this survey is clear: a majority of organizations have a significant gap in data readiness. So, don't let FOMO (fear of missing out) drive quick, rash decisions in your AI strategy. Deliberate, informed, goal-oriented planning is the path to success.
Now is the time for those who hope to join the AI ranks in the next decade to take a data readiness self-assessment. Spending the time up front to organize, prioritize, and execute against data and infrastructure needs will position organizations for long-term success. In the end, the organization that wins with AI is not the one that has the most data. The organization that wins with AI is the one that derives the most effective actionable insights (AI) and the greatest value from its data assets.
Keep reading about data readiness with Assessing your Data Readiness for Machine Learning.
Dr. Kirk Borne is a GovLoop Featured Contributor. He has been the Principal Data Scientist and an Executive Advisor at management consulting firm Booz Allen Hamilton since 2015. In those roles, he focuses on applications of data science, data management, machine learning, and AI across a variety of disciplines. You can read his posts here.