Data Pipelines: A Vertically Integrated Digital Supply Chain

In the age of Artificial Intelligence (AI), the vast amount of data generated by the modern supply chain is becoming its most valuable and underutilized asset. Data are the fuel for modern AI systems, which is why AI has the potential to transform every aspect of the supply chain: from back-office processes, forecasting, and inventory planning, to procurement, warehouse management, and distribution.

Indeed, a quick web search turns up no shortage of articles on the many potential use cases for AI in supply chain management (SCM). However, bringing these use cases to life in a scalable, sustainable way requires a number of prerequisites. Here, we focus on one such “core enabler” needed to bring cutting-edge AI use cases to fruition: a robust data pipeline.

In a previous post, I discussed the criticality of accurately labeled data for training machine learning-driven AI systems. However, it is easy to overlook the fact that the training data must be production data for the trained model to be valuable in a production setting. This means making production data available to a training/development environment, training a model, moving that model into a deployment environment, and then capturing its output in a way that facilitates model improvement, thereby creating a virtuous cycle.

Put simply, creating valuable AI requires more than just generating clean, annotated production data – it requires the ability to efficiently move that data through multiple environments. This is a departure from the traditional software development practice of using notional data in a dev > test > prod waterfall approach.

We can use a familiar framework to explore this new development paradigm: an efficient data pipeline is really just a vertically integrated digital supply chain. Just as physical supply chains are designed to support the repeatable development and delivery of a product, a digital supply chain should be designed to support the repeatable development and delivery of AI models to drive business outcomes.

The first step in the digital supply chain is gathering raw production data, which is analogous to the raw material at the beginning of a physical supply chain. Typically, raw production data must be prepped (cleansed, transformed, etc.) and annotated prior to being fed into a model training algorithm. To avoid unnecessary risks to day-to-day business operation, these preliminary activities must occur outside of the production environment. Thus, the relevant production data must be transferred to a development sandbox.
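As a rough illustration of this first step, the sketch below copies a handful of records out of "production" and cleanses them in a separate sandbox copy. Everything in it is an assumption made for the example: the record fields (`sku`, `quantity`), the function names, and the cleansing rules are hypothetical, and annotation is left out for brevity.

```python
from copy import deepcopy


def snapshot_to_sandbox(production_records):
    """Return an independent copy so sandbox work never mutates production data."""
    return deepcopy(production_records)


def prep(records):
    """Cleanse and transform: drop incomplete rows, normalize the remaining fields."""
    cleaned = []
    for rec in records:
        # Cleanse: skip rows that are missing a required field.
        if rec.get("sku") is None or rec.get("quantity") is None:
            continue
        # Transform: trim and upper-case the SKU, coerce quantity to an integer.
        cleaned.append({
            "sku": rec["sku"].strip().upper(),
            "quantity": int(rec["quantity"]),
        })
    return cleaned


# Made-up records standing in for raw production data.
production = [
    {"sku": " ab-100 ", "quantity": "12"},
    {"sku": None, "quantity": "3"},
    {"sku": "cd-200", "quantity": 7},
]

sandbox = prep(snapshot_to_sandbox(production))
```

The point of the copy step is simply that the original records are left untouched; in practice the transfer would be a database export or replication job rather than an in-memory copy.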

Next, the quantity of training data and/or complexity of the model being trained may necessitate a training environment with high computational horsepower. In these cases, moving prepped data to a cloud environment can be an advantage for model training, as the environment can be turned on only when needed. Even if model training is conducted on premises, the iterative nature of new AI model development often warrants a dedicated environment in order to preserve sandbox resources.

Finally, trained models themselves have a relatively small footprint: they don’t require the increased memory or compute power needed for training, and therefore don’t need to run in the training environment. Often, trained models can be deployed to end users on an organization’s existing platforms. In a departure from our analogy, however, the end of the digital supply chain is not one-way. The data pipeline must create a feedback loop such that existing AI outputs can be captured and processed for continuous model improvement.
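The feedback loop described above can be sketched in a few lines. This is a toy under stated assumptions, not a real training setup: the "model" is just the mean of past demand, and the function names, the demand values, and the shape of the feedback log are all hypothetical.

```python
from statistics import mean


def train(history):
    """Stand-in for model training: the 'model' is the mean of observed demand."""
    return mean(history)


def predict(model):
    """Stand-in for inference in the deployment environment."""
    return model


# Prepped production data (made-up values).
history = [10.0, 12.0, 14.0]

# Training environment: fit the model.
model = train(history)

# Deployment environment: serve a forecast and log it next to what actually happened.
forecast = predict(model)
feedback_log = [{"forecast": forecast, "observed": 20.0}]

# Feedback loop: captured outcomes flow back into the training data.
history.extend(entry["observed"] for entry in feedback_log)
retrained = train(history)
```

Logging each output alongside the later observed outcome is what lets the next training run learn from how the deployed model actually performed.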

Just as a physical supply chain is about turning raw material into finished product, the goal of a digital supply chain for AI is to turn raw data into business insight. While this data pipeline is a prerequisite for any business to implement sustainable AI development, the large amount of historical data and the familiar framework of vertical integration make supply chain organizations well-positioned to take advantage of an efficient data pipeline for delivering value through AI.

Co-author: C. Alan Wright, PhD, Accenture Federal Services Defense AI R&D Lead

Dominic Delmolino is a GovLoop Featured Contributor. He is the Chief Technology Officer at Accenture Federal Services and leads the development of Accenture Federal’s technology strategy. He has been instrumental in establishing Accenture’s federal activities in the open source space and has played a key role in the business by fostering and facilitating federal communities of practice for cloud, DevOps, artificial intelligence and blockchain.
