Testing AI Calls for New Tactics

Artificial intelligence (AI) brings exciting new capabilities to government IT. But working with AI, especially generative AI, means fundamental changes in the way software is developed and tested.

In October 2023, the White House issued its Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. The EO calls for prioritization of safety and security and consumer protection against fraud and discrimination.

A New Approach to Testing and Validation

Recent advances have made building AI-driven applications deceptively fast. “A chatbot that would have taken years to build can now be built in a matter of weeks,” said David Colwell, Vice President of AI and Machine Learning with Tricentis, a provider of automated software testing solutions. But that upends the traditional wisdom that it should take about as long to test new software as it took to develop. “It’s best to forget that spurious correlation ever existed,” he said.

The EO emphasizes invalidation in software testing — that is, not only checking that it meets requirements, but also assessing risks.

Organizations have been comfortable releasing software without allowing time and space for developers to explore ways that it could misbehave, according to Colwell. That was partly because traditional algorithms were confined in ways that guarded against unwanted results.

But AI’s inner workings are effectively unknowable, he explained. “There needs to be a much higher focus on exploring the negative space around what you expect the system to do.” That means a testing process that’s independent of the development team.

Bringing in the Red Team

The EO calls explicitly for red team testing of AI — where independent groups attempt to force the application to make errors.

“Red team AI testing refers to a structured effort to identify vulnerabilities and flaws in an AI system. [It’s] based on Cold War-era battle simulations where red teams attack and blue teams defend against intrusion,” Colwell said.

The red team needs to be fully independent from the blue team that built the software. It should have its own project owner or manager and be free of the time pressure applied to the development phase.

Getting Up to Speed

Because AI is so different, even the most experienced software developers will need training to work with it effectively. For example, although IT teams won’t need to know how to train an AI model, they will need to understand how AI training works. Likewise, they’ll need to know about prompts, contextual grounding and other safety features.

And they need to understand the importance of being transparent about how AI is used.

How Tricentis Helps

The Tricentis Continuous Testing Platform includes everything agencies need to support the full testing life cycle across their enterprise application landscape. It covers packaged applications such as SAP and Salesforce and custom and cloud-native applications. It also integrates with open-source and DevOps tools to centrally manage testing activities and connect continuous test automation into delivery pipelines.

This article appeared in our guide, “Agencies of the Future: How to Break Down Barriers to Growth.” For more about how governments are embracing change, download it here:

Get the guide