Starting in the 1950s, early visual processing algorithms were proposed to extract basic information (e.g., color and object boundaries) from images, giving machines an initial ability to perceive the visual world. It was the beginning of a new field within AI, known today as computer vision. Subsequent algorithmic advances included techniques to detect certain patterns (like human faces) and even track the same object across many images. However, these advances fell well short of enabling true video analytics. Then, in 2012, everything changed.
Video analytics – the alchemy stage. “We are in the alchemy stage (of computer vision and video analytics) where things work, but we don’t know why… it is the chemistry stage that I am looking forward to” – Bill Freeman.
In 2012, a neural network won the most important image recognition challenge by a landslide, outperforming every other visual perception algorithm known at the time. This event triggered the Cambrian explosion of what we today refer to as deep learning. In its most basic form, deep learning consists of using annotated data to train an (artificial) neural network to perform a task – in this case, a visual analytics task such as object detection or scene description.
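The training recipe described above can be sketched in miniature. The following is an illustrative toy, not a real video-analytics model: the two-dimensional "annotated data," the labeling rule, the network size and the learning rate are all assumptions chosen for clarity.

```python
import numpy as np

# Minimal sketch of supervised deep learning: train a tiny two-layer
# neural network on annotated toy data using plain NumPy and gradient
# descent. Everything here (data, sizes, learning rate) is illustrative.

rng = np.random.default_rng(0)

# "Annotated data": 2-D points, each labeled by a simple rule that the
# network must learn from examples alone.
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Tiny network: 2 inputs -> 8 tanh hidden units -> 1 sigmoid output.
W1 = rng.normal(0, 0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output in (0, 1)
    return h, p

lr = 0.5
for _ in range(2000):
    h, p = forward(X)
    grad_out = (p - y) / len(X)                # d(mean BCE)/d(pre-sigmoid)
    gW2 = h.T @ grad_out
    gb2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)  # backpropagate through tanh
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, p = forward(X)
accuracy = float(((p > 0.5) == (y > 0.5)).mean())
print(f"training accuracy: {accuracy:.2f}")
```

The same recipe – annotated examples, a differentiable network, and gradient descent on a loss – is what scales up to the object detectors used in video analytics; only the data and the architecture grow.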
Video analytics – the chemistry stage: Today, algorithm engineering is no longer the critical ingredient in video analytics. (Annotated) data has become the pillar that supports today’s computer vision foundations, and this is driving a new set of needs. Successful video analytics will depend on our ability to collect data relevant to the task at hand, to curate and manage that data, and to maintain disciplined annotation practices, since the annotations are what algorithms learn from. Ecosystems that put data at the center of the development process will succeed.
The next generation of video analytics should be able to discern an event observed across a camera network anywhere in the world. It should detect events that do not conform to a baseline understanding of the physical world and how objects move, and predict future states of the observed world much as humans use intuition to anticipate certain scenarios. And, of course, it should achieve human-level accuracy on any visual task with minimal training data, in ever-changing environments.
Even with these advancements in visual processing, there are still areas that need improvement. We are starting to see attacks in which neural networks that process images are fooled: a video can be manipulated in ways imperceptible to the human eye that nonetheless induce the algorithm to make the wrong prediction. Such attacks can distort reality and numb human judgment long enough to cause damage. For example, next-generation cars will rely on video analytics to perform most of their self-driving functions; if the algorithm misses a stop sign, or confuses it with a green light, the result could be fatal. Addressing these vulnerabilities requires both reactive and proactive measures. Reactive measures include re-training these networks on manipulated images so they learn to withstand such attacks, while proactive measures include detecting manipulated images via specialized parts of the network.
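To make the manipulation concrete, here is a toy sketch of a gradient-sign-style attack on a linear scorer. The "image," the weight vector standing in for a trained network, and the perturbation size are all illustrative assumptions, not a real attack on a deployed system.

```python
import numpy as np

# Toy sketch of an adversarial attack in the spirit of the fast
# gradient sign method: a small, structured nudge to the input flips
# the classifier's decision. The model is a stand-in, not a real
# video-analytics network.

rng = np.random.default_rng(1)

# A fixed weight vector plays the role of a trained classifier over a
# flattened 64-"pixel" image: positive score => "stop sign".
w = rng.normal(size=64)

# A clean input the classifier scores confidently positive.
x = 0.1 * w / np.linalg.norm(w) + rng.normal(scale=0.01, size=64)

def score(x):
    return float(w @ x)  # linear score; its sign is the predicted label

score_clean = score(x)

# The attack: step each pixel slightly against the score's gradient.
# For a linear model, the gradient with respect to x is just w.
epsilon = 0.03
x_adv = x - epsilon * np.sign(w)

score_adv = score(x_adv)
print(score_clean > 0, score_adv > 0)  # the small nudge flips the decision
```

The reactive defense mentioned above corresponds to adding inputs like `x_adv` (with their correct labels) back into the training set; the proactive defense corresponds to a separate detector trained to flag such perturbed inputs before they reach the classifier.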
As video cameras become even more ubiquitous, video analytics will be one of the most visible faces of AI, and scrutiny will increase accordingly. What is clear is that as we develop techniques that go beyond object detection and motion prediction, video analytics will become deeply intertwined with our daily lives, habits and conduct. Lots of work, responsibility, and exciting times lie ahead!
Co-authored by Marc Bosch, PhD – Accenture Federal Services Computer vision science director
Dominic Delmolino is a GovLoop Featured Contributor. He is the Chief Technology Officer at Accenture Federal Services and leads the development of Accenture federal’s technology strategy. He has been instrumental in establishing Accenture’s federal activities in the open source space and has played a key role in the business by fostering and facilitating federal communities of practice for cloud, DevOps, artificial intelligence and blockchain. You can read his posts here.