The DeepXplore system has been tested on real-world datasets, where it exposed thousands of unique incorrect corner-case behaviours.
"Our DeepXplore work proposes the first test coverage metric called 'neuron coverage' to empirically understand if a test input set has provided bad versus good coverage of the decision logic and behaviours of a deep neural network," said Yinzhi Cao, assistant professor of computer science and engineering at Lehigh University in the US.
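The idea behind neuron coverage can be illustrated with a minimal sketch: count a neuron as covered once any test input pushes its activation above a threshold, then report the covered fraction. The function name, data layout, and threshold below are illustrative assumptions, not DeepXplore's actual implementation.

```python
def neuron_coverage(layer_activations, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one input.

    `layer_activations` is a list of layers; each layer is a list of
    per-input activation vectors (rows = test inputs, columns = neurons).
    Hypothetical sketch -- the real tool instruments a trained DNN.
    """
    covered = 0
    total = 0
    for layer in layer_activations:
        num_neurons = len(layer[0])
        total += num_neurons
        for j in range(num_neurons):
            # a neuron counts as covered if any input fires it
            if any(row[j] > threshold for row in layer):
                covered += 1
    return covered / total

# Toy example: 3 test inputs, a 3-neuron layer and a 2-neuron layer.
layer1 = [[0.9, 0.0, 0.0],
          [0.0, 0.0, 0.2],
          [0.0, 0.0, 0.0]]   # neurons 0 and 2 fire at least once
layer2 = [[0.0, 0.5],
          [0.0, 0.0],
          [0.0, 0.0]]        # only neuron 1 fires

print(neuron_coverage([layer1, layer2]))  # 3 of 5 neurons covered -> 0.6
```

A higher-coverage test set has exercised more of the network's internal decision logic, just as higher line coverage exercises more of a conventional program.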
In addition to introducing neuron coverage as a metric, the researchers also showed how differential testing can be applied to deep learning systems for software testing.
"DeepXplore solves another difficult challenge of requiring many manually labeled test inputs. It does so by cross-checking multiple DNNs and cleverly searching for inputs that lead to inconsistent results from the deep neural networks," said Junfeng Yang, associate professor of computer science at Columbia University in New York. "For instance, given an image captured by a self-driving car camera, if two networks think that the car should turn left and the third thinks that the car should turn right, then a corner-case is likely in the third deep neural network. There is no need for manual labelling to detect this inconsistency."
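The cross-checking Yang describes can be sketched as a simple majority vote: when the models under test disagree on an input, the minority prediction flags a likely corner-case without any manual label. The function name and the toy "steering" models below are invented for illustration.

```python
from collections import Counter

def find_inconsistencies(models, inputs):
    """Return inputs on which the DNNs disagree, with the outlier models.

    Hypothetical sketch of differential testing: each model maps an
    input to a prediction; disagreement marks a likely corner-case.
    """
    flagged = []
    for x in inputs:
        preds = [m(x) for m in models]
        counts = Counter(preds)
        if len(counts) > 1:  # the models disagree on this input
            majority, _ = counts.most_common(1)[0]
            # indices of models that dissent from the majority vote
            outliers = [i for i, p in enumerate(preds) if p != majority]
            flagged.append((x, preds, outliers))
    return flagged

# Toy stand-ins for three self-driving models: on input 5,
# two say "left" and the third says "right".
m1 = lambda x: "left"
m2 = lambda x: "left"
m3 = lambda x: "right" if x == 5 else "left"

result = find_inconsistencies([m1, m2, m3], [1, 5])
print(result)  # input 5 is flagged; model index 2 is the outlier
```

DeepXplore goes further than this passive check: it actively searches for inputs that maximise such disagreement while also increasing neuron coverage, but the oracle-free detection principle is the same.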
The team evaluated DeepXplore on real-world datasets including Udacity self-driving car challenge data, image data from ImageNet and MNIST, Android malware data from Drebin, and PDF malware data from Contagio/VirusTotal, and on production-quality deep neural networks trained on these datasets, such as those ranked highest in the Udacity self-driving car challenge.
Their results show that DeepXplore found thousands of incorrect corner-case behaviours, such as self-driving cars crashing into guard rails, in 15 state-of-the-art deep learning models with a total of 132,057 neurons, trained on five popular datasets containing around 162 GB of data.