- Balagopal U. - B.Tech 2013-17
This piece grew out of a news story in The Guardian which came out towards the beginning of November 2017. It reported an interesting result: Google’s Inception model had mistaken a 3D printed tortoise for a rifle. One may wonder what the big deal is, since misclassification of objects is common in deep CNN based models. What piqued my interest was something else: it was the first instance I had seen of an adversarial attack carried out with a 3D object. Until now, such attacks had been possible only with 2D photos. It was also more robust. This wasn’t the usual case where the object is misclassified only at a specific angle and lighting; the misclassification held for almost all angles and orientations of the 3D object, in this case a 3D printed tortoise.
As someone interested in AI and deep learning models, I wasn’t satisfied with the short article, so I went ahead and dug up the source paper, previously published work and other related material. While this news story is fast gaining interest, please note that the paper which showcased these results is still under review as a conference paper at the International Conference on Learning Representations (ICLR 2018).
So far, they have shown a few cases, namely:
- A tortoise classified as a rifle.
- A baseball classified as an espresso.
Adversarial inputs are nothing new to this field. They are inputs intentionally designed to cause incorrect outputs. In the paper Synthesizing Robust Adversarial Examples by Anish Athalye, Logan Engstrom, Andrew Ilyas and Kevin Kwok (Massachusetts Institute of Technology), the team uses an Expectation Over Transformation (EOT) algorithm and also describes an end-to-end process for generating adversarial 3D models. Earlier, another team had managed to produce 2D images which fool deep neural networks into making wrong predictions with high confidence.
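To make the Expectation Over Transformation idea a little more concrete, here is a minimal sketch of it on a toy problem. It is not the authors' implementation: the classifier is a made-up logistic regression, the "transformations" are just a random brightness scale plus pixel noise (standing in for the viewing angles and lighting of the 3D case), and every name in it (`eot_attack`, `random_transform`, `target_rate`, `epsilon` and so on) is my own assumption. The point it tries to show is that the perturbation is optimized against the average over many sampled transformations rather than against a single view.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def random_transform(rng, shape):
    """Sample one hypothetical transformation: brightness scale plus pixel noise."""
    scale = rng.uniform(0.7, 1.3)
    noise = rng.normal(scale=0.01, size=shape)
    return scale, noise

def eot_attack(x, w, b, epsilon, steps=50, lr=0.01, samples=32, seed=0):
    """Push x towards the target class (label 1) on average over transformations."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for _ in range(samples):
            scale, noise = random_transform(rng, x.shape)
            p = sigmoid(w @ (scale * (x + delta) + noise) + b)
            # Gradient of log p(target | transformed input) w.r.t. delta.
            grad += (1.0 - p) * scale * w
        # Signed ascent step on the averaged gradient, kept inside the epsilon box.
        delta = np.clip(delta + lr * np.sign(grad / samples), -epsilon, epsilon)
    return x + delta

def target_rate(x_in, w, b, trials=500, seed=2):
    """Fraction of freshly sampled transformations under which the target class wins."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        scale, noise = random_transform(rng, x_in.shape)
        hits += sigmoid(w @ (scale * x_in + noise) + b) > 0.5
    return hits / trials

# Toy setup (all values are assumptions chosen for illustration).
rng = np.random.default_rng(1)
w = rng.normal(size=100)      # weights of the toy classifier
b = 0.0
x = -0.02 * np.sign(w)        # an input the toy model places in class 0
x_adv = eot_attack(x, w, b, epsilon=0.05)

print("target rate, clean input:      ", target_rate(x, w, b))
print("target rate, adversarial input:", target_rate(x_adv, w, b))
```

The success check uses transformations that were never seen during the attack, which is the sense in which such an example is "robust".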
The 2015 paper Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images by Anh Nguyen, Jason Yosinski and Jeff Clune shows ways of generating images which easily trick pre-trained models into giving high confidence predictions that are, in reality, way off. Another way of forcing a false classification is to add slight perturbations which are imperceptible to the human eye. This example comes straight out of Ian Goodfellow’s paper, Explaining and Harnessing Adversarial Examples. One of the major things to note in that paper is that the results generalize: adversarial examples generated on one model also cause similar errors in models with different architectures and training sets. The authors hypothesize a linear view, in which adversarial examples are present in broad subspaces.
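The "slight perturbation" attack can be sketched in a few lines. What follows is a toy illustration in the spirit of the fast gradient sign method from the Goodfellow paper, applied to a made-up logistic regression model rather than a real image classifier; the weights, the input and the names `fgsm_perturb` and `epsilon` are all assumptions of mine. It also hints at the linear view: each input value changes by only a tiny amount, but across many dimensions those small changes add up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """Return x + epsilon * sign(gradient of the loss w.r.t. x)."""
    p = sigmoid(w @ x + b)
    # For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Toy setup (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
w = rng.normal(size=100)   # weights of the toy classifier
b = 0.0
x = 0.02 * np.sign(w)      # an input the toy model assigns to class 1
y = 1.0                    # its true label

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.05)

p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print("P(class 1) on clean input:    ", p_clean)
print("P(class 1) on perturbed input:", p_adv)
print("largest change to any value:  ", np.max(np.abs(x_adv - x)))
```

No single input value moves by more than `epsilon`, yet the predicted class flips.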
Security concerns in adversarial attacks
While a stand-alone model making a classification error might not seem like much now, such errors create serious security issues in the AI domain once these systems are integrated into real world projects. As the OpenAI blog put it, this is akin to creating optical illusions for machines. Imagine a self-driving car. The simplest attack would be to make the machine think that there is a STOP sign on the road when there is not. There has been work generating adversarial examples which caused systems to misclassify road signs. The alarming part is that this can be done as a black box attack: the attacker need not know the inner workings of the underlying system to run these attacks.
Deep reinforcement learning methods are also vulnerable to this sort of adversarial attack through input perturbations. The paper Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks by Vahid Behzadan and Arslan Munir demonstrates that such attacks can manipulate and induce policies during the learning phase. This can lead to systems learning faulty logic. While the currently demonstrated example goes only as far as causing a gaming AI to not see its enemies, imagine what would happen if such manipulation could be done to real life defence systems. The work Adversarial Attacks on Neural Network Policies by Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan and Pieter Abbeel also manipulates the system, degrading the performance of an agent playing a few Atari games.
Circling back to our original example of a tortoise misclassified as a rifle, this could easily serve as a diversionary or overloading tactic against automated security systems. This is particularly pertinent in cases where systems detect guns from camera feeds using deep learning. This is not a far-off fantasy: there is already a publication on this, Automatic Handgun Detection Alarm in Videos Using Deep Learning by Roberto Olmos, Siham Tabik and Francisco Herrera.
As the penetration of deep learning into real world systems increases, adversarial attacks pose a growing threat and an imminent danger. There have been previous attempts to safeguard against these attacks, using Adversarial Training, Defensive Distillation, Thermometer Encodings and so on, but to limited effect. For the time being, I am less concerned about an AI taking over the world. Humans, unlike an AI, have a history of being manipulative, vindictive and power hungry. For now, it would be a safe bet to put the AI apocalypse scenario on the back burner and focus on real life, solvable security threats like adversarial attacks.