Paper Review: Adversarial Examples
1. One pixel attack for fooling deep neural networks
- Motivation:
- Generating adversarial images can be formalized as an optimization problem with constraints. We assume an input image can be represented by a vector in which each scalar element represents one pixel.
- Let $f$ be the target image classifier which receives $n$-dimensional inputs, and $x = (x_1, \ldots, x_n)$ be the original natural image correctly classified as class $t$.
- The probability of $x$ belonging to the class $t$ is therefore $f_t(x)$.
- The vector $e(x) = (e_1, \ldots, e_n)$ is an additive adversarial perturbation according to $x$, the target class $\text{adv}$, and the limitation of maximum modification $L$.
- Note that $L$ is always measured by the length of the vector $e(x)$.
- The goal of adversaries in the case of targeted attacks is to find the optimized solution $e(x)^*$ for the following question:
  $$\max_{e(x)^*} \; f_{\text{adv}}(x + e(x)) \quad \text{subject to} \quad \lVert e(x) \rVert \le L$$
- The problem involves finding two values:
- (a) which dimensions need to be perturbed
- (b) the corresponding strength of the modification for each dimension
- In our approach, the equation is slightly different:
  $$\max_{e(x)^*} \; f_{\text{adv}}(x + e(x)) \quad \text{subject to} \quad \lVert e(x) \rVert_0 \le d$$
  where $d$ is a small number.
- In the case of the one-pixel attack, $d = 1$ (a minimal code sketch of this constraint is given at the end of these notes).
- Previous works commonly modify a part of all dimensions, while in our approach only $d$ dimensions are modified, with the other dimensions of $e(x)$ left as zeros.
- Experiments are conducted on three different networks for classification (All Convolution Network, Network in Network, VGG16 Network)
- Some results for CIFAR-10 classification
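- Below is a minimal sketch (not the authors' code) of how the one-pixel objective with $d = 1$ could be set up against a CIFAR-10 classifier. It assumes a hypothetical Keras-style `model` with a `predict` method returning class probabilities; the candidate solution encodes a single pixel as (row, col, r, g, b), so the $\lVert e(x) \rVert_0 \le 1$ constraint holds by construction. The paper searches for the perturbation with differential evolution; scipy's `differential_evolution` stands in here purely for illustration, not as the authors' exact setup.

```python
# Sketch of a one-pixel (d = 1) targeted attack objective.
# Assumptions (not from the reviewed paper's code): `model` is a Keras-style
# classifier mapping an HxWx3 image in [0, 255] to class probabilities.
import numpy as np
from scipy.optimize import differential_evolution

def perturb(image, candidate):
    """Apply a one-pixel perturbation encoded as (row, col, r, g, b)."""
    row, col = int(candidate[0]), int(candidate[1])
    adv = image.copy()
    adv[row, col, :] = candidate[2:5]  # only one pixel changes: ||e(x)||_0 = 1
    return adv

def objective(candidate, image, target_class, model):
    """Negative target-class probability; minimizing it maximizes f_adv(x + e(x))."""
    adv = perturb(image, candidate)
    probs = model.predict(adv[np.newaxis, ...])[0]  # assumed Keras-style API
    return -probs[target_class]

def one_pixel_attack(image, target_class, model):
    h, w, _ = image.shape
    # Search space: pixel coordinates plus the new RGB values.
    bounds = [(0, h - 1), (0, w - 1), (0, 255), (0, 255), (0, 255)]
    result = differential_evolution(
        objective, bounds, args=(image, target_class, model),
        maxiter=75, popsize=20, recombination=1.0, seed=0)  # illustrative settings
    return perturb(image, result.x), -result.fun  # adversarial image, f_adv
```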