Paper Review: Adversarial Examples

1. One pixel attack for fooling deep neural networks

  • Motivation:
    - Generating adversarial images can be formalized as an optimization problem with constraints. We assume an input image can be represented by a vector in which each scalar element represents one pixel.
    - Let $f$ be the target image classifier which receives $n$-dimensional inputs, and let $\mathbf{x}=(x_1,\dots,x_n)$ be the original natural image correctly classified as class $t$.
    - The probability of $\mathbf{x}$ belonging to class $t$ is therefore $f_t(\mathbf{x})$.
    - The vector $e(\mathbf{x})=(e_1,\dots,e_n)$ is an additive adversarial perturbation according to $\mathbf{x}$, the target class $\mathrm{adv}$, and the limit on the maximum modification $L$.
    - Note that $L$ is always measured by the length of the vector $e(\mathbf{x})$.
    - The goal of adversaries in the case of targeted attacks is to find the optimized solution $e(\mathbf{x})^*$ for the following problem:

      $\max_{e(\mathbf{x})^*} f_{\mathrm{adv}}(\mathbf{x}+e(\mathbf{x}))$  subject to  $\|e(\mathbf{x})\| \le L$

    - The problem involves finding two values:
  • (a) which dimensions need to be perturbed
  • (b) the corresponding strength of the modification for each dimension (a minimal code sketch of evaluating this objective follows this list)
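
As a concrete illustration of this objective (not code from the paper), the sketch below scores a candidate perturbation $e(\mathbf{x})$ by the target-class probability. The names here are assumptions: `classify` is a hypothetical callable returning softmax probabilities, and images are numpy arrays with pixel values in [0, 1].

```python
import numpy as np

def targeted_objective(x, e, adv_class, classify, L=None):
    """Score a candidate perturbation e by the target-class probability f_adv(x + e)."""
    if L is not None and np.linalg.norm(e) > L:
        return -np.inf                      # reject candidates violating ||e(x)|| <= L
    x_adv = np.clip(x + e, 0.0, 1.0)        # perturbed image, kept in the valid pixel range
    probs = classify(x_adv)                 # hypothetical model call -> class probabilities
    return probs[adv_class]                 # f_adv(x + e(x)), the quantity to maximize
```

The adversary then searches over perturbations $e(\mathbf{x})$ that maximize this value, i.e. that push the classifier toward the target class $\mathrm{adv}$.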

- In our approach, the equation is slightly different:
  $\max_{e(\mathbf{x})^*} f_{\mathrm{adv}}(\mathbf{x}+e(\mathbf{x}))$  subject to  $\|e(\mathbf{x})\|_0 \le d$,  where $d$ is a small number.

  • In the case of the one-pixel attack, $d=1$.
  • Previous works commonly modify a part of all dimensions, while in our approach only $d$ dimensions are modified, with the other dimensions of $e(\mathbf{x})$ left as zeros (see the code sketch after this list).
  • Experiments are conducted on three different networks for classification: All Convolutional Network, Network in Network, and VGG16.
  • Some results for CIFAR-10 classification
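
The sketch below is a rough, hypothetical illustration of how such a one-pixel ($d=1$) search could be run on a 32x32x3 CIFAR-10 image; it is not the authors' implementation. It encodes the perturbation as a (row, col, r, g, b) tuple and uses SciPy's differential evolution to maximize the target-class probability. `model_predict` is an assumed function returning softmax probabilities for a single image with pixel values in [0, 1].

```python
import numpy as np
from scipy.optimize import differential_evolution

def apply_one_pixel(x, candidate):
    """Perturb exactly one pixel; every other entry of e(x) stays zero."""
    row, col, r, g, b = candidate
    x_adv = x.copy()
    x_adv[int(row), int(col)] = [r, g, b]    # overwrite a single pixel's RGB value
    return x_adv

def one_pixel_attack(x, adv_class, model_predict):
    # Search variables: pixel coordinates plus its replacement RGB value.
    bounds = [(0, 31), (0, 31), (0, 1), (0, 1), (0, 1)]

    def loss(candidate):
        # Maximize f_adv(x + e(x)) by minimizing its negative.
        probs = model_predict(apply_one_pixel(x, candidate))
        return -probs[adv_class]

    result = differential_evolution(loss, bounds, maxiter=75, popsize=20, tol=1e-5)
    return apply_one_pixel(x, result.x), -result.fun   # adversarial image and f_adv value
```

The paper itself drives the search with differential evolution, which only needs the model's output probabilities rather than its gradients; the population size, iteration count, and bounds above are illustrative choices only.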
