Paper Review: Adversarial Examples

1. One pixel attack for fooling deep neural networks

  • Motivation:
    - Generating adversarial images can be formalized as an optimization problem with constraints. We assume an input image can be represented by a vector in which each scalar element represents one pixel.
    - Let $f$ be the target image classifier which receives $n$-dimensional inputs, and let $\mathbf{x}=(x_1,\dots,x_n)$ be the original natural image correctly classified as class $t$.
    - The probability of $\mathbf{x}$ belonging to class $t$ is therefore $f_t(\mathbf{x})$.
    - The vector $e(\mathbf{x})=(e_1,\dots,e_n)$ is an additive adversarial perturbation according to $\mathbf{x}$, the target class $\mathrm{adv}$, and the limit on the maximum modification $L$.
    - Note that $L$ is always measured by the length of the vector $e(\mathbf{x})$.
    - The goal of adversaries in the case of targeted attacks is to find the optimized solution $e(\mathbf{x})^*$ for the following problem:

      $\max_{e(\mathbf{x})^*} f_{\mathrm{adv}}(\mathbf{x}+e(\mathbf{x}))$  subject to  $\|e(\mathbf{x})\| \le L$

    - The problem involves finding two values:
  • (a) which dimensions need to be perturbed
  • (b) the corresponding strength of the modification for each dimension (a minimal code sketch of evaluating this objective follows this list)
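
As a concrete illustration of this objective (not code from the paper), the sketch below scores a candidate perturbation $e(\mathbf{x})$ by the target-class probability. The names here are assumptions: `classify` is a hypothetical callable returning softmax probabilities, and images are numpy arrays with pixel values in [0, 1].

```python
import numpy as np

def targeted_objective(x, e, adv_class, classify, L=None):
    """Score a candidate perturbation e by the target-class probability f_adv(x + e)."""
    if L is not None and np.linalg.norm(e) > L:
        return -np.inf                      # reject candidates violating ||e(x)|| <= L
    x_adv = np.clip(x + e, 0.0, 1.0)        # perturbed image, kept in the valid pixel range
    probs = classify(x_adv)                 # hypothetical model call -> class probabilities
    return probs[adv_class]                 # f_adv(x + e(x)), the quantity to maximize
```

The adversary then searches over perturbations $e(\mathbf{x})$ that maximize this value, i.e. that push the classifier toward the target class $\mathrm{adv}$.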

- In our approach, the equation is slightly different:
  $\max_{e(\mathbf{x})^*} f_{\mathrm{adv}}(\mathbf{x}+e(\mathbf{x}))$  subject to  $\|e(\mathbf{x})\|_0 \le d$,  where $d$ is a small number.

  • In the case of the one-pixel attack, $d=1$.
  • Previous works commonly modify a part of all dimensions, while in our approach only $d$ dimensions are modified, with the other dimensions of $e(\mathbf{x})$ left as zeros (see the code sketch after this list).
  • Experiments are conducted on three different networks for classification: All Convolutional Network, Network in Network, and VGG16.
  • Some results for CIFAR-10 classification
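
The sketch below is a rough, hypothetical illustration of how such a one-pixel ($d=1$) search could be run on a 32x32x3 CIFAR-10 image; it is not the authors' implementation. It encodes the perturbation as a (row, col, r, g, b) tuple and uses SciPy's differential evolution to maximize the target-class probability. `model_predict` is an assumed function returning softmax probabilities for a single image with pixel values in [0, 1].

```python
import numpy as np
from scipy.optimize import differential_evolution

def apply_one_pixel(x, candidate):
    """Perturb exactly one pixel; every other entry of e(x) stays zero."""
    row, col, r, g, b = candidate
    x_adv = x.copy()
    x_adv[int(row), int(col)] = [r, g, b]    # overwrite a single pixel's RGB value
    return x_adv

def one_pixel_attack(x, adv_class, model_predict):
    # Search variables: pixel coordinates plus its replacement RGB value.
    bounds = [(0, 31), (0, 31), (0, 1), (0, 1), (0, 1)]

    def loss(candidate):
        # Maximize f_adv(x + e(x)) by minimizing its negative.
        probs = model_predict(apply_one_pixel(x, candidate))
        return -probs[adv_class]

    result = differential_evolution(loss, bounds, maxiter=75, popsize=20, tol=1e-5)
    return apply_one_pixel(x, result.x), -result.fun   # adversarial image and f_adv value
```

The paper itself drives the search with differential evolution, which only needs the model's output probabilities rather than its gradients; the population size, iteration count, and bounds above are illustrative choices only.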
