Reading Notes: Fawkes

Abstract

Fawkes, a system that helps individuals inoculate their images against unauthorized facial recognition models.

Method:

  • adding imperceptible pixel-level changes (we call them “cloaks”).

Results:

  • Fawkes provides 95+% protection against user recognition regardless of how trackers train their models.
  • Even when clean, uncloaked images are “leaked” to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate.
  • 100% success in experiments against today’s state-of-the-art facial recognition services.

1 Introduction

  • Facial recognition systems are scanning millions of citizens without explicit consent.

  • Anyone can build highly accurate facial recognition models of us without our knowledge or awareness. (MegaFace)

Prior approaches to protecting people from being identified by unauthorized facial recognition models: image distortion, adversarial patches, clean-label poisoning attacks.

Fawkes adds imperceptible pixel-level changes to inoculate images against unauthorized facial recognition models. If collected and used to train a facial recognition model to recognize the user, these “cloaked” images would produce functional models that consistently misidentify them.

Fawkes takes the user’s photos and computes minimal perturbations that shift them significantly in the feature space of a facial recognition model. Any facial recognition model trained using these images of the user learns an altered set of “features”.

  • producing significant alterations to images’ feature space representations using perturbations imperceptible to the naked eye.
  • providing 95+% protection.
  • 100% success against state-of-the-art facial recognition services.
  • 80+% success when half of training images are uncloaked.
  • robust to a variety of mechanisms for both cloak disruption and cloak detection.

2 Background and Related Work

Fawkes extends poisoning attacks in machine learning.

2.1 Protecting Privacy via Evasion Attacks

Aims: making images difficult for a facial recognition model to recognize.

(1). creating adversarial examples, inputs to the model designed to cause misclassification.

  • specially printed glasses

  • adversarial stickers on hat

  • adversarial patches

    Limitations: they require the user to wear fairly obvious and conspicuous accessories, and they require full, unrestricted (white-box) access to the precise model tracking the user, which is easily broken by updating the model.

(2). editing facial images so that human-like characteristics are preserved but facial recognition model accuracy is significantly reduced.

  • k-means

  • facial inpainting

  • GAN-based face editing

    Limitations: alter the user’s face in the photos.

2.2 Protecting Privacy via Poisoning Attacks

Aim: disrupting training via data poisoning attacks, which modify the initial data used to train the model.

(1) Clean Label Attacks: inject “correctly” labeled poison images into the training data, causing a model trained on this data to misclassify a specific image of interest, i.e. $(x,y)\to(x,y')$.

  • only cause misclassification on a single, preselected image
  • does not transfer well to different models
  • easily detectable

(2) Model Corruption Attacks, modifying images such that they degrade the accuracy of a model trained on them.

2.3 Other Related Work

Transfer learning uses existing pretrained models as a basis for quickly training models for customized tasks using less training data: $\Phi \to \mathbb{F}_{\theta}$. Typically, a model $\mathbb{F}_{\theta}$ can be created by appending a few additional layers to $\Phi$ and only training those new layers. A minimal sketch of this setup is shown below.
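A minimal sketch of the transfer-learning setup described above, assuming a Keras-style pretrained feature extractor; the file path, class count, and training call are illustrative placeholders, not details from the paper.

```python
import tensorflow as tf

# Hypothetical pretrained feature extractor Phi (kept frozen during fine-tuning).
phi = tf.keras.models.load_model("feature_extractor.h5")  # placeholder path
phi.trainable = False

num_classes = 65  # e.g. number of identities in a tracker dataset such as PubFig

# F_theta: append a small classification head on top of Phi and train only it.
model = tf.keras.Sequential([
    phi,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_images, train_labels would come from the task-specific (e.g. scraped) photos.
# model.fit(train_images, train_labels, epochs=10, batch_size=32)
```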

3 Protecting Privacy via Cloaking

Facial recognition models trained on cloaked images will have a distorted view of the user in the “feature space”, i.e. the model’s internal understanding of what makes the user unique.
阅读笔记:Fwakes

3.1 Assumptions and Threat Model

Design goals:

  • cloaks should be imperceptible and not impact normal use of the image
  • when classifying normal, uncloaked images, models trained on cloaked images should recognize the underlying person with low accuracy.

3.2 Overview and Intuition

DNN models are trained to identify and extract (often hidden) features in input data and use them to perform classification. Their ability to identify features is easily disrupted by small perturbations of the input data.

By simply modifying their online photos in small and imperceptible ways, the user successfully prevents unauthorized trackers and their DNN models from recognizing their true face.

3.3 Computing Cloak Perturbations

The goal is making the learned features from cloaked photos highly dissimilar from those learned from original (uncloaked) photos.

Notation:

  • $x$: Alice’s uncloaked images
  • $x_T$: target image (an image from another class/user $T$) used to generate the cloak for Alice’s image $x$
  • $\delta(x,x_T)$: cloak computed for Alice’s image $x$ based on image $x_T$ from label $T$
  • $x\oplus\delta(x,x_T)$: cloaked version of Alice’s image $x$
  • $\Phi$: feature extractor used by the facial recognition model
  • $\Phi(x)$: feature vector (or feature representation) extracted from an input $x$

Cloaking to maximize feature deviation:

Ideal cloaking design modifies $x$ by adding a cloak perturbation $\delta(x,x_T)$ to $x$ that maximizes changes in $x$'s feature representation:
$$\max_{\delta} Dist\big(\Phi(x),\, \Phi(x\oplus\delta(x,x_T))\big), \quad \text{subject to } |\delta(x,x_T)|<\rho \tag{1}$$
where $Dist(\cdot)$ computes the distance between two feature vectors, $|\delta|$ measures the perceptual perturbation caused by cloaking, and $\rho$ is the perceptual perturbation budget.

Image-specific Cloaking:

When creating cloaks for her photos, Alice produces image-specific cloaks, i.e. $\delta(x,x_T)$ is image dependent. Eq. (1) is replaced with the following optimization:
$$\min_{\delta} Dist\big(\Phi(x_T),\, \Phi(x\oplus\delta(x,x_T))\big), \quad \text{subject to } |\delta(x,x_T)|<\rho \tag{2}$$
Here we search for the cloak for $x$ that shifts its feature representation closely towards $x_T$. This new form of optimization also prevents the system from generating extreme values.

Finally, our image-specific cloak optimization will create different cloak patterns among Alice’s images. This “diversity” makes it hard to detect and remove cloaks.

3.4 Cloaking Effectiveness & Transferability

Now Alice can produce cloaked images whose feature representation is dissimilar from her own but similar to that of a target user $T$.

(1). Effectiveness:

  • whether this can translate into the desired misclassification behavior in the tracker’s model
  • whether cloaking still leads to misclassification regardless of whether $T$ exists in the tracker’s model

Our hypothesis is that as long as the feature representations of Alice’s cloaked and uncloaked images are sufficiently different, the tracker’s model will not classify them as the same class, because there will be another user class in the tracker’s model whose feature representation is more similar to $\Phi(x)$ than $\Phi(x\oplus\delta)$ is. This is a reasonable assumption when the tracker’s model targets many users rather than a few.

(2). Transferability:

The above discussion assumes that the user has the same feature extractor $\Phi$ as is used to train the tracker’s model.

Transferability: the property that models trained for similar tasks share similar properties and vulnerabilities, even if they were trained with different architectures and different training data.

The transferability property suggests that cloaking should still be effective, because the user’s and the tracker’s feature extractors are designed for similar tasks.

4 The Fawkes Image Cloaking System

Inputs: image set $\boldsymbol{X_U}$, the feature extractor $\Phi$, and the cloak perturbation budget $\rho$.

  1. Choosing a Target Class T.

    Randomly picking $K$ candidate target classes and their images from a publicly available dataset. Using the feature extractor $\Phi$ to calculate each candidate class's feature space centroid $\mathcal{C}_k$. Fawkes picks as the target class $T$ the class in the $K$ candidate set whose feature representation centroid is most dissimilar from the feature representations of all images in $\boldsymbol{X_U}$:
    $$T=\mathop{\arg\max}_{k=1,\cdots,K}\ \min_{x\in\boldsymbol{X_U}} Dist(\Phi(x),\mathcal{C}_k)$$
    where $Dist(\cdot)$ is the $L_2$ distance.

  2. Computing Per-image Cloaks.

    Given the target image set $\boldsymbol{X_T}$, for each $x\in\boldsymbol{X_U}$, Fawkes randomly picks an image $x_T\in\boldsymbol{X_T}$ and computes the cloak following eq. (2). In our implementation, $|\delta(x,x_T)|$ is calculated using DSSIM (Structural Dis-Similarity Index), a measure of user-perceived image distortion.

    Applying the penalty method to reformulate and solve the optimization in eq. (2):
    $$\min_{\delta} Dist\big(\Phi(x_T),\, \Phi(x\oplus\delta(x,x_T))\big)+\lambda\cdot\max\big(|\delta(x,x_T)|-\rho,\ 0\big)$$
    where $\lambda$ controls the impact of the input perturbation caused by cloaking. When $\lambda\to\infty$, the cloaked image is visually identical to the original image. A sketch of both steps (target selection plus per-image cloak optimization) is given after this list.
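A minimal TensorFlow sketch of both steps, under assumptions not stated in the notes: `phi` is a Keras feature extractor over images scaled to `[0, 1]`, `candidate_sets` maps each candidate class to an array of its images, and the budget/penalty defaults (`rho`, `lam`) are illustrative placeholders; only the Adam settings (1000 iterations, learning rate 0.5) come from the evaluation setup in Section 5.1.

```python
import numpy as np
import tensorflow as tf

def l2(a, b):
    # L2 distance between feature vectors (broadcasts over a batch).
    return tf.norm(a - b, axis=-1)

def choose_target_class(phi, x_user, candidate_sets):
    """Step 1: pick the candidate class whose feature centroid is farthest
    (in min L2 distance) from all of the user's images X_U."""
    user_feats = phi(x_user)  # Phi(x) for every x in X_U
    best_k, best_score = None, -np.inf
    for k, x_k in candidate_sets.items():
        centroid = tf.reduce_mean(phi(x_k), axis=0)       # C_k
        score = tf.reduce_min(l2(user_feats, centroid))   # min_x Dist(Phi(x), C_k)
        if float(score) > best_score:
            best_k, best_score = k, float(score)
    return best_k

def compute_cloak(phi, x, x_target, rho=0.007, lam=100.0, steps=1000, lr=0.5):
    """Step 2 (penalty-method form of eq. (2)): pull Phi(x + delta) toward
    Phi(x_T) while penalizing DSSIM(x, x + delta) above the budget rho."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_target = tf.convert_to_tensor(x_target, dtype=tf.float32)
    target_feat = phi(x_target[None])
    delta = tf.Variable(tf.zeros_like(x))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            x_cloaked = tf.clip_by_value(x + delta, 0.0, 1.0)
            feat_dist = l2(phi(x_cloaked[None]), target_feat)
            # DSSIM = (1 - SSIM) / 2, a perceptual distortion measure.
            dssim = (1.0 - tf.image.ssim(x[None], x_cloaked[None], max_val=1.0)) / 2.0
            loss = tf.reduce_mean(feat_dist + lam * tf.maximum(dssim - rho, 0.0))
        grads = tape.gradient(loss, [delta])
        opt.apply_gradients(zip(grads, [delta]))
    return tf.clip_by_value(x + delta, 0.0, 1.0)
```

The returned tensor is the cloaked image $x\oplus\delta(x,x_T)$ that the user would post in place of the original.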

5 System Evaluation

When the user’s and the tracker’s feature extractors differ, efficacy can drop, but it can be restored to near perfection by making the user’s feature extractor robust (via adversarial training).

5.1 Experiment Setup

(1). Feature Extractors.

Training feature extractors using two large ($\ge$ 500K images) datasets on different model architectures.

  • VGGFace2: 3.14M images of 8,631 subjects.
  • WebFace: 500,000 images covering roughly 10,000 subjects.

Architectures:

  • DenseNet-121: 121 layers, 7M parameters.
  • InceptionResNet V2: 572 layers, 54M parameters.

(2). Tracker’s Training Datasets.

  1. training from scratch:

    • VGGFace2.
    • WebFace.
  2. applying transfer learning:

    • PubFig: 5,850 training images and 650 testing images of 65 public figures.
    • FaceScrub: 100,000 images of 530 public figures.


Transfer learning: the tracker adds a softmax layer at the end of the feature extractor and fine-tunes the added layer using the above datasets.

(3). Cloaking Configuration.

The user $U$ is a class in the tracker’s model, e.g. from PubFig; the target $T$ is chosen from VGGFace2 and WebFace. Computing the cloak for $x$ for each given $U$ and $T$, using the Adam optimizer for 1000 iterations with a learning rate of 0.5.

(4). Evaluation Metrics.

  • protection success rate: the tracker model’s misclassification rate for clean (uncloaked) images of $U$.
  • normal accuracy: the overall classification accuracy of the tracker’s model on users other than $U$. (A sketch of how both metrics could be computed follows this list.)
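A small sketch of the two metrics, assuming `y_true`/`y_pred` arrays of labels and a protected user label `user_label`; all names are hypothetical, not from the paper.

```python
import numpy as np

def protection_success_rate(y_true, y_pred, user_label):
    """Fraction of U's clean images that the tracker model misclassifies."""
    mask = y_true == user_label
    return np.mean(y_pred[mask] != user_label)

def normal_accuracy(y_true, y_pred, user_label):
    """Classification accuracy on all users other than U."""
    mask = y_true != user_label
    return np.mean(y_pred[mask] == y_true[mask])
```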

5.2 User/Tracker Sharing a Feature Extractor

$U$ is from PubFig or FaceScrub. Computing “cloaks” for a subset of $U$'s images using each of the four feature extractors in Table 1. Performing transfer learning on the same feature extractor (with cloaked images of $U$). Finally, evaluating whether the tracker model can correctly identify other clean images of $U$ it has not seen before.


Cloaking offers perfect protection. Even much higher DSSIM values (up to 0.2) are imperceptible to the human eye. Finally, the average $L_2$ norm of the cloaks is 5.44.


The feature space representations of the cloaked images are well-aligned with those of the target images, validating that the cloak achieves its goal of changing the image’s feature space representation in the tracker’s model.


A higher density of classes in the tracker model’s feature space improves cloaking effectiveness.

5.3 User/Tracker Using Different Feature Extractors

While the model transferability property suggests that there are significant similarities in their respective model feature spaces (since both are trained to recognize faces), their differences could still reduce the efficacy of cloaking.

[Figure: feature space positions of cloaked images (optimized using VGG2-Dense), original images, and target images (from PubFig, Web-Incept)]


The reduction in cloak effectiveness is obvious. In the tracker’s feature extractor, the cloak “moves” the original image features only slightly towards the target image features.

Linking model robustness and transferability:

An input perturbation’s ability to transfer between models depends on the “robustness” of the feature extractor used to create it. Perturbations generated on more robust models will take on “universal” characteristics that are able to effectively fool other models.

Improving cloak transferability by increasing the robustness of the user’s feature extractor via adversarial training: training the model on perturbed data makes it less sensitive to similar small perturbations of the input. Adversarial examples are generated using the PGD attack, and each feature extractor is trained for an additional 10 epochs. A sketch of this robust training step is given below.
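A hedged sketch of PGD adversarial training for the user’s feature extractor, assuming it is trained with a classification head and cross-entropy loss; `model`, `train_ds`, and the PGD hyperparameters are illustrative placeholders rather than the paper’s settings (the notes only specify 10 extra epochs).

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-4)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent: iteratively perturb x to maximize the loss,
    projecting back into the L-infinity ball of radius eps."""
    x_adv = x + tf.random.uniform(tf.shape(x), -eps, eps)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project into the ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

def adversarial_training_epoch(model, train_ds):
    """One extra epoch of training on PGD-perturbed inputs."""
    for x, y in train_ds:
        x_adv = pgd_attack(model, x, y)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x_adv))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
```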


Cloaks generated on robust extractors transfer better than cloaks computed on normal ones.

6 Image Cloaking in the Wild


  • Microsoft Azure Face API:

    using transfer learning to train a model on user-submitted images.

  • Amazon Rekognition Face Verification:

    computing an image similarity score between the queried image and the ground truth images for all labels.

  • Face++ Face Search API:

    extremely robust against a variety of attacks.

7 Trackers with Uncloaked Image Access

7.1 Impact of Uncloaked Images

Training a model with both cloaked and uncloaked user images means the model will observe a much larger spread of features all designated as the user. Effects:

  1. classify both regions of features as the user
  2. classify both regions and the region between them as the user
  3. ignore these feature dimensions and identify the user using some alternative features that connect both uncloaked and cloaked versions of the user’s images.


Methods:

  • intentionally releasing more cloaked images
  • considering the use of a cooperating secondary identity

7.2 Sybil Accounts

The user modifies Sybil images so that they occupy the same region of the feature space as the user’s uncloaked images. These Sybil images help confuse a model trained on both Sybil images and uncloaked/cloaked images of the user, increasing the protection success rate.


Because the leaked uncloaked images and Sybil images are close by in their feature space representations, but labeled differently, the tracker model must create additional decision boundaries in the feature space.

$x_C$ is an image from the set of candidates the user obtains (i.e. images generated by a GAN). We create a cloak $\delta(x_C,x)$ that minimizes the feature space separation between $x_C$ and the user’s original image $x$. The Sybil image is $x_S=x_C\oplus\delta(x_C,x)$. (A short sketch reusing the cloak optimizer is given below.)
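A short sketch of Sybil image generation, reusing the hypothetical `compute_cloak` routine from the Section 4 sketch; the only change is that the GAN-generated candidate $x_C$ is pulled toward the user’s original image $x$ in feature space.

```python
# Reuse the compute_cloak sketch from Section 4: instead of pushing the user's
# image toward a dissimilar target, pull the GAN-generated candidate x_c toward
# the user's original image x in feature space.
def make_sybil_image(phi, x_c, x_original, **cloak_kwargs):
    # x_S = x_C (+) delta(x_C, x): candidate cloaked toward the user's features.
    return compute_cloak(phi, x_c, x_original, **cloak_kwargs)
```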

7.3 Efficacy of Sybil Images


The use of Sybil accounts significantly improves the protection success rate when the tracker has only a small number of original (uncloaked) images.

8 Countermeasures

8.1 Cloak Disruption

  1. Image Transformation.

    These transformations aim to mitigate the impact of small image perturbations by transforming images in the training dataset before using them for model training:

    • image augmentation
    • blurring
    • adding noise

    These image transformations have little impact on cloak effectiveness. (A sketch of such transformations appears after this list.)

  2. Robust Model.

    Improving the robustness of the tracker’s model decreases the cloak’s protection success rate, but increasing the visibility of the cloak perturbation (DSSIM perturbation $>0.01$) restores a higher protection success rate.
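A hedged sketch of the kinds of image transformations referred to in item 1 (augmentation, blurring, additive noise), in TensorFlow; all parameter values are illustrative, not from the paper.

```python
import tensorflow as tf

def augment(image):
    """Simple augmentation: random flip and brightness jitter."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return tf.clip_by_value(image, 0.0, 1.0)

def add_gaussian_noise(image, stddev=0.02):
    """Additive Gaussian pixel noise."""
    noise = tf.random.normal(tf.shape(image), stddev=stddev)
    return tf.clip_by_value(image + noise, 0.0, 1.0)

def gaussian_blur(image, kernel_size=5, sigma=1.0):
    """Depthwise Gaussian blur applied per channel (image is HxWx3 in [0, 1])."""
    ax = tf.range(-(kernel_size // 2), kernel_size // 2 + 1, dtype=tf.float32)
    g = tf.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel_1d = g / tf.reduce_sum(g)
    kernel_2d = tf.tensordot(kernel_1d, kernel_1d, axes=0)        # (k, k)
    kernel = tf.tile(kernel_2d[:, :, None, None], [1, 1, 3, 1])   # (k, k, 3, 1)
    blurred = tf.nn.depthwise_conv2d(image[None], kernel,
                                     strides=[1, 1, 1, 1], padding="SAME")
    return blurred[0]
```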

8.2 Cloak Detection

Existing poison attack detection assumes that poisoning only affects a small percentage of training images. Fawkes poisons an entire model class, rendering outlier detection useless by removing the correct baseline.

Obtaining both target and cloaked images: empirically, the $L_2$ feature space distance between the cloaked class’s centroid and the target class’s centroid is 3 standard deviations smaller than the mean separation of other classes, so the user’s cloaked images can be detected. The user can trivially overcome this detection by maintaining separation between cloaked and target images during cloak optimization.

Obtaining original training images: running 2-means clustering on each class’s feature space and flagging classes with two distinct centroids as potentially cloaked. The distance between the two centroids of a protected user’s class is 3 standard deviations larger than the average centroid separation in normal classes, so the tracker can use original images to detect the presence of cloaked images. The user can choose a target class that does not create such a large feature space separation. A sketch of this detection check is shown below.
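A sketch of the 2-means detection check described above, assuming `features_by_class` maps each class label to an array of feature vectors extracted by the tracker’s feature extractor; the 3-standard-deviation threshold follows the notes, everything else is a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

def centroid_separation(feats):
    """Distance between the two centroids found by 2-means clustering."""
    km = KMeans(n_clusters=2, n_init=10).fit(feats)
    c0, c1 = km.cluster_centers_
    return np.linalg.norm(c0 - c1)

def flag_cloaked_classes(features_by_class):
    """Flag classes whose centroid separation is 3 std devs above the mean."""
    seps = {label: centroid_separation(f) for label, f in features_by_class.items()}
    values = np.array(list(seps.values()))
    mean, std = values.mean(), values.std()
    return [label for label, s in seps.items() if s > mean + 3 * std]
```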

[1]. Fawkes: Protecting Privacy against Unauthorized Deep Learning Models
[2]. Poisoning attacks on Machine Learning