Reading Notes: Fawkes
Contents
- Abstract
- 1 Introduction
- 2 Background and Related Work
- 2.1 Protecting Privacy via Evasion Attacks
- 2.2 Protecting Privacy via Poisoning Attacks
- 2.3 Other Related Work
- 3 Protecting Privacy via Cloaking
- 3.1 Assumptions and Threat Model
- 3.2 Overview and Intuition
- 3.3 Computing Cloak Perturbations
- 3.4 Cloaking Effectiveness & Transferability
- 4 The Fawkes Image Cloaking System
- 5 System Evaluation
- 5.1 Experiment Setup
- 5.2 User/Tracker Sharing a Feature Extractor
- 5.3 User/Tracker Using Different Feature Extractor
- 6 Image Cloaking in the Wild
- 7 Trackers with Uncloaked Images Access
- 8 Countermeasures
Abstract
Fawkes, a system that helps individuals inoculate their images against unauthorized facial recognition models.
Method:
- adding imperceptible pixel-level changes (we call them “cloaks”).
Results:
- Fawkes provides 95+% protection against user recognition regardless of how trackers train their models.
- Even when clean, uncloaked images are “leaked” to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate.
- 100% success in experiments against today’s state-of-the-art facial recognition services.
1 Introduction
- Facial recognition systems are scanning millions of citizens without explicit consent.
- Anyone can build highly accurate facial recognition models of us without our knowledge or awareness (MegaFace).
Prior approaches to protecting people from being identified by unauthorized facial recognition models: image distortion, adversarial patches, clean-label poison attacks.
Fawkes adds imperceptible pixel-level changes to inoculate images against unauthorized facial recognition models. If collected and used to train a facial recognition model to recognize the user, these “cloaked” images would produce functional models that consistently misidentify them.
Fawkes takes the user’s photos and computes minimal perturbations that shift them significantly in the feature space of the facial recognition model. Any facial recognition model trained using these images of the user learns an altered set of “features”.
- producing significant alterations to images’ feature space representations using perturbations imperceptible to the naked eye.
- providing 95+% protection.
- 100% success against state-of-the-art facial recognition services.
- 80+% success when half of training images are uncloaked.
- robust to a variety of mechanisms for both cloak disruption and cloak detection.
2 Background and Related Work
Fawkes extends data poisoning attacks in machine learning.
2.1 Protecting Privacy via Evasion Attacks
Aims: making images difficult for a facial recognition model to recognize.
(1). creating adversarial examples, inputs to the model designed to cause misclassification.
- specially printed glasses
- adversarial stickers on a hat
- adversarial patches
Limitations: these approaches require the user to wear fairly obvious and conspicuous accessories, or require full and unrestricted (white-box) access to the precise model tracking them; such protection is easily broken when the tracker updates its model.
(2). editing facial images, human-like characteristics are preserved but facial recognition model accuracy is significantly reduced.
- k-means
- facial inpainting
- GAN-based face editing
Limitations: alter the user’s face in the photos.
2.2 Protecting Privacy via Poisoning Attacks
Aims: disrupting training via data poisoning attacks, which modify the initial data used to train the model.
(1) Clean Label Attacks: inject “correctly” labeled poison images into the training data, causing a model trained on this data to misclassify a specific image of interest.
- only cause misclassification on a single, preselected image
- does not transfer well to different models
- easily detectable
(2) Model Corruption Attacks: modify images such that they degrade the accuracy of a model trained on them.
2.3 Other Related Work
Transfer learning uses existing pretrained models as a basis for quickly training models for customized tasks with less training data. Typically, a new model is created by appending a few additional layers to a pretrained feature extractor and training only those new layers (see the sketch below).
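A minimal sketch of this setup in PyTorch, assuming a frozen pretrained feature extractor plus one newly trained classification layer (the `DummyFeatureExtractor`, feature dimension, and class count are placeholders, not the paper's implementation):

```python
import torch
import torch.nn as nn

class DummyFeatureExtractor(nn.Module):
    """Stand-in for a real pretrained face feature extractor (512-d output)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, feat_dim))

    def forward(self, x):
        return self.net(x)

feature_extractor = DummyFeatureExtractor()
for p in feature_extractor.parameters():
    p.requires_grad = False                       # freeze the pretrained layers

num_classes = 65                                  # e.g. 65 PubFig identities
classifier = nn.Linear(512, num_classes)          # only this new layer is trained
model = nn.Sequential(feature_extractor, classifier)

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)              # toy batch of face images
labels = torch.randint(0, num_classes, (8,))
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```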
3 Protecting Privacy via Cloaking
Facial recognition models trained on cloaked images will have a distorted view of the user in the “feature space”, i.e. the model’s internal understanding of what makes the user unique.
3.1 Assumptions and Threat Model
Design goals:
- cloaks should be imperceptible and not impact normal use of the image
- when classifying normal, uncloaked images, models trained on cloaked images should recognize the underlying person with low accuracy.
3.2 Overview and Intuition
DNN models are trained to identify and extract (often hidden) features in input data and use them to perform classification. However, their ability to identify features is easily disrupted by small perturbations of the input data.
By simply modifying their online photos in small and imperceptible ways, the user successfully prevents unauthorized trackers and their DNN models from recognizing their true face.
3.3 Computing Cloak Perturbations
The goal is making the learned features from cloaked photos highly dissimilar from those learned from original (uncloaked) photos.
Notation:
- $x$: Alice’s uncloaked image
- $x_T$: target image (an image from another class/user $T$) used to generate the cloak for Alice
- $\delta(x, x_T)$: cloak computed for Alice’s image $x$ based on image $x_T$ from label $T$
- $x \oplus \delta(x, x_T)$: cloaked version of Alice’s image $x$
- $\Phi$: feature extractor used by the facial recognition model
- $\Phi(x)$: feature vector (or feature representation) extracted from an input $x$
Cloaking to maximize feature deviation:
The ideal cloaking design modifies $x$ by adding a cloak perturbation $\delta(x, T)$ that maximizes the change in $x$'s feature representation:

$$\max_{\delta} \; Dist\big(\Phi(x), \Phi(x \oplus \delta(x, T))\big), \quad \text{subject to } |\delta(x, T)| < \rho \tag{1}$$

where $Dist(\cdot, \cdot)$ computes the distance of two feature vectors, $|\delta|$ measures the perceptual perturbation caused by cloaking, and $\rho$ is the perceptual perturbation budget.
Image-specific Cloaking:
When creating cloaks for her photos, Alice produces image-specific cloaks, i.e. $\delta$ is image dependent. Eq.(1) is replaced with the following optimization, which shifts $x$'s features toward a specific target image $x_T$:

$$\min_{\delta} \; Dist\big(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))\big), \quad \text{subject to } |\delta(x, x_T)| < \rho \tag{2}$$

Here we search for the cloak for $x$ that shifts its feature representation closely towards $\Phi(x_T)$. This new form of optimization also prevents the system from generating extreme values.
Finally, our image-specific cloak optimization will create different cloak patterns among Alice’s images. This “diversity” makes it hard to detect and remove cloaks.
3.4 Cloaking Effectiveness & Transferability
Now Alice can produce cloaked images whose feature representation is dissimilar from her own but similar to that of a target user $T$.
(1). Effectiveness:
- whether this can translate into the desired misclassification behavior in the tracker model
- whether cloaking still leads to misclassification regardless of whether the target class $T$ exists in the tracker's model
Our hypothesis is that as long as the feature representations of Alice's cloaked and uncloaked images are sufficiently different, the tracker's model will not classify them as the same class: there will be some other class in the tracker's model whose feature representation is closer to Alice's uncloaked features $\Phi(x)$ than to the cloaked features $\Phi(x \oplus \delta)$ the model learned for her class. This is a reasonable assumption when the tracker's model targets many users rather than just a few.
(2). Transferability:
The above discussion assumes that the user uses the same feature extractor $\Phi$ as the one used to train the tracker's model.
Transferability: the property that models trained for similar tasks share similar properties and vulnerabilities, even if they were trained with different architectures and different training data.
The transferability property suggests that cloaking should still be effective because the user’s and the tracker’s feature extractor are designed for similar tasks.
4 The Fawkes Image Cloaking System
Inputs: the image set $X_U$, the feature extractor $\Phi$, and the cloak perturbation budget $\rho$.
- Choosing a Target Class $T$.
Fawkes randomly picks $K$ candidate target classes and their images from a publicly available dataset, and uses the feature extractor $\Phi$ to calculate each candidate class's feature space centroid $C_k$. Fawkes picks as the target class $T$ the class in the candidate set whose feature representation centroid is most dissimilar from the feature representations of all images in $X_U$:

$$T = \arg\max_{k = 1, \dots, K} \; \min_{x \in X_U} Dist\big(\Phi(x), C_k\big)$$

where $Dist(\cdot, \cdot)$ is the L2 distance. A sketch of this selection rule follows.
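A small sketch of this selection rule, assuming the features have already been extracted (`user_feats` and `candidate_feats` are hypothetical names):

```python
import numpy as np

# user_feats:      (N, d) array of features Phi(x) for the user's images
# candidate_feats: dict mapping candidate class k -> (M_k, d) feature matrix
def choose_target_class(user_feats, candidate_feats):
    best_class, best_score = None, -np.inf
    for k, feats in candidate_feats.items():
        centroid = feats.mean(axis=0)                             # C_k
        # smallest L2 distance from any user image feature to this centroid
        min_dist = np.linalg.norm(user_feats - centroid, axis=1).min()
        if min_dist > best_score:                                 # most dissimilar centroid wins
            best_class, best_score = k, min_dist
    return best_class
```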
- Computing Per-image Cloaks.
Let $X_T$ be the set of target images. For each $x \in X_U$, Fawkes randomly picks an image $x_T \in X_T$ and computes the cloak $\delta(x, x_T)$ following Eq.(2). In our implementation, $|\delta(x, x_T)|$ is calculated using DSSIM (Structural Dis-Similarity Index), a measure of user-perceived image distortion.
Applying the penalty method to reformulate and solve the optimization in Eq.(2):

$$\min_{\delta} \; Dist\big(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))\big) + \lambda \cdot \max\big(|\delta(x, x_T)| - \rho, \; 0\big)$$

where $\lambda$ controls the impact of the input perturbation caused by cloaking. When $\lambda \to \infty$, the cloaked image is visually identical to the original image.
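A minimal sketch of this penalty-method optimization, assuming a differentiable feature extractor `phi` and a differentiable perceptual distance `dssim` (both placeholders, not the exact Fawkes implementation); the $\lambda$ default and pixel range are illustrative, while the Adam settings (1000 iterations, learning rate 0.5) match the experiment setup noted in Section 5.1:

```python
import torch

def compute_cloak(x, x_T, phi, dssim, rho, lam=10.0, steps=1000, lr=0.5):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target_feat = phi(x_T).detach()                  # Phi(x_T), fixed during optimization
    for _ in range(steps):
        cloaked = torch.clamp(x + delta, 0.0, 1.0)   # x ⊕ δ(x, x_T)
        feat_dist = torch.norm(phi(cloaked) - target_feat)        # Dist(Φ(x_T), Φ(x ⊕ δ))
        penalty = torch.clamp(dssim(cloaked, x) - rho, min=0.0)   # max(|δ| - ρ, 0)
        loss = feat_dist + lam * penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```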
5 System Evaluation
When the user's and the tracker's feature extractors are different, efficacy could drop, but it can be restored to near perfection by making the user's feature extractor robust (via adversarial training).
5.1 Experiment Setup
(1). Feature Extractors.
Training feature extractors using two large (≥ 500K images) datasets on different model architectures.
- VGGFace2: 3.14M images of 8,631 subjects.
- WebFace: 500,000 images covering roughly 10,000 subjects.
Architectures:
- DenseNet-121: 121 layers, 7M parameters.
- InceptionResNet V2: 572 layers, 54M parameters.
(2). Tracker’s Training Datasets.
- training from scratch:
    - VGGFace2.
    - WebFace.
- applying transfer learning:
    - PubFig: 5,850 training images and 650 testing images of 65 public figures.
    - FaceScrub: 100,000 images of 530 public figures.
Transfer learning: the tracker adds a softmax layer at the end of the feature extractor and fine-tunes only the added layer using the above datasets.
(3). Cloaking Configuration.
The user $U$ is a label in the tracker's model, e.g. from PubFig. The feature extractor $\Phi$ is trained from VGGFace2 or WebFace. Fawkes computes the cloak for each image $x \in X_U$ given $\Phi$, using the Adam optimizer for 1000 iterations with a learning rate of 0.5.
(4). Evaluation Metrics.
- protection success rate: the tracker model's misclassification rate for clean (uncloaked) images of $U$.
- normal accuracy: the overall classification accuracy of the tracker's model on users besides $U$.
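A quick sketch of these two metrics, assuming integer label arrays (names are illustrative):

```python
import numpy as np

def protection_success_rate(pred_labels, user_label):
    """Fraction of the user's clean images that the tracker model misclassifies."""
    pred_labels = np.asarray(pred_labels)
    return float((pred_labels != user_label).mean())

def normal_accuracy(pred_labels, true_labels, user_label):
    """Classification accuracy over all test images that do not belong to the user."""
    pred_labels, true_labels = np.asarray(pred_labels), np.asarray(true_labels)
    mask = true_labels != user_label
    return float((pred_labels[mask] == true_labels[mask]).mean())
```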
5.2 User/Tracker Sharing a Feature Extractor
The user $U$ is from PubFig or FaceScrub. We compute “cloaks” for a subset of $U$'s images using each of the four feature extractors in Table 1, then perform transfer learning on the same feature extractor (with the cloaked images of $U$) to build the tracker's model. Finally, we evaluate whether the tracker model can correctly identify other clean images of $U$ it has not seen before.
Cloaking offers perfect protection. Even much higher DSSIM values (up to 0.2) are imperceptible to the human eye. Finally, the average L2 norm of our cloaks is 5.44.
Feature space representations of the cloaked images are well-aligned with those of the target images, confirming that the cloak changes the image's feature space representation in the tracker's model as intended.
A higher label density in the tracker model's feature space improves cloaking effectiveness.
5.3 User/Tracker Using Different Feature Extractor
While the model transferability property suggests that there are significant similarities in their respective model feature spaces (since both are trained to recognize faces), their differences could still reduce the efficacy of cloaking.
(Figure: feature space positions of cloaked images optimized on VGG2-Dense, the original images, and the target images, viewed through a different tracker feature extractor, Web-Incept; user from PubFig.)
The reduction in cloak effectiveness is obvious. In the tracker’s feature extractor, the cloak “moves” the original image features only slightly towards the target image features.
Linking model robustness and transferability:
An input perturbation’s ability to transfer between models depends on the “robustness” of the feature extractor used to create it. Perturbations generated on more robust models will take on “universal” characteristics that are able to effectively fool other models.
Improving cloak transferability by increasing the user feature extractor's robustness via adversarial training, i.e. training the model on perturbed data to make it less sensitive to similar small perturbations on its inputs. Adversarial examples are generated using the PGD attack, and each feature extractor is trained for an additional 10 epochs.
Cloaks generated on robust extractors transfer better than cloaks computed on normal ones.
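A rough sketch of PGD-based adversarial training under these assumptions (the model, data loader, ε, step size, and step count are placeholders, not the paper's exact settings):

```python
import torch
import torch.nn as nn

def pgd_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """Generate L-infinity PGD adversarial examples around x (illustrative values)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()              # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)     # project into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, loss_fn=nn.CrossEntropyLoss()):
    """One epoch of training on PGD-perturbed inputs to increase robustness."""
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, loss_fn)
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)                           # train on perturbed data
        loss.backward()
        optimizer.step()
```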
6 Image Cloaking in the Wild
- Microsoft Azure Face API: uses transfer learning to train a model on user-submitted images.
- Amazon Rekognition Face Verification: computes an image similarity score between the queried image and the ground-truth images for all labels.
- Face++ Face Search API: extremely robust against a variety of attacks.
7 Trackers with Uncloaked Images Access
7.1 Impact of Uncloaked Images
Training a model with both cloaked and uncloaked user images means the model will observe a much larger spread of features all designated as the user. Effects:
- classify both regions of features as the user
- classify both regions and the region between them as the user
- ignore these feature dimensions and identify the user using some alternative features that connect both uncloaked and cloaked versions of the user’s images.
Methods:
- intentionally releasing more cloaked images
- considering the use of a cooperating secondary identity
7.2 Sybil Accounts
The user modifies Sybil images so they occupy the same feature space as a user’s uncloaked images. These Sybil images help confuse a model trained on both Sybil images and uncloaked/cloaked images of a user, increasing the protection success rate.
Because the leaked uncloaked images and Sybil images are close by in their feature space representations, but labeled differently, the tracker model must create additional decision boundaries in the feature space.
$x_C$ is an image from the set of candidates the user obtains (e.g. images generated by a GAN). We create a cloak $\delta(x_C, x)$ that minimizes the feature space separation between $x_C$ and the user's original image $x$. The Sybil image is $x_S = x_C \oplus \delta(x_C, x)$.
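A sketch of Sybil image creation under the same assumptions as the cloak sketch in Section 4 (`phi` and `dssim` are placeholder callables, and the hyperparameters are illustrative):

```python
import torch

def make_sybil_image(x_c, x, phi, dssim, rho, lam=10.0, steps=1000, lr=0.5):
    """Perturb candidate x_c so its features move toward the user's original image x."""
    delta = torch.zeros_like(x_c, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    user_feat = phi(x).detach()                            # Phi(x), fixed target
    for _ in range(steps):
        candidate = torch.clamp(x_c + delta, 0.0, 1.0)     # x_c ⊕ δ(x_c, x)
        loss = torch.norm(phi(candidate) - user_feat) \
               + lam * torch.clamp(dssim(candidate, x_c) - rho, min=0.0)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x_c + delta.detach(), 0.0, 1.0)     # Sybil image x_s
```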
7.3 Efficacy of Sybil Images
The use of Sybil accounts significantly improves the protection success rate when the tracker has only a small number of original (uncloaked) images.
8 Countermeasures
8.1 Cloak Disruption
- Image Transformation.
Aims to mitigate the impact of small image perturbations by transforming images in the training dataset before using them for model training:
- image augmentation
- blurring
- adding noise
Such image transformations have little impact on the cloak's effectiveness (see the augmentation sketch below).
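For illustration, the kind of training-time transformation pipeline a tracker might try (the specific parameters are placeholders, not the settings evaluated in the paper):

```python
import torch
from torchvision import transforms

disruptive_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),        # augmentation
    transforms.RandomHorizontalFlip(),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # blurring
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0.0, 1.0)),  # noise
])
```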
- Robust Model.
Improving the robustness of the tracker's model (e.g. via adversarial training) decreases the protection success rate of cloaking, but increasing the visibility of the cloak perturbation (a larger DSSIM perturbation budget) restores a higher protection success rate.
8.2 Cloak Detection
Existing poison attack detection assumes that poisoning only affects a small percentage of training images. Fawkes poisons an entire model class, rendering outlier detection useless by removing the correct baseline.
Obtaining both target and cloaked images: empirically, the L2 feature space distance between the cloaked class's centroid and the target class's centroid is 3 standard deviations smaller than the mean separation between other classes, so the user's cloaked images can be detected. The user can trivially overcome this detection by maintaining separation between cloaked and target images during cloak optimization.
Obtaining original training images: run 2-means clustering on each class's feature space and flag classes with two distinct centroids as potentially cloaked. The distance between the two centroids of a protected user's class is 3 standard deviations larger than the average centroid separation of normal classes, so a tracker with original images can detect the presence of cloaked images (a sketch follows below). The user can counter this by choosing a target class that does not create such a large feature space separation.
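A sketch of the 2-means detection heuristic, assuming per-class feature matrices are available (the threshold rule and names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_possibly_cloaked_classes(class_feats, num_std=3.0):
    """class_feats: dict mapping class label -> (N_i, d) matrix of features Phi(x)."""
    separations = {}
    for label, feats in class_feats.items():
        km = KMeans(n_clusters=2, n_init=10).fit(feats)
        c0, c1 = km.cluster_centers_
        separations[label] = np.linalg.norm(c0 - c1)        # distance between the 2 centroids
    vals = np.array(list(separations.values()))
    threshold = vals.mean() + num_std * vals.std()          # "3 std above the average separation"
    return [lab for lab, sep in separations.items() if sep > threshold]
```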
[1]. Fawkes: Protecting Privacy against Unauthorized Deep Learning Models
[2]. Poisoning attacks on Machine Learning