RUN AWAY FROM YOUR TEACHER: A NEW SELF-SUPERVISED APPROACH SOLVING THE PUZZLE OF BYOL
motivation
representation collapse
Wang & Isola (2020) empirically demonstrated that balancing the alignment loss against the uniformity loss is necessary when learning representations with contrastive methods.
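A minimal NumPy sketch of the two losses in the standard form given by Wang & Isola (2020): alignment is the mean distance between positive pairs, uniformity is a log-mean-Gaussian-potential over all pairs of (unit-normalized) embeddings. The function names and the batch-based estimate are ours.

```python
import numpy as np

def align_loss(x, y, alpha=2):
    # x, y: (N, d) L2-normalized embeddings of positive pairs.
    # Mean alpha-th power of the pairwise distance between paired views.
    return np.mean(np.linalg.norm(x - y, axis=1) ** alpha)

def uniform_loss(x, t=2):
    # Log of the mean Gaussian potential over all distinct pairs in the batch.
    # Lower (more negative) values mean the embeddings are more uniformly spread.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)          # distinct pairs only
    return np.log(np.mean(np.exp(-t * sq_dists[iu])))
```

A collapsed representation (all embeddings identical) drives the alignment loss to zero but makes the uniformity loss as large as possible (zero, its maximum), which is the trade-off the notes refer to.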
experiment
1. We examine the speculation that the performance drop is caused by representation collapse.
2. We first replace the complex predictor with a linear mapping qw(·) = W(·). This replacement admits an obvious collapsed solution, W = I; yet even when we initialize W = I, training never converges to this apparent collapse.
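To make point 2 concrete, here is a toy NumPy illustration (ours, not the paper's experiment; fixed Gaussian features stand in for the encoder outputs fθ and fξ): a linear predictor W initialized at the identity is trained on the BYOL-style loss ||W z1 − z2||². Whenever the two views' features differ, the gradient at W = I is nonzero, so gradient descent moves the predictor away from the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 4, 32, 0.1

W = np.eye(d)                                        # linear predictor q_w, initialized at W = I
z_online = rng.normal(size=(n, d))                   # stand-in for online features f_theta(x1)
z_target = z_online + 0.1 * rng.normal(size=(n, d))  # stand-in for target features f_xi(x2); views differ

def byol_mse(W):
    # mean squared prediction error ||W z1 - z2||^2 over the batch
    return np.mean(np.sum((z_online @ W.T - z_target) ** 2, axis=1))

for _ in range(100):
    residual = z_online @ W.T - z_target
    grad = 2 * residual.T @ z_online / n             # dL/dW of the mean squared error
    W -= lr * grad                                   # step pulls W toward the least-squares map, not I
```

Because z2 is not exactly z1, W = I is not a stationary point of this loss; the predictor instead settles at the least-squares regression of target features onto online features.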
derivation
BYOL LOSS UPPER BOUNDING
let λ = β/α
Understanding why BYOL works without collapse is approximately equivalent to understanding how minimizing Lcross-model(qw ◦ fθ, fξ) effectively regularizes the alignment loss.
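One plausible reconstruction of the quantities behind this statement (our sketch using the notes' symbols, not quoted from the paper; the constants are the α, β of the upper bound, with λ = β/α as defined above):

```latex
% Alignment loss: two augmented views should map close together
\mathcal{L}_{\mathrm{align}}(f_\theta)
  = \mathbb{E}_{x_1, x_2}\bigl[\lVert f_\theta(x_1) - f_\theta(x_2)\rVert^2\bigr]

% Cross-model loss: distance between online and target representations
\mathcal{L}_{\mathrm{cross\text{-}model}}(f_\theta, f_\xi)
  = \mathbb{E}_{x}\bigl[\lVert f_\theta(x) - f_\xi(x)\rVert^2\bigr]

% RAFT-style regularized objective (divide the bound by alpha and let lambda = beta/alpha)
\mathcal{L}_{\mathrm{RAFT}}
  = \mathcal{L}_{\mathrm{align}}(f_\theta)
  - \lambda\,\mathcal{L}_{\mathrm{cross\text{-}model}}(f_\theta, f_\xi)
```

Under this reading, the online network is rewarded for "running away from" the target, and that repulsive term is what counteracts over-optimizing the alignment loss.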
works as well
Removing the predictor: although minimizing the resulting loss fails to yield representations better than the random baseline, it still prevents the alignment loss from being over-optimized.
1. Why does the linear predictor improve the quality of the features?
architecture
UNDERSTANDING BYOL VIA RAFT
proof: equations (1)–(3) [equation bodies lost in extraction; only the numbering survives]
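The RAFT-style objective L_align − λ·L_cross-model can be prototyped in a few lines. A toy NumPy sketch (ours; a linear map A stands in for the online encoder fθ, an EMA copy A_t for the target fξ, and the hyperparameters lam, lr, tau are illustrative): gradient descent on the alignment term while ascending the cross-model term, with the target trailing the online encoder by an exponential moving average, as in BYOL.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n = 6, 3, 64
lam, lr, tau = 0.5, 0.01, 0.99              # illustrative hyperparameters

x  = rng.normal(size=(n, d_in))             # base samples
x1 = x + 0.1 * rng.normal(size=x.shape)     # augmented view 1
x2 = x + 0.1 * rng.normal(size=x.shape)     # augmented view 2

A   = 0.1 * rng.normal(size=(d_out, d_in))  # online encoder f_theta (linear toy)
A_t = A.copy()                              # target encoder f_xi (EMA copy)

def raft_objective(A, A_t):
    align = np.mean(np.sum(((x1 - x2) @ A.T) ** 2, axis=1))  # L_align(f_theta)
    cross = np.mean(np.sum((x @ (A - A_t).T) ** 2, axis=1))  # L_cross-model(f_theta, f_xi)
    return align - lam * cross                               # run away from the teacher

def raft_grad(A, A_t):
    # gradients of the two quadratic terms w.r.t. A
    g_align = 2 * ((x1 - x2) @ A.T).T @ (x1 - x2) / n
    g_cross = 2 * (x @ (A - A_t).T).T @ x / n
    return g_align - lam * g_cross

for _ in range(200):
    A  -= lr * raft_grad(A, A_t)            # descend alignment, ascend cross-model distance
    A_t = tau * A_t + (1 - tau) * A         # EMA target update, as in BYOL
```

With the target held fixed, a small gradient step strictly decreases the objective; with the EMA update, the target chases the online encoder, so the cross-model gap stays bounded rather than growing without limit.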