Deep Learning Specialization 4: Convolutional Neural Networks - Week4

This week's material is quite interesting: it shows how the magical "Van Gogh-ify a photo" effect is actually done. For novel applications, the key is still how to formalize the problem; once the problem is clearly defined, you are more than halfway to a solution.

1. Face Recognition

1.1 定义

(Face) Verification

  • Input image, name/ID
  • Output whether the input image is that of the claimed person

(Face) Recognition

  • Has a database of K persons
  • Get an input image
  • Output ID if the image is any of the K persons (or “not recognized”)

The hard part: One-Shot Learning.
Learning from one example to recognize the person again, i.e. only a single picture per person is available for all subsequent recognition.

1.2 Siamese Network

Learning a similarity function

if $d(\text{image1}, \text{image2}) \le \tau$: "same"
else, i.e. $d(\text{image1}, \text{image2}) > \tau$: "different"

1.2.1 Encoding of image


d(x^{(1)}, x^{(2)}) = \left \| f(x^{(1)})-f(x^{(2)}) \right \|^2_2
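As a minimal sketch (NumPy; the function names and the threshold value are illustrative, not from the course), the distance and the verification rule above can be written as:

```python
import numpy as np

def d(f1, f2):
    """Squared L2 distance between two embedding vectors f(x1), f(x2)."""
    return float(np.sum((f1 - f2) ** 2))

def verify(f1, f2, tau=0.7):
    """Predict 'same' if the embeddings are within the threshold tau."""
    return "same" if d(f1, f2) <= tau else "different"
```

In practice `f1` and `f2` would be the outputs of the same (Siamese) encoder network applied to the two images.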

1.2.2 Loss

  • Triplet Loss
    APN - Anchor/Positive/Negative
    \left \| f(A) - f(P) \right \|^2 + \alpha \le \left \| f(A) - f(N) \right \|^2
    $\alpha$ is the margin; it makes the gap between positive and negative pairs more pronounced.

\mathcal{L}(A, P, N) = \max \left( \left \| f(A) - f(P) \right \|^2 - \left \| f(A) - f(N) \right \|^2 + \alpha,\ 0 \right)

J = \sum^m_{i=1} \mathcal{L}(A^{(i)}, P^{(i)}, N^{(i)})
Choose triplets that are “hard” to train on.
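The triplet loss above can be sketched in NumPy (a per-triplet version; function and variable names are illustrative):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    d_pos = np.sum((f_a - f_p) ** 2)  # anchor-positive distance
    d_neg = np.sum((f_a - f_n) ** 2)  # anchor-negative distance
    return max(d_pos - d_neg + alpha, 0.0)
```

"Hard" triplets are those where `d_pos` is close to (or larger than) `d_neg`, so the loss is nonzero and actually produces a gradient.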

  • Binary Classification
    In other words, turn the problem into a binary classification task: does a pair of images show the same person?


\hat{y} = \sigma \left( \sum_{k=1}^{128} w_k \left| f(x^{(i)})_k - f(x^{(j)})_k \right | + b \right)
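A sketch of this classification head in NumPy (assuming 128-dimensional embeddings as in the formula; names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def same_person_prob(f_i, f_j, w, b):
    """sigma(sum_k w_k * |f_i[k] - f_j[k]| + b): probability the pair matches."""
    return sigmoid(np.dot(w, np.abs(f_i - f_j)) + b)
```

The weights `w` and bias `b` are learned on labeled same/different pairs; the elementwise `|f_i - f_j|` features are what make the pair symmetric.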

2. Neural Style Transfer

Definitions: Content / Style / Generated image; generate G from C in the style of S.


Cost Function
J(G) = \alpha J_\text{content}(C, G) + \beta J_\text{style}(S, G)
The cost function has two parts.
One part is a content-similarity score, computed from the activations of a hidden layer:
J_\text{content} = \frac{1}{2} \left \| a^{[l](C)} - a^{[l](G)} \right \|^2
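This content cost can be sketched in NumPy (a minimal version over one layer's activations; names are illustrative):

```python
import numpy as np

def content_cost(a_c, a_g):
    """J_content = 1/2 * ||a[l](C) - a[l](G)||^2 over layer-l activations."""
    return 0.5 * float(np.sum((a_c - a_g) ** 2))
```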
The other part is a style score:

Definition of style: correlation between activations across channels.
G_{kk'}^{[l]} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijk}^{[l]}a_{ijk'}^{[l]}
$G^{[l]}$ is an $n_c^{[l]} \times n_c^{[l]}$ Gram matrix; $k$ and $k'$ both index channels, and $l$ means the activations come from layer $l$. The matrix is then computed separately on S and on G:
J^{[l]}_\text{style}(S, G) = \frac {1} {(2n_H^{[l]}n_W^{[l]}n_C^{[l]})^2} \left \| G^{[l](S)} - G^{[l](G)} \right \|_F^2
Finally, this is computed on several hidden layers and summed.
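The Gram matrix and the per-layer style cost can be sketched in NumPy (assuming channels-last activations of shape `(n_H, n_W, n_C)`; names are illustrative):

```python
import numpy as np

def gram_matrix(a):
    """Activations (n_H, n_W, n_C) -> (n_C, n_C) Gram matrix of channel correlations."""
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)  # each column is one channel, unrolled over positions
    return flat.T @ flat              # G[k, k'] = sum_ij a_ijk * a_ijk'

def style_cost_layer(a_s, a_g):
    """Squared Frobenius distance of Gram matrices with the 1/(2 n_H n_W n_C)^2 scaling."""
    n_h, n_w, n_c = a_s.shape
    g_s, g_g = gram_matrix(a_s), gram_matrix(a_g)
    return float(np.sum((g_s - g_g) ** 2)) / (2 * n_h * n_w * n_c) ** 2
```

The total style cost would then be a (weighted) sum of `style_cost_layer` over the chosen layers.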

3. What are deep ConvNets learning?

Shallow layers learn basic features: each filter in the first layer extracts a different kind of edge.

Deeper layers learn more composite features: local parts such as ears and noses.