TensorFlow 2.0 Neural Networks and Fully Connected Layers: Loss Functions

Common loss functions

  1. MSE
  2. Cross-entropy loss
  3. Hinge loss (used by SVMs); see the sketch after this list:
    • $\sum_i \max(0, 1 - y_i \cdot h_\theta(x_i))$
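A minimal sketch of the hinge loss above, assuming labels $y_i \in \{-1, +1\}$ and raw (unsquashed) scores $h_\theta(x_i)$; the values are made up for illustration:

import tensorflow as tf

y = tf.constant([1., -1., 1.])           # hypothetical labels in {-1, +1}
scores = tf.constant([0.8, 0.3, -0.5])   # hypothetical raw scores h_theta(x_i)
hinge = tf.reduce_sum(tf.maximum(0., 1. - y * scores))
# per-sample terms [0.2, 1.3, 1.5] -> sum = 3.0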

MSE

  1. $loss = \frac{1}{N}\sum (y-out)^2$, where $N = B \times NumOfClass$ (batch size times number of classes)
  2. $L_{2\text{-}norm} = \sqrt{\sum(y-out)^2}$
y = tf.constant([1, 2, 3, 0, 2])
y = tf.one_hot(y, depth=4)
y = tf.cast(y, dtype=tf.float32)
out = tf.random.normal([5, 4])

loss1 = tf.reduce_mean(tf.square(y-out))   
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss2 = tf.square(tf.norm(y-out))/(5*4)
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))  # functional form; tf.losses.MeanSquaredError is the class equivalent
# tf.Tensor(1.4226108, shape=(), dtype=float32)
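The class form mentioned in the comment is called the same way; a one-line sketch:

loss4 = tf.losses.MeanSquaredError()(y, out)  # class API; same value as loss3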

Cross Entropy

Entropy

Defined on a single data distribution:

  1. Measures uncertainty
  2. Measures surprise
  3. A low-probability outcome carries a large amount of information:
    $Entropy = -\sum_i P(i)\log P(i)$
a = tf.fill([4], 0.25)
a*tf.math.log(a)/tf.math.log(2.)  # log base 2 via change of base
# tf.Tensor([-0.5 -0.5 -0.5 -0.5], shape=(4,), dtype=float32)
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # uniform: maximum entropy
# tf.Tensor(2.0, shape=(), dtype=float32)

a = tf.constant([0.1, 0.1, 0.1, 0.7])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # more peaked: lower entropy
# tf.Tensor(1.3567796, shape=(), dtype=float32)

a = tf.constant([0.01, 0.01, 0.01, 0.97])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # nearly deterministic: lowest entropy
# tf.Tensor(0.24194068, shape=(), dtype=float32)

Cross entropy

Defined on two data distributions:

  1. $H(p,q) = -\sum p(x)\log q(x)$
  2. $H(p,q) = H(p) + D_{KL}(p\|q)$ (verified numerically in the sketch after this list)
    • For $p = q$:
      • Minimum: $H(p,q) = H(p)$
    • For one-hot encoded $p$, the cross entropy reduces to the KL divergence:
      • $H(p{=}[0,1,0]) = -1\cdot\log 1 = 0$
      • $H([0,1,0], [q_0,q_1,q_2]) = 0 + D_{KL}(p\|q) = -\log q_1$
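A quick numeric check of the decomposition in point 2, using two made-up two-class distributions (natural log throughout):

p = tf.constant([0.4, 0.6])
q = tf.constant([0.7, 0.3])
H_p  = -tf.reduce_sum(p * tf.math.log(p))     # H(p)      ≈ 0.6730
H_pq = -tf.reduce_sum(p * tf.math.log(q))     # H(p,q)    ≈ 0.8651
D_kl = tf.reduce_sum(p * tf.math.log(p / q))  # D_KL(p‖q) ≈ 0.1921
# H_pq equals H_p + D_kl, up to float rounding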

For binary classification there are two common setups:

  1. Two output units + softmax:
    $H([0,1,0], [q_0,q_1,q_2]) = 0 + D_{KL}(p\|q) = -\log q_1$
  2. One output unit + a threshold (see the sketch after this list):
    $H(P,Q) = -P(cat)\log Q(cat) - (1-P(cat))\log(1-Q(cat))$
    $P(dog) = 1 - P(cat)$
    $H(P,Q) = -\sum_{i \in \{cat,dog\}} P(i)\log Q(i) = -P(cat)\log Q(cat) - P(dog)\log Q(dog)$
    which is the familiar binary cross entropy $-(y\log(p) + (1-y)\log(1-p))$
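A minimal sketch of the binary cross entropy line above, with a made-up label and prediction ($y = 1$ for cat, $p = Q(cat)$):

y, p = 1., tf.constant(0.8)  # hypothetical label and predicted P(cat)
bce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
# ≈ 0.2231 (= -ln 0.8)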

$P_1 = [1, 0, 0, 0, 0]$

  1. $Q_1 = [0.4, 0.3, 0.05, 0.05, 0.2]$
    • $H(P_1,Q_1) = -\sum_i P(i)\log Q(i) = -(1\cdot\log 0.4 + 0 + 0 + 0 + 0) = 0.916$
  2. $Q_1 = [0.98, 0.01, 0, 0, 0.01]$
    • $H(P_1,Q_1) = -\sum_i P(i)\log Q(i) = -1\cdot\log 0.98 = 0.02$
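Both values can be reproduced with the functional API introduced below; a quick check:

tf.losses.categorical_crossentropy([1., 0., 0., 0., 0.], [0.4, 0.3, 0.05, 0.05, 0.2])
# ≈ 0.9163 (= -ln 0.4)
tf.losses.categorical_crossentropy([1., 0., 0., 0., 0.], [0.98, 0.01, 0., 0., 0.01])
# ≈ 0.0202 (= -ln 0.98)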

Multi-class classification

  1. labels are one-hot encoded
  2. generalizes to any number of classes

Functional API

tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
# tf.Tensor(1.3862944, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.1, 0.8, 0.1])
# tf.Tensor(2.3978953, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])
# tf.Tensor(0.35667497, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Class API

Instantiating the loss class and calling the instance invokes its __call__() method.

tf.losses.CategoricalCrossentropy()([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Binary classification

categorical_crossentropy

# class API
tf.losses.CategoricalCrossentropy()([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

# functional API
tf.losses.categorical_crossentropy([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

binary crossentropy

# class API
print(tf.losses.BinaryCrossentropy()([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

# functional API
print(tf.losses.binary_crossentropy([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

Why not MSE?

  1. sigmoid + MSE can cause vanishing and exploding gradients (see the sketch below)
  2. Slow convergence
  3. For meta-learning, however, MSE performs well.

Each problem calls for its own analysis.
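A minimal sketch of point 1 above, assuming a single weight pushed into the sigmoid's saturated zone; the numbers are made up for illustration:

w = tf.Variable(5.0)                       # weight deep in the saturation zone
x, y = tf.constant(1.0), tf.constant(0.0)  # hypothetical input and label
with tf.GradientTape(persistent=True) as tape:
    p = tf.sigmoid(w * x)
    mse = tf.square(y - p)
    ce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
print(tape.gradient(mse, w))  # ≈ 0.013: the extra sigmoid'(z) factor shrinks the gradient
print(tape.gradient(ce, w))   # ≈ 0.993 (= p - y): cross entropy cancels that factor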

Classification network pipeline and numerical stability

For numerical stability, fuse softmax with the cross entropy computation by passing from_logits=True:

x = tf.random.normal([1, 784])
w = tf.random.normal([784, 2])
b = tf.zeros([2])

logits = x @ w + b

prob = tf.math.softmax(logits)

print(tf.losses.categorical_crossentropy([0, 1], logits, from_logits=True))  # numerically stable, recommended
# tf.Tensor([0.], shape=(1,), dtype=float32)
print(tf.losses.categorical_crossentropy([0, 1], prob))   # not recommended
# tf.Tensor([1.192093e-07], shape=(1,), dtype=float32)
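
The class API accepts the same from_logits switch; a one-line sketch reusing the logits above:

tf.losses.CategoricalCrossentropy(from_logits=True)([0., 1.], logits[0])
# same numerically stable path, reduced to a scalar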