TensorFlow 2.0 Neural Networks and Fully Connected Layers: Loss Functions

Common loss functions

  1. MSE
  2. Cross-entropy loss
  3. Hinge loss (used by SVMs); see the sketch after this list:
    • $\sum_i \max(0, 1 - y_i \cdot h_\theta(x_i))$
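A minimal sketch of the hinge loss above, assuming labels $y_i \in \{-1, +1\}$ and raw (unsquashed) scores $h_\theta(x_i)$; the values are made up for illustration:

import tensorflow as tf

y = tf.constant([1., -1., 1.])           # hypothetical labels in {-1, +1}
scores = tf.constant([0.8, 0.3, -0.5])   # hypothetical raw scores h_theta(x_i)
hinge = tf.reduce_sum(tf.maximum(0., 1. - y * scores))
# per-sample terms [0.2, 1.3, 1.5] -> sum = 3.0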

MSE

  1. $loss = \frac{1}{N}\sum (y-out)^2$, where $N = B \times NumOfClass$ (batch size times number of classes)
  2. $L_{2\text{-}norm} = \sqrt{\sum(y-out)^2}$
y = tf.constant([1, 2, 3, 0, 2])
y = tf.one_hot(y, depth=4)
y = tf.cast(y, dtype=tf.float32)
out = tf.random.normal([5, 4])

loss1 = tf.reduce_mean(tf.square(y-out))   
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss2 = tf.square(tf.norm(y-out))/(5*4)
# tf.Tensor(1.4226108, shape=(), dtype=float32)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))  # functional form; tf.losses.MeanSquaredError is the class equivalent
# tf.Tensor(1.4226108, shape=(), dtype=float32)
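The class form mentioned in the comment is called the same way; a one-line sketch:

loss4 = tf.losses.MeanSquaredError()(y, out)  # class API; same value as loss3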

Cross Entropy

Entropy

Defined on a single data distribution:

  1. Measures uncertainty
  2. Measures surprise
  3. A low-probability outcome carries a large amount of information:
    $Entropy = -\sum_i P(i)\log P(i)$
a = tf.fill([4], 0.25)
a*tf.math.log(a)/tf.math.log(2.)  # log base 2 via change of base
# tf.Tensor([-0.5 -0.5 -0.5 -0.5], shape=(4,), dtype=float32)
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # uniform: maximum entropy
# tf.Tensor(2.0, shape=(), dtype=float32)

a = tf.constant([0.1, 0.1, 0.1, 0.7])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # more peaked: lower entropy
# tf.Tensor(1.3567796, shape=(), dtype=float32)

a = tf.constant([0.01, 0.01, 0.01, 0.97])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))  # nearly deterministic: lowest entropy
# tf.Tensor(0.24194068, shape=(), dtype=float32)

Cross entropy

Defined on two data distributions:

  1. $H(p,q) = -\sum p(x)\log q(x)$
  2. $H(p,q) = H(p) + D_{KL}(p\|q)$ (verified numerically in the sketch after this list)
    • For $p = q$:
      • Minimum: $H(p,q) = H(p)$
    • For one-hot encoded $p$, the cross entropy reduces to the KL divergence:
      • $H(p{=}[0,1,0]) = -1\cdot\log 1 = 0$
      • $H([0,1,0], [q_0,q_1,q_2]) = 0 + D_{KL}(p\|q) = -\log q_1$
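A quick numeric check of the decomposition in point 2, using two made-up two-class distributions (natural log throughout):

p = tf.constant([0.4, 0.6])
q = tf.constant([0.7, 0.3])
H_p  = -tf.reduce_sum(p * tf.math.log(p))     # H(p)      ≈ 0.6730
H_pq = -tf.reduce_sum(p * tf.math.log(q))     # H(p,q)    ≈ 0.8651
D_kl = tf.reduce_sum(p * tf.math.log(p / q))  # D_KL(p‖q) ≈ 0.1921
# H_pq equals H_p + D_kl, up to float rounding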

For binary classification there are two common setups:

  1. Two output units + softmax:
    $H([0,1,0], [q_0,q_1,q_2]) = 0 + D_{KL}(p\|q) = -\log q_1$
  2. One output unit + a threshold (see the sketch after this list):
    $H(P,Q) = -P(cat)\log Q(cat) - (1-P(cat))\log(1-Q(cat))$
    $P(dog) = 1 - P(cat)$
    $H(P,Q) = -\sum_{i \in \{cat,dog\}} P(i)\log Q(i) = -P(cat)\log Q(cat) - P(dog)\log Q(dog)$
    which is the familiar binary cross entropy $-(y\log(p) + (1-y)\log(1-p))$
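A minimal sketch of the binary cross entropy line above, with a made-up label and prediction ($y = 1$ for cat, $p = Q(cat)$):

y, p = 1., tf.constant(0.8)  # hypothetical label and predicted P(cat)
bce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
# ≈ 0.2231 (= -ln 0.8)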

$P_1 = [1, 0, 0, 0, 0]$

  1. $Q_1 = [0.4, 0.3, 0.05, 0.05, 0.2]$
    • $H(P_1,Q_1) = -\sum_i P(i)\log Q(i) = -(1\cdot\log 0.4 + 0 + 0 + 0 + 0) = 0.916$
  2. $Q_1 = [0.98, 0.01, 0, 0, 0.01]$
    • $H(P_1,Q_1) = -\sum_i P(i)\log Q(i) = -1\cdot\log 0.98 = 0.02$
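Both values can be reproduced with the functional API introduced below; a quick check:

tf.losses.categorical_crossentropy([1., 0., 0., 0., 0.], [0.4, 0.3, 0.05, 0.05, 0.2])
# ≈ 0.9163 (= -ln 0.4)
tf.losses.categorical_crossentropy([1., 0., 0., 0., 0.], [0.98, 0.01, 0., 0., 0.01])
# ≈ 0.0202 (= -ln 0.98)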

Multi-class classification

  1. labels are one-hot encoded
  2. generalizes to any number of classes

Functional API

tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
# tf.Tensor(1.3862944, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.1, 0.8, 0.1])
# tf.Tensor(2.3978953, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])
# tf.Tensor(0.35667497, shape=(), dtype=float32)
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Class API

Instantiating the loss class and calling the instance invokes its __call__() method.

tf.losses.CategoricalCrossentropy()([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

Binary classification

categorical_crossentropy

# class API
tf.losses.CategoricalCrossentropy()([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

# functional API
tf.losses.categorical_crossentropy([0, 1], [0.03, 0.97])
# tf.Tensor(0.030459179, shape=(), dtype=float32)

binary crossentropy

# class API
print(tf.losses.BinaryCrossentropy()([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

# functional API
print(tf.losses.binary_crossentropy([1], [0.97]))
# tf.Tensor(0.030459056, shape=(), dtype=float32)

Why not MSE?

  1. sigmoid + MSE can cause vanishing and exploding gradients (see the sketch below)
  2. Slow convergence
  3. For meta-learning, however, MSE performs well.

Each problem calls for its own analysis.
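A minimal sketch of point 1 above, assuming a single weight pushed into the sigmoid's saturated zone; the numbers are made up for illustration:

w = tf.Variable(5.0)                       # weight deep in the saturation zone
x, y = tf.constant(1.0), tf.constant(0.0)  # hypothetical input and label
with tf.GradientTape(persistent=True) as tape:
    p = tf.sigmoid(w * x)
    mse = tf.square(y - p)
    ce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
print(tape.gradient(mse, w))  # ≈ 0.013: the extra sigmoid'(z) factor shrinks the gradient
print(tape.gradient(ce, w))   # ≈ 0.993 (= p - y): cross entropy cancels that factor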

Classification network pipeline and numerical stability

For numerical stability, fuse softmax with the cross entropy computation by passing from_logits=True:

x = tf.random.normal([1, 784])
w = tf.random.normal([784, 2])
b = tf.zeros([2])

logits = x @ w + b

prob = tf.math.softmax(logits)

print(tf.losses.categorical_crossentropy([0, 1], logits, from_logits=True))  # numerically stable, recommended
# tf.Tensor([0.], shape=(1,), dtype=float32)
print(tf.losses.categorical_crossentropy([0, 1], prob))   # not recommended
# tf.Tensor([1.192093e-07], shape=(1,), dtype=float32)
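
The class API accepts the same from_logits switch; a one-line sketch reusing the logits above:

tf.losses.CategoricalCrossentropy(from_logits=True)([0., 1.], logits[0])
# same numerically stable path, reduced to a scalar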