Convolutional Layer
In image processing, an image is usually treated as a grid of pixels; for example, a 1000*1000 image becomes a vector of 1,000,000 values. If the hidden layer also has 1,000,000 neurons, a fully connected layer would need 1,000,000,000,000 weight parameters. That is far too many to train, so the number of weights has to be reduced. There are generally two ways to do this:
The first is the local receptive field. Human perception is generally thought to proceed from local to global, and the spatial structure of an image is likewise mostly local: nearby pixels are the most strongly correlated. Each neuron therefore does not need to see the whole image; it only needs to perceive a local region, and higher layers combine the local information into global information. If each neuron is connected to only a 10*10 patch of pixels, the number of weights becomes 1,000,000*100, one ten-thousandth of the original. The 10*10 parameters attached to that 10*10 patch are exactly the kernel of the convolution operation.
The second is parameter sharing. Local connectivity already reduces the number of weights, but there are still too many. With weight sharing, all 1,000,000 neurons use the same 100 parameters, so the number of weight parameters drops to 100. The arithmetic behind these counts is spelled out in the sketch below.
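A quick back-of-the-envelope check of the three counts above, in plain Python (the sizes are the ones assumed in the text):
pixels = 1000 * 1000      # flattened 1000*1000 image
hidden = 1000 * 1000      # hidden layer with 1,000,000 neurons

fully_connected = pixels * hidden    # 1,000,000,000,000 weights
local_10x10 = hidden * (10 * 10)     # 100,000,000 weights with a 10*10 receptive field
shared_kernel = 10 * 10              # 100 weights once the kernel is shared

print(fully_connected, local_10x10, shared_kernel)   # 1000000000000 100000000 100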
An example:
import tensorflow as tf
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
weight = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1, 5, 5, 1])     # NHWC: batch, height, width, channels
filter = tf.reshape(filter, [3, 3, 1, 1])   # height, width, in_channels, out_channels
result = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding="VALID")
with tf.Session() as sess:
    idata = sess.run(input)
    filt = sess.run(filter)
    res = sess.run(result)
    print(res)
The output is a 1*3*3*1 matrix:
4 3 4
2 4 3
2 3 4
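To see where these numbers come from, the same VALID convolution can be reproduced by hand with NumPy: slide the 3*3 kernel over the 5*5 image and sum the element-wise products at each position (a plain NumPy sketch, independent of TensorFlow):
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

out = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        # 3*3 window starting at (i, j), multiplied element-wise with the kernel
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
print(out)   # [[4 3 4] [2 4 3] [2 3 4]]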
Here the convolution uses padding="VALID". The general output-size formula is ((width - filter + 2*padding) / stride) + 1; with VALID there is no padding, and any fractional part is rounded down, i.e. the output size is floor((width - filter) / stride) + 1. In this example that is (5 - 3 + 0) / 1 + 1 = 3. If the input width is 6 and the stride is 2, the formula gives (6 - 3) / 2 + 1 = 2.5, and the actual output size is 2. Example:
import tensorflow as tf
import numpy as np

image = np.array([[1, 1, 1, 0, 0, 1],
                  [0, 1, 1, 1, 0, 1],
                  [0, 0, 1, 1, 1, 1],
                  [0, 0, 1, 1, 0, 1],
                  [0, 1, 1, 0, 0, 1],
                  [1, 1, 1, 1, 1, 1]])
weight = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1, 6, 6, 1])
filter = tf.reshape(filter, [3, 3, 1, 1])
result = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding="VALID")
with tf.Session() as sess:
    idata = sess.run(input)
    filt = sess.run(filter)
    res = sess.run(result)
    print(res)
The output is a 1*2*2*1 matrix:
4 4
2 4
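The rounding behaviour can be checked with a small helper (a hypothetical function written directly from the formula above, not part of TensorFlow's API):
def valid_output_size(width, filter_size, stride):
    # VALID: no padding; windows that do not fit completely are dropped
    return (width - filter_size) // stride + 1

print(valid_output_size(5, 3, 1))   # 3, the 1*3*3*1 output above
print(valid_output_size(6, 3, 2))   # 2, the 2.5 from the formula rounded down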
If padding is set to SAME, the output size is width / stride rounded up. For example:
import tensorflow as tf
import numpy as np

# convolution padding="SAME"
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
weight = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1, 5, 5, 1])
filter = tf.reshape(filter, [3, 3, 1, 1])
result = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding="SAME")
with tf.Session() as sess:
    idata = sess.run(input)
    filt = sess.run(filter)
    res = sess.run(result)
    print(res)
The output is a 1*5*5*1 matrix:
2 2 3 1 1
1 4 3 4 1
1 2 4 3 1
1 2 3 4 1
0 2 2 1 1
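With SAME padding and stride 1, TensorFlow pads the 5*5 image with a ring of zeros to 7*7 so that the output keeps the input size. Padding explicitly and reusing the sliding-window loop from above gives the same matrix (a NumPy sketch of this equivalence):
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

padded = np.pad(image, 1)           # zero border: 5*5 -> 7*7
out = np.zeros((5, 5), dtype=int)
for i in range(5):
    for j in range(5):
        out[i, j] = np.sum(padded[i:i+3, j:j+3] * kernel)
print(out)   # matches the 5*5 output above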
With stride=2:
import tensorflow as tf
import numpy as np

# convolution padding="SAME"
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
weight = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1, 5, 5, 1])
filter = tf.reshape(filter, [3, 3, 1, 1])
result = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding="SAME")
with tf.Session() as sess:
    idata = sess.run(input)
    filt = sess.run(filter)
    res = sess.run(result)
    print(res)
The output is a 1*3*3*1 matrix:
2 3 1
1 4 3
0 2 1
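For SAME padding the output size is ceil(width / stride), independent of the filter size; a small helper (hypothetical, written from this rule rather than taken from TensorFlow's API) confirms the sizes of the last two examples:
import math

def same_output_size(width, stride):
    # SAME: zeros are padded so that every stride position produces an output
    return math.ceil(width / stride)

print(same_output_size(5, 1))   # 5, the 1*5*5*1 output above
print(same_output_size(5, 2))   # 3, the 1*3*3*1 output above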