Convolutional Layers

In image processing, an image is usually represented by its pixels: a 1000×1000 image becomes a vector of 1,000,000 values. If the hidden layer also has 1,000,000 units, a fully connected layer would need 10^12 weights — far too many to train — so the number of weights has to be reduced. Two methods are commonly used:

The first is the local receptive field. Human perception is generally thought to proceed from local to global, and in images the spatial correlations are likewise strongest between nearby pixels. Each neuron therefore does not need to see the whole image: it perceives only a local region, and higher layers combine these local views into global information. If each neuron connects to only a 10×10 patch of pixels, the weight count drops to 1,000,000×100 — one ten-thousandth of before. The 10×10 parameters attached to each patch are the convolution kernel.

The second is parameter sharing. Local receptive fields reduce the weight count, but it is still too large. With weight sharing, all 1,000,000 neurons use the same 100 parameters, so only 100 weights remain.
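The arithmetic above can be checked directly; a throwaway sketch (the counts ignore biases):

```python
# Weight counts for the hypothetical 1000x1000 image with 1,000,000 hidden units.
pixels = 1000 * 1000
hidden = 1000 * 1000

fully_connected = pixels * hidden     # every pixel to every unit: 10**12 weights
local_receptive = hidden * (10 * 10)  # 10x10 patch per unit: 10**8 weights
shared = 10 * 10                      # one shared 10x10 kernel: 100 weights

print(fully_connected, local_receptive, shared)  # 1000000000000 100000000 100
```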

An example:

import tensorflow as tf
import numpy as np

image = np.array([[1,1,1,0,0],
                  [0,1,1,1,0],
                  [0,0,1,1,1],
                  [0,0,1,1,0],
                  [0,1,1,0,0]])

weight = np.array([[1,0,1],
                    [0,1,0],
                    [1,0,1]])
bias = 0  # declared but not applied below
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
# reshape to the NHWC layout conv2d expects: [batch, height, width, channels]
input = tf.reshape(input, [1,5,5,1])
# filter layout: [height, width, in_channels, out_channels]
filter = tf.reshape(filter, [3,3,1,1])
result = tf.nn.conv2d(input, filter, strides=[1,1,1,1], padding="VALID")
# TF 1.x session API (in TF 2.x, use tf.compat.v1)
with tf.Session() as sess:
    print(sess.run(result))

The output is a 1×3×3×1 tensor:

4 3 4
2 4 3
2 3 4
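The same result can be checked in plain NumPy. The sketch below (my own helper, not a TensorFlow API) slides the kernel over the image exactly as tf.nn.conv2d does — note that conv2d actually computes a cross-correlation, i.e. it does not flip the kernel:

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Cross-correlation with VALID padding, matching tf.nn.conv2d."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

image = np.array([[1,1,1,0,0],
                  [0,1,1,1,0],
                  [0,0,1,1,1],
                  [0,0,1,1,0],
                  [0,1,1,0,0]])
kernel = np.array([[1,0,1],
                   [0,1,0],
                   [1,0,1]])
print(conv2d_valid(image, kernel))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```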

The convolution above uses padding="VALID". The general output-size formula is ((width − filter + 2×padding)/stride) + 1. With VALID no padding is added, so the output width is floor((width − filter)/stride) + 1; in this example, (5 − 3)/1 + 1 = 3. If the input width is 6 and the stride is 2, the output width is floor((6 − 3)/2) + 1 = 2. An example:
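The two sizing rules can be wrapped in a small helper (the function name is my own, not part of TensorFlow); SAME, covered below, rounds width/stride up instead:

```python
import math

def conv_output_size(width, filter_size, stride, padding):
    """Output width of a 1-D slice of a convolution, per TF's VALID/SAME rules."""
    if padding == "VALID":
        return (width - filter_size) // stride + 1
    elif padding == "SAME":
        return math.ceil(width / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(conv_output_size(5, 3, 1, "VALID"))  # 3
print(conv_output_size(6, 3, 2, "VALID"))  # 2
print(conv_output_size(5, 3, 2, "SAME"))   # 3
```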

import tensorflow as tf
import numpy as np

image = np.array([[1,1,1,0,0,1],
                  [0,1,1,1,0,1],
                  [0,0,1,1,1,1],
                  [0,0,1,1,0,1],
                  [0,1,1,0,0,1],
                  [1,1,1,1,1,1]])

weight = np.array([[1,0,1],
                    [0,1,0],
                    [1,0,1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1,6,6,1])
filter = tf.reshape(filter, [3,3,1,1])
result = tf.nn.conv2d(input, filter, strides=[1,2,2,1], padding="VALID")
with tf.Session() as sess:
    print(sess.run(result))

The output is a 1×2×2×1 tensor:

4 4
2 4


If padding is set to SAME, the output width is instead ceil(width/stride), so with stride 1 the output keeps the 5×5 input size:

import tensorflow as tf
import numpy as np
#convolution padding="SAME"
image = np.array([[1,1,1,0,0],
                  [0,1,1,1,0],
                  [0,0,1,1,1],
                  [0,0,1,1,0],
                  [0,1,1,0,0]])

weight = np.array([[1,0,1],
                    [0,1,0],
                    [1,0,1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1,5,5,1])
filter = tf.reshape(filter, [3,3,1,1])
result = tf.nn.conv2d(input, filter, strides=[1,1,1,1], padding="SAME")
with tf.Session() as sess:
    print(sess.run(result))

The output is a 1×5×5×1 tensor:

2 2 3 1 1
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
0 2 2 1 1


With stride=2 (output width ceil(5/2) = 3):

import tensorflow as tf
import numpy as np
#convolution padding="SAME"
image = np.array([[1,1,1,0,0],
                  [0,1,1,1,0],
                  [0,0,1,1,1],
                  [0,0,1,1,0],
                  [0,1,1,0,0]])

weight = np.array([[1,0,1],
                    [0,1,0],
                    [1,0,1]])
bias = 0
input = tf.constant(image, dtype=tf.float32)
filter = tf.constant(weight, dtype=tf.float32)
input = tf.reshape(input, [1,5,5,1])
filter = tf.reshape(filter, [3,3,1,1])
result = tf.nn.conv2d(input, filter, strides=[1,2,2,1], padding="SAME")
with tf.Session() as sess:
    print(sess.run(result))

The output is a 1×3×3×1 tensor:

2 3 1
1 4 3
0 2 1
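The SAME results above can be reproduced in plain NumPy by zero-padding the image and then sliding the kernel as in a VALID convolution. The helper below is my own sketch of the padding rule TF uses: total padding = max((out − 1)×stride + filter − width, 0), split as evenly as possible with the extra row/column on the bottom/right:

```python
import numpy as np

def conv2d_same(image, kernel, stride=1):
    """Cross-correlation with SAME padding, mimicking tf.nn.conv2d."""
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = -(-h // stride)  # ceil(h / stride)
    out_w = -(-w // stride)
    # total zero padding needed so the output has exactly out_h x out_w entries
    pad_h = max((out_h - 1) * stride + kh - h, 0)
    pad_w = max((out_w - 1) * stride + kw - w, 0)
    padded = np.pad(image, ((pad_h // 2, pad_h - pad_h // 2),
                            (pad_w // 2, pad_w - pad_w // 2)))
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

image = np.array([[1,1,1,0,0],
                  [0,1,1,1,0],
                  [0,0,1,1,1],
                  [0,0,1,1,0],
                  [0,1,1,0,0]])
kernel = np.array([[1,0,1],
                   [0,1,0],
                   [1,0,1]])
print(conv2d_same(image, kernel, stride=2))
# [[2 3 1]
#  [1 4 3]
#  [0 2 1]]
```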