PyTorch conv2d
CONV2D
CLASS
torch.nn.Conv2d
(in_channels: int, out_channels: int, kernel_size: Union[T, Tuple[T, T]], stride: Union[T, Tuple[T, T]] = 1, padding: Union[T, Tuple[T, T]] = 0, dilation: Union[T, Tuple[T, T]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros')
Parameters
- in_channels (int) – Number of channels in the input image
- out_channels (int) – Number of channels produced by the convolution
- kernel_size (int or tuple) – Size of the convolving kernel
- stride (int or tuple, optional) – Stride of the convolution. Default: 1
- padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
- padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
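To see how these parameters fit together, here is a minimal sketch (the channel counts and input size are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# A 3-channel (e.g. RGB) input mapped to 16 feature maps with a 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1,
                 bias=True, padding_mode='zeros')

x = torch.randn(8, 3, 32, 32)   # (N, C_in, H_in, W_in)
y = conv(x)
print(y.shape)                  # padding=1 with a 3x3 kernel preserves H and W
# → torch.Size([8, 16, 32, 32])
```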
Shape
- Input: $(N, C_{in}, H_{in}, W_{in})$, where $N$ is the batch size, $C_{in}$ is the number of input channels, $H_{in}$ is the height of the 2-D input, and $W_{in}$ is its width.
- Output: $(N, C_{out}, H_{out}, W_{out})$, where

$$H_{out}=\left\lfloor \frac{H_{in}+2\times padding[0]-dilation[0]\times (kernel\_size[0]-1)-1}{stride[0]}+1 \right\rfloor$$

$$W_{out}=\left\lfloor \frac{W_{in}+2\times padding[1]-dilation[1]\times (kernel\_size[1]-1)-1}{stride[1]}+1 \right\rfloor$$

Here $N$ is the batch size, $C_{out}$ is the number of output channels, $H_{out}$ is the height of the 2-D output, and $W_{out}$ is its width.
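The output-size formula above can be checked directly against a real layer; the helper function below is just an illustration, not part of the PyTorch API:

```python
import torch
import torch.nn as nn

def conv_out_size(in_size, padding, dilation, kernel_size, stride):
    # floor((in_size + 2*padding - dilation*(kernel_size-1) - 1) / stride + 1)
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = nn.Conv2d(3, 8, kernel_size=5, stride=2, padding=2, dilation=1)
x = torch.randn(1, 3, 28, 28)
h_out = conv_out_size(28, padding=2, dilation=1, kernel_size=5, stride=2)
print(h_out)                                # → 14
assert conv(x).shape == (1, 8, h_out, h_out)
```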
Variables
- Conv2d.weight (Tensor) – the learnable weights of the module, of shape $(out\_channels, \frac{in\_channels}{groups}, kernel\_size[0], kernel\_size[1])$. The values are sampled from the uniform distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k=\frac{groups}{C_{in}\times kernel\_size[0]\times kernel\_size[1]}$.
- Conv2d.bias (Tensor) – the learnable bias of the module, of shape $(out\_channels)$. If bias is True, the values are sampled from the uniform distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k=\frac{groups}{C_{in}\times kernel\_size[0]\times kernel\_size[1]}$.
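The weight and bias shapes, including the division by groups, can be inspected on a layer instance (the sizes here are arbitrary examples):

```python
import torch.nn as nn

# With groups=2, each output channel only sees in_channels/groups = 2 input channels.
conv = nn.Conv2d(in_channels=4, out_channels=6, kernel_size=(3, 5), groups=2)
print(conv.weight.shape)   # (out_channels, in_channels/groups, kH, kW)
# → torch.Size([6, 2, 3, 5])
print(conv.bias.shape)
# → torch.Size([6])
```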
My understanding
A 2-D convolution can be thought of as a convolution over a multi-channel image.
The output has out_channels channels: the operation applies out_channels kernels, each of size kernel_size[0] × kernel_size[1] (and spanning all in_channels input channels), sliding each one over the whole image according to the stride and the other parameters.
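This view can be verified with torch.nn.functional.conv2d: each output channel is produced independently by one kernel that spans all input channels, so convolving with a single kernel reproduces the corresponding output slice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)   # 4 kernels, each spanning all 3 input channels
y = F.conv2d(x, weight)            # output shape: (1, 4, 6, 6)

# Output channel 0 is just the convolution of x with the first kernel alone.
y0 = F.conv2d(x, weight[0:1])
assert torch.allclose(y[:, 0:1], y0)
```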