CUDA Learning (6) Kernel Launch - threadIdx

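When I first started learning CUDA, the index calculation used when launching a kernel always confused me, and I could never keep threadIdx.x, blockIdx.x, blockDim and gridDim apart. After going through various sources, I have put together this summary of my own understanding.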
1. The relationship between Grid, Block and Thread
[Figure: a Grid made up of Blocks, each Block made up of Threads]
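As the figure shows, a Grid can contain multiple Blocks, and a Block contains multiple Threads. Each of the three levels can be laid out in one, two or three dimensions. Note that threadIdx is only unique within its own Block: threads in different Blocks reuse the same threadIdx values, and it is the pair (blockIdx, threadIdx) that uniquely identifies a thread within a Grid. That is why the global index formulas below combine both.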
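To make the uniqueness point concrete, the following host-side C++ sketch (ordinary CPU code, not a CUDA kernel) walks over every thread of a 1D launch with `blocks` Blocks of `threads` Threads each, using two loops in place of blockIdx.x and threadIdx.x. The names `census` and `LaunchCensus` are hypothetical, chosen only for illustration. It counts how many distinct threadIdx.x values and how many distinct (blockIdx.x, threadIdx.x) pairs occur.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Result of walking over every thread of a hypothetical 1D launch.
struct LaunchCensus {
    std::size_t distinctThreadIdx;  // different threadIdx.x values seen
    std::size_t distinctPairs;      // different (blockIdx.x, threadIdx.x) pairs seen
    std::size_t totalThreads;       // blocks * threads
};

// Host-side emulation: the outer loop plays blockIdx.x, the inner loop threadIdx.x.
LaunchCensus census(unsigned blocks, unsigned threads) {
    assert(blocks > 0 && threads > 0);
    std::set<unsigned> tids;
    std::set<std::pair<unsigned, unsigned>> pairs;
    for (unsigned b = 0; b < blocks; ++b) {
        for (unsigned t = 0; t < threads; ++t) {
            tids.insert(t);        // threadIdx.x restarts at 0 in every Block
            pairs.insert({b, t});  // the pair is unique across the whole Grid
        }
    }
    return {tids.size(), pairs.size(), std::size_t(blocks) * threads};
}
```

For a launch of 4 Blocks of 8 Threads, only 8 distinct threadIdx.x values appear among the 32 threads, while all 32 (blockIdx.x, threadIdx.x) pairs are distinct.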
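2. Dimensions
When launching a kernel you need to specify the grid size and the block size:
dim3 gridsize(x,y,z);
dim3 blocksize(x,y,z);
Inside the kernel, gridDim and blockDim hold these two values. blockDim.x, blockDim.y and blockDim.z are the extents of a Block along x, y and z, i.e. the number of threads in each direction; gridDim.x, gridDim.y and gridDim.z likewise give the number of Blocks along each direction of the Grid. Dimensions count from 1 (a dim3 component that is not given defaults to 1), while indices such as threadIdx and blockIdx count from 0, so threadIdx.x runs from 0 to blockDim.x - 1.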
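The sizing rules can be mirrored on the host with a small stand-in type. `Dim3`, `volume`, `totalThreads` and `validIndex` below are hypothetical names for illustration, not the CUDA runtime's `dim3`; the sketch only assumes the documented behaviour that omitted components default to 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for CUDA's dim3 (not the real type).
// As with dim3, components that are not given default to 1.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {
        assert(x > 0 && y > 0 && z > 0);  // sizes count from 1, never 0
    }
};

// Number of Blocks (for a grid size) or Threads (for a block size).
std::size_t volume(const Dim3& d) {
    return std::size_t(d.x) * d.y * d.z;
}

// Threads started by Kernel<<<grid, block>>>(...).
std::size_t totalThreads(const Dim3& grid, const Dim3& block) {
    return volume(grid) * volume(block);
}

// Indices count from 0, so a valid index along an axis of extent `size`
// lies in [0, size - 1].
bool validIndex(unsigned idx, unsigned size) {
    return idx < size;
}
```

For example, `Dim3 g(4)` behaves like `dim3 gridsize(4)` (4 x 1 x 1), and combining it with `Dim3 b(16, 8)` as the block size gives 4 x 128 = 512 threads.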
3. 1D, 2D and 3D layouts
3.1 1D grid
grid 1D, Block 1D (the grid is one-dimensional and so is each block)
Index calculation: int idx = blockIdx.x * blockDim.x + threadIdx.x;
Kernel<<< numBlock,threadsPerBlock>>>(argv);
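In practice the number of elements n is rarely a multiple of the block size, so numBlock is usually rounded up and the kernel guards its work with `if (idx < n)`. The sketch below emulates that on the host under stated assumptions: the loops stand in for blockIdx.x and threadIdx.x, blockDim.x equals threadsPerBlock, and `numBlocksFor` and `emulateVecAdd` are hypothetical helpers, not part of the original text or of any CUDA API.

```cpp
#include <cassert>
#include <vector>

// Round-up division: the smallest numBlock with numBlock * threadsPerBlock >= n.
unsigned numBlocksFor(unsigned n, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Host-side emulation of a 1D-grid / 1D-block kernel whose body would be
//   int idx = blockIdx.x * blockDim.x + threadIdx.x;
//   if (idx < n) c[idx] = a[idx] + b[idx];
// The loop variables blk and thr stand in for blockIdx.x and threadIdx.x.
std::vector<int> emulateVecAdd(const std::vector<int>& a, const std::vector<int>& b,
                               unsigned threadsPerBlock) {
    assert(a.size() == b.size());
    const unsigned n = static_cast<unsigned>(a.size());
    std::vector<int> c(n, 0);
    const unsigned numBlock = numBlocksFor(n, threadsPerBlock);
    for (unsigned blk = 0; blk < numBlock; ++blk) {
        for (unsigned thr = 0; thr < threadsPerBlock; ++thr) {
            unsigned idx = blk * threadsPerBlock + thr;  // blockDim.x == threadsPerBlock
            if (idx < n) {  // threads past the end of the data do nothing
                c[idx] = a[idx] + b[idx];
            }
        }
    }
    return c;
}
```

With n = 10 and 4 threads per Block, numBlock is 3, so 12 threads are launched and the last two fail the `idx < n` check and do nothing.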

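grid 1D, Block 2D (the grid is one-dimensional and each block is two-dimensional)
int idx = blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
dim3 dimBlock(x,y);
Kernel<<< numBlock,dimBlock>>>(argv);
Take this case as an example. Because the grid is one-dimensional, blockIdx.x (numbered from 0) is the number of Blocks that come before the current Block. blockDim.x is the number of threads in a Block along x and blockDim.y the number along y, so blockDim.x * blockDim.y is the number of threads in one Block, and blockIdx.x * blockDim.x * blockDim.y is the total number of threads in all the (full) Blocks before the current one. Inside the current Block, which is two-dimensional, threadIdx.y * blockDim.x counts the threads in the complete rows above the current thread, and threadIdx.x counts the threads to its left in its own row. Adding the three terms gives the number of threads that precede the current thread, which is exactly its zero-based global index idx.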
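The counting argument can be checked mechanically. This host-side C++ sketch writes the 1D-grid / 2D-block formula with plain parameters in place of blockIdx.x, threadIdx.x/y and blockDim.x/y (the names `idx1D2D` and `coversExactlyOnce1D2D` are hypothetical), and tests that every thread of the launch gets a distinct index in range.

```cpp
#include <cassert>
#include <vector>

// The formula above, term by term:
//   idx = blockIdx.x * (blockDim.x * blockDim.y)  // threads in earlier Blocks
//       + threadIdx.y * blockDim.x                // threads in earlier full rows of this Block
//       + threadIdx.x;                            // threads to the left in this row
unsigned idx1D2D(unsigned blockIdxX,
                 unsigned threadIdxX, unsigned threadIdxY,
                 unsigned blockDimX, unsigned blockDimY) {
    assert(threadIdxX < blockDimX && threadIdxY < blockDimY);
    unsigned beforeBlocks = blockIdxX * (blockDimX * blockDimY);
    unsigned beforeRows = threadIdxY * blockDimX;
    return beforeBlocks + beforeRows + threadIdxX;
}

// True if every thread of numBlock Blocks gets a distinct idx in
// [0, numBlock * blockDimX * blockDimY).
bool coversExactlyOnce1D2D(unsigned numBlock, unsigned blockDimX, unsigned blockDimY) {
    const unsigned total = numBlock * blockDimX * blockDimY;
    std::vector<bool> seen(total, false);
    for (unsigned b = 0; b < numBlock; ++b)
        for (unsigned ty = 0; ty < blockDimY; ++ty)
            for (unsigned tx = 0; tx < blockDimX; ++tx) {
                unsigned i = idx1D2D(b, tx, ty, blockDimX, blockDimY);
                if (i >= total || seen[i]) return false;
                seen[i] = true;
            }
    return true;  // total distinct in-range values, so every slot is filled
}
```

With blockDim = (4, 2), thread (1, 1) of Block 3 has 3 x 8 = 24 threads in earlier Blocks, 1 x 4 = 4 in the earlier row of its own Block and 1 to its left, so idx = 29.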

grid 1D, Block 3D
int idx = blockIdx.x * blockDim.x * blockDim.y * blockDim.z + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
dim3 dimBlock(x,y,z);
Kernel<<< numBlock,dimBlock>>>(argv);
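The 3D-block formula is easier to remember in factored form: linearize the thread inside its Block as (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x, then add the threads of all earlier Blocks. The host-side sketch below compares the two forms; `idxExpanded` and `idxFactored` are hypothetical names, and the short parameter names abbreviate the CUDA built-ins as noted in the comments.

```cpp
#include <cassert>

// bx = blockIdx.x; tx, ty, tz = threadIdx.x/y/z; dx, dy, dz = blockDim.x/y/z.

// The formula above, written out term by term.
unsigned idxExpanded(unsigned bx, unsigned tx, unsigned ty, unsigned tz,
                     unsigned dx, unsigned dy, unsigned dz) {
    return bx * dx * dy * dz + tz * dy * dx + ty * dx + tx;
}

// Same value in factored (Horner) form.
unsigned idxFactored(unsigned bx, unsigned tx, unsigned ty, unsigned tz,
                     unsigned dx, unsigned dy, unsigned dz) {
    assert(tx < dx && ty < dy && tz < dz);
    unsigned local = (tz * dy + ty) * dx + tx;  // position inside the Block
    return bx * (dx * dy * dz) + local;         // plus all threads of earlier Blocks
}
```

Expanding the factored form gives back threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x, i.e. exactly the last three terms of the formula above.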

3.2 2D grid
grid 2D, Block 1D (the grid is two-dimensional and each block is one-dimensional)
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * blockDim.x + threadIdx.x;
dim3 dimGrid(x,y);
Kernel<<< dimGrid,threadsPerBlock>>>(argv);
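Here blockId is the row-major number of the Block within the grid: blockIdx.y complete rows of gridDim.x Blocks come before it, plus blockIdx.x Blocks in its own row. The host-side sketch below (hypothetical helper names `blockId2D`, `idx2D1D` and `coversExactlyOnce2D1D`; plain parameters stand in for the CUDA built-ins) computes both values and checks that the mapping covers every thread exactly once.

```cpp
#include <cassert>
#include <vector>

// blockId = blockIdx.y * gridDim.x + blockIdx.x;  (row-major number of the Block)
unsigned blockId2D(unsigned bx, unsigned by, unsigned gridDimX) {
    assert(bx < gridDimX);
    return by * gridDimX + bx;
}

// Idx = blockId * blockDim.x + threadIdx.x;
unsigned idx2D1D(unsigned bx, unsigned by, unsigned tx,
                 unsigned gridDimX, unsigned blockDimX) {
    assert(tx < blockDimX);
    return blockId2D(bx, by, gridDimX) * blockDimX + tx;
}

// True if every thread of a gridDimX x gridDimY grid of blockDimX-thread
// Blocks gets a distinct idx in [0, total).
bool coversExactlyOnce2D1D(unsigned gridDimX, unsigned gridDimY, unsigned blockDimX) {
    const unsigned total = gridDimX * gridDimY * blockDimX;
    std::vector<bool> seen(total, false);
    for (unsigned by = 0; by < gridDimY; ++by)
        for (unsigned bx = 0; bx < gridDimX; ++bx)
            for (unsigned tx = 0; tx < blockDimX; ++tx) {
                unsigned i = idx2D1D(bx, by, tx, gridDimX, blockDimX);
                if (i >= total || seen[i]) return false;
                seen[i] = true;
            }
    return true;
}
```

For gridDim = (3, 2) and blockDim.x = 32, Block (1, 1) has blockId 1 x 3 + 1 = 4, so its thread 5 gets idx 4 x 32 + 5 = 133.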

grid 2D, Block 2D
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
dim3 dimGrid(x1,y1),dimBlock(x2,y2);
Kernel<<< dimGrid,dimBlock>>>(argv);
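The formula above numbers all threads of one Block before moving to the next. For 2D data such as a matrix stored row by row, a different but equally common mapping first computes a global column and row and then flattens them with the full grid width. Both are one-to-one, but they assign different idx values to the same thread, so the choice has to match how the data is laid out. The sketch below compares them on the host; `blockMajorIdx`, `rowMajorIdx`, `IdxFn` and `isBijection` are hypothetical names, and the row-major variant is a widely used pattern that the original text does not discuss.

```cpp
#include <cassert>
#include <vector>

// The formula above: number the Blocks first, then the threads inside each Block.
unsigned blockMajorIdx(unsigned bx, unsigned by, unsigned tx, unsigned ty,
                       unsigned gridDimX, unsigned blockDimX, unsigned blockDimY) {
    unsigned blockId = by * gridDimX + bx;
    return blockId * (blockDimX * blockDimY) + ty * blockDimX + tx;
}

// Alternative (not from the original text): global (col, row), then row-major flattening.
unsigned rowMajorIdx(unsigned bx, unsigned by, unsigned tx, unsigned ty,
                     unsigned gridDimX, unsigned blockDimX, unsigned blockDimY) {
    unsigned col = bx * blockDimX + tx;     // global x coordinate
    unsigned row = by * blockDimY + ty;     // global y coordinate
    unsigned width = gridDimX * blockDimX;  // threads per grid row
    return row * width + col;
}

using IdxFn = unsigned (*)(unsigned, unsigned, unsigned, unsigned,
                           unsigned, unsigned, unsigned);

// True if f gives every thread of the launch a distinct idx in [0, total).
bool isBijection(IdxFn f, unsigned gdx, unsigned gdy, unsigned bdx, unsigned bdy) {
    const unsigned total = gdx * gdy * bdx * bdy;
    std::vector<bool> seen(total, false);
    for (unsigned by = 0; by < gdy; ++by)
        for (unsigned bx = 0; bx < gdx; ++bx)
            for (unsigned ty = 0; ty < bdy; ++ty)
                for (unsigned tx = 0; tx < bdx; ++tx) {
                    unsigned i = f(bx, by, tx, ty, gdx, bdx, bdy);
                    if (i >= total || seen[i]) return false;
                    seen[i] = true;
                }
    return true;
}
```

In a real matrix kernel the row-major form is usually paired with a bounds check such as `col < width && row < height`, because the grid is typically rounded up past the matrix edges.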

grid 2D, Block 3D
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x)+ threadIdx.x;
dim3 dimGrid(x1,y1),dimBlock(x2,y2,z2);
Kernel<<< dimGrid,dimBlock>>>(argv);
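The 2D-grid / 3D-block case is the same pattern applied twice: linearize the Block within the grid, linearize the thread within the Block, and combine. The host-side sketch below makes that explicit with a hypothetical helper `linear3` and a hypothetical `Launch2D3D` struct holding the sizes, and checks it against the expanded formula above for every thread of a small launch.

```cpp
#include <cassert>

// Row-major position of (x, y, z) in a box whose x extent is sx and y extent is sy.
unsigned linear3(unsigned x, unsigned y, unsigned z, unsigned sx, unsigned sy) {
    return (z * sy + y) * sx + x;
}

struct Launch2D3D {  // hypothetical bundle of gridDim and blockDim
    unsigned gridDimX, gridDimY;
    unsigned blockDimX, blockDimY, blockDimZ;
};

// Block number (z = 0 because the grid is 2D) times threads per Block,
// plus the thread's position inside its Block.
unsigned idx2D3D(const Launch2D3D& L, unsigned bx, unsigned by,
                 unsigned tx, unsigned ty, unsigned tz) {
    assert(bx < L.gridDimX && by < L.gridDimY);
    assert(tx < L.blockDimX && ty < L.blockDimY && tz < L.blockDimZ);
    unsigned blockId = linear3(bx, by, 0, L.gridDimX, L.gridDimY);
    unsigned threadsPerBlock = L.blockDimX * L.blockDimY * L.blockDimZ;
    return blockId * threadsPerBlock + linear3(tx, ty, tz, L.blockDimX, L.blockDimY);
}

// The expanded formula from the text, for comparison.
unsigned idx2D3DExpanded(const Launch2D3D& L, unsigned bx, unsigned by,
                         unsigned tx, unsigned ty, unsigned tz) {
    unsigned blockId = by * L.gridDimX + bx;
    return blockId * (L.blockDimX * L.blockDimY * L.blockDimZ)
         + tz * (L.blockDimX * L.blockDimY) + ty * L.blockDimX + tx;
}
```

With gridDim = (3, 2) and blockDim = (4, 2, 2), thread (3, 1, 1) of Block (2, 1) is the last of 96 threads: idx = 5 x 16 + 15 = 95.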

3.3 3D grid
grid 3D, Block 1D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x+ gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * blockDim.x + threadIdx.x;
dim3 dimGrid(x,y,z);
Kernel<<< dimGrid,threadsPerBlock>>>(argv);
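In the 3D-grid formula, blockIdx.x varies fastest, then blockIdx.y, then blockIdx.z, exactly like the thread numbering inside a 3D Block. One way to convince yourself that no two Blocks share a blockId is to invert the mapping with division and remainder. The host-side sketch below does so; `Idx3`, `blockId3D`, `blockIdxFromId` and `idx3D1D` are hypothetical names, with `Idx3` standing in for CUDA's blockIdx.

```cpp
#include <cassert>

struct Idx3 { unsigned x, y, z; };  // hypothetical stand-in for blockIdx

// blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
unsigned blockId3D(Idx3 b, unsigned gridDimX, unsigned gridDimY) {
    return b.x + b.y * gridDimX + gridDimX * gridDimY * b.z;
}

// Inverse: recover blockIdx from blockId. Because this inverse exists,
// distinct Blocks always get distinct blockIds.
Idx3 blockIdxFromId(unsigned id, unsigned gridDimX, unsigned gridDimY) {
    Idx3 b;
    b.x = id % gridDimX;
    b.y = (id / gridDimX) % gridDimY;
    b.z = id / (gridDimX * gridDimY);
    return b;
}

// Idx = blockId * blockDim.x + threadIdx.x;
unsigned idx3D1D(Idx3 b, unsigned tx, unsigned gridDimX, unsigned gridDimY,
                 unsigned blockDimX) {
    assert(tx < blockDimX);
    return blockId3D(b, gridDimX, gridDimY) * blockDimX + tx;
}
```

For gridDim = (4, 3, 2), Block (1, 2, 1) has blockId 1 + 2 x 4 + 4 x 3 x 1 = 21, and with 64 threads per Block its thread 10 gets idx 21 x 64 + 10 = 1354.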

grid 3D, Block 2D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * (blockDim.x * blockDim.y)+ (threadIdx.y * blockDim.x) + threadIdx.x;
dim3 dimGrid(x1,y1,z1),dimBlock(x2,y2);
Kernel<<< dimGrid,dimBlock>>>(argv);
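The mapping can also be run backwards: dividing idx by the number of threads per Block gives the blockId, the remainder gives the thread's position inside the Block, and each of those is unpacked with further division and remainder. This can help when working out which thread produced a given output element. The host-side sketch below assumes a hypothetical `U3` struct in place of CUDA's uint3/dim3 and hypothetical functions `idx3D2D` and `split3D2D`, and checks the round trip over a whole small launch.

```cpp
#include <cassert>

struct U3 { unsigned x, y, z; };  // hypothetical stand-in for uint3 / dim3

// Forward mapping from the text (3D grid, 2D block).
unsigned idx3D2D(U3 blk, U3 thr, U3 grid, U3 block) {
    unsigned blockId = blk.x + blk.y * grid.x + grid.x * grid.y * blk.z;
    return blockId * (block.x * block.y) + thr.y * block.x + thr.x;
}

// Inverse: split idx into "which Block" and "where inside the Block",
// then undo the row-major numbering of each.
void split3D2D(unsigned idx, U3 grid, U3 block, U3& blk, U3& thr) {
    unsigned perBlock = block.x * block.y;
    unsigned blockId = idx / perBlock;
    unsigned local = idx % perBlock;
    blk = {blockId % grid.x, (blockId / grid.x) % grid.y, blockId / (grid.x * grid.y)};
    thr = {local % block.x, local / block.x, 0};  // 2D Block: threadIdx.z is always 0
}
```

For gridDim = (2, 3, 2) and blockDim = (8, 4), thread (5, 3) of Block (1, 2, 1) has blockId 1 + 2 x 2 + 2 x 3 x 1 = 11 and idx 11 x 32 + 3 x 8 + 5 = 381.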

grid 3D, Block 3D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x+ gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x)+ threadIdx.x;
dim3 dimGrid(x1,y1,z1),dimBlock(x2,y2,z2);
Kernel<<< dimGrid,dimBlock>>>(argv);
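All of the cases in this section are special cases of one general formula: linearize blockIdx within gridDim, multiply by the number of threads per Block, and add the linearized threadIdx within blockDim. Unused dimensions have extent 1 and index 0, so they drop out. The host-side sketch below states this with hypothetical names (`V3`, `linear`, `globalIdx`) and checks it against the 1D/1D and 3D/3D formulas.

```cpp
#include <cassert>

struct V3 { unsigned x, y, z; };  // hypothetical stand-in for dim3 / uint3

// Row-major position of p inside a box of extent s (x fastest, then y, then z).
unsigned linear(V3 p, V3 s) {
    return (p.z * s.y + p.y) * s.x + p.x;
}

// General 3D-grid / 3D-block index. Each case above is this function with the
// unused extents equal to 1 (so the matching indices are always 0).
unsigned globalIdx(V3 blk, V3 thr, V3 grid, V3 block) {
    assert(thr.x < block.x && thr.y < block.y && thr.z < block.z);
    assert(blk.x < grid.x && blk.y < grid.y && blk.z < grid.z);
    unsigned threadsPerBlock = block.x * block.y * block.z;
    return linear(blk, grid) * threadsPerBlock + linear(thr, block);
}
```

Because unused extents are 1, this one function also reproduces the 2D-grid and mixed cases; inside a real kernel, the same expression would be written with the built-in blockIdx, threadIdx, gridDim and blockDim.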