Monitoring GPU usage continuously on Windows

In a command prompt, change to C:\Program Files\NVIDIA Corporation\NVSMI

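Run nvidia-smi.exe -l. The option must be a lowercase l (uppercase -L lists the GPUs instead). A number after -l sets the refresh interval in seconds; for example, nvidia-smi.exe -l 5 refreshes the display every 5 seconds.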

CUDA

Basics

CUDA (Compute Unified Device Architecture) is a parallel computing platform from the GPU vendor NVIDIA.

Global memory: the DRAM on the GPU board.

Streaming Multiprocessor (SM): the unit that performs the computation. Each SM has its own control unit, registers, caches, and execution pipelines.
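Both quantities can be read at run time. The sketch below assumes the CUDA runtime API and at least one CUDA-capable GPU; it uses cudaGetDeviceProperties, whose cudaDeviceProp fields totalGlobalMem and multiProcessorCount give the global memory size and the SM count. DeviceSummary and querySummary are hypothetical names chosen for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical summary type for this example only.
struct DeviceSummary {
    size_t globalMemBytes;     // global memory (on-board DRAM), in bytes
    int smCount;               // number of Streaming Multiprocessors
    int regsPerBlock;          // 32-bit registers available to one block
    size_t sharedMemPerBlock;  // shared memory available to one block, in bytes
};

// Hypothetical helper: fills *out for `device`; returns false if the query fails.
bool querySummary(int device, DeviceSummary* out) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    out->globalMemBytes = prop.totalGlobalMem;
    out->smCount = prop.multiProcessorCount;
    out->regsPerBlock = prop.regsPerBlock;
    out->sharedMemPerBlock = prop.sharedMemPerBlock;
    std::printf("%s: %zu MiB global memory, %d SMs\n",
                prop.name, out->globalMemBytes >> 20, out->smCount);
    return true;
}
```

Compile with nvcc and call querySummary(0, &s) to inspect the first GPU.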

CUDA organizes threads into two levels: Grid and Block.

Grid: all threads launched by a single kernel form one Grid, and all threads in the Grid share global memory.

A Grid consists of multiple Blocks.

Block: threads in the same Block can synchronize with each other and communicate through shared memory.

A Block consists of multiple threads.

Both Grids and Blocks can be one-, two-, or three-dimensional.
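The two levels can be seen in a block-level sum, sketched here under the assumption of a CUDA-capable GPU and with error checking omitted for brevity. The kernel launch creates a 1-D Grid of independent Blocks; inside each Block the threads cooperate through `__shared__` memory and wait for each other at `__syncthreads()`. Blocks cannot synchronize with one another this way, so the per-block partial sums are added on the host. blockSum, sumOnGpu and kThreadsPerBlock are hypothetical names.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical constant; must be a power of two for the tree reduction below.
constexpr int kThreadsPerBlock = 256;

// Each Block sums kThreadsPerBlock consecutive elements into blockSums[blockIdx.x].
__global__ void blockSum(const int* in, int* blockSums, int n) {
    __shared__ int buf[kThreadsPerBlock];        // visible to this Block only
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[tid] = (i < n) ? in[i] : 0;              // pad the last Block with zeros
    __syncthreads();                             // all loads done before reducing
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();                         // finish this level before the next
    }
    if (tid == 0) blockSums[blockIdx.x] = buf[0];
}

// Hypothetical host helper: launches enough Blocks to cover h, then adds the partial sums.
long long sumOnGpu(const std::vector<int>& h) {
    int n = static_cast<int>(h.size());
    if (n == 0) return 0;
    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;  // ceiling division
    int *dIn = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, blocks * sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<blocks, kThreadsPerBlock>>>(dIn, dSums, n);
    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);
    long long total = 0;
    for (int s : partial) total += s;
    return total;
}
```

Keeping the cross-Block step on the host is the simplest option; a second kernel launch over the partial sums would serve the same purpose.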

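CUDA built-in variables:

  blockIdx: index of the block within the grid.

  threadIdx: index of the thread within its block.

  blockDim: block dimensions (number of threads in each dimension of a block).

  gridDim: grid dimensions (number of blocks in each dimension of the grid).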
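A common way to combine the four built-in variables is a 2-D grid-stride loop, sketched below for C = A + B on a row-major width × height matrix. blockIdx, blockDim and threadIdx give each thread its starting (x, y); blockDim × gridDim is the total thread count per dimension, so the loop stays correct even if the grid is smaller than the matrix. matAdd and matAddOnGpu are hypothetical names; the sketch assumes a CUDA-capable GPU, width and height greater than zero, and omits error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void matAdd(const float* A, const float* B, float* C, int width, int height) {
    int strideX = blockDim.x * gridDim.x;  // threads across the whole grid in x
    int strideY = blockDim.y * gridDim.y;  // threads across the whole grid in y
    for (int y = blockIdx.y * blockDim.y + threadIdx.y; y < height; y += strideY)
        for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < width; x += strideX) {
            int idx = y * width + x;       // row-major linear index
            C[idx] = A[idx] + B[idx];
        }
}

// Hypothetical host helper: copies A and B to the device, launches a 2-D grid, returns C.
std::vector<float> matAddOnGpu(const std::vector<float>& A, const std::vector<float>& B,
                               int width, int height) {
    size_t count = static_cast<size_t>(width) * height;
    size_t bytes = count * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16);                                 // blockDim = (16, 16, 1)
    dim3 grid((width + block.x - 1) / block.x,          // gridDim chosen to cover
              (height + block.y - 1) / block.y);        // the whole matrix
    matAdd<<<grid, block>>>(dA, dB, dC, width, height);
    std::vector<float> C(count);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```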

Warp：A warp is a set of 32 threads within a thread block such that all the threads in a warp execute the same instruction.

The warp is the basic execution unit of a CUDA Streaming Multiprocessor: one warp contains 32 parallel threads, and a thread block can contain several warps.
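For a 1-D block, a thread's warp index is threadIdx.x / warpSize and its lane (position inside the warp) is threadIdx.x % warpSize. The sketch below records both and, to show the 32 threads of a warp working together, sums the lane indices within each warp using the register shuffle `__shfl_down_sync` (CUDA 9 or later). warpInfo and runWarpInfo are hypothetical names; the host side assumes a warp size of 32 and a block size that is a multiple of 32, at most 1024.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void warpInfo(int* warpIds, int* laneIds, int* warpSums) {
    int tid = threadIdx.x;
    int warpId = tid / warpSize;   // warpSize is a built-in device variable
    int lane = tid % warpSize;
    warpIds[tid] = warpId;
    laneIds[tid] = lane;

    // Tree sum inside the warp via shuffles; afterwards lane 0 holds the warp total.
    int v = lane;
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);  // full mask: all 32 lanes active
    if (lane == 0) warpSums[warpId] = v;
}

// Hypothetical host helper; `threads` must be a multiple of 32 and <= 1024.
void runWarpInfo(int threads, std::vector<int>& warpIds, std::vector<int>& laneIds,
                 std::vector<int>& warpSums) {
    const int kWarp = 32;          // assumed warp size on the host side
    int warps = threads / kWarp;
    int *dW = nullptr, *dL = nullptr, *dS = nullptr;
    cudaMalloc(&dW, threads * sizeof(int));
    cudaMalloc(&dL, threads * sizeof(int));
    cudaMalloc(&dS, warps * sizeof(int));
    warpInfo<<<1, threads>>>(dW, dL, dS);
    warpIds.resize(threads);
    laneIds.resize(threads);
    warpSums.resize(warps);
    cudaMemcpy(warpIds.data(), dW, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(laneIds.data(), dL, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(warpSums.data(), dS, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dW);
    cudaFree(dL);
    cudaFree(dS);
}
```

With 128 threads per block there are 4 warps, and each warp's sum of lane indices is 0 + 1 + ... + 31 = 496.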

CUDA Scan

A scan computes the prefix sums of an array. There are two variants: inclusive scan and exclusive scan.
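The two variants differ only in whether output[i] includes input[i]. A serial host-side reference, useful for checking GPU results, is sketched below; inclusiveScan and exclusiveScan are hypothetical names, and the code is plain C++ that nvcc also compiles.

```cuda
#include <vector>

// Inclusive: output[i] = input[0] + ... + input[i].
std::vector<int> inclusiveScan(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    int running = 0;
    for (size_t i = 0; i < input.size(); ++i) {
        running += input[i];       // add first, so input[i] is included
        output[i] = running;
    }
    return output;
}

// Exclusive: output[0] = 0, output[i] = input[0] + ... + input[i-1].
std::vector<int> exclusiveScan(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    int running = 0;               // identity element of addition
    for (size_t i = 0; i < input.size(); ++i) {
        output[i] = running;       // store first, so input[i] is excluded
        running += input[i];
    }
    return output;
}
```

For input {3, 1, 7, 0, 4, 1, 6, 3}, the inclusive scan is {3, 4, 11, 11, 15, 16, 22, 25} and the exclusive scan is {0, 3, 4, 11, 11, 15, 16, 22}.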

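Let the input array be input and the output array be output. For an inclusive scan, output[0] = input[0] and output[i] = output[i-1] + input[i]; for an exclusive scan, output[0] = 0 and output[i] = output[i-1] + input[i-1]. The serial algorithm makes a single pass, so its time complexity is O(n). Parallel algorithms come in two main forms: the Hillis and Steele scan (about log n steps, O(n log n) total work) and the Blelloch scan (about 2 log n steps, O(n) total work).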
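Both parallel algorithms can be sketched as single-block kernels in the style commonly used to teach them; these are illustrative, not tuned implementations. The Hillis and Steele kernel produces an inclusive scan with one thread per element (n ≤ 1024) and double-buffers shared memory so that reads and writes in one step never collide. The Blelloch kernel produces an exclusive scan with n/2 threads and assumes n is a power of two (2 ≤ n ≤ 2048): an up-sweep builds partial sums in a tree, the last element is cleared, and a down-sweep pushes prefixes back down. hillisSteeleScan, blellochScan and runScan are hypothetical names; error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Inclusive scan; launch with n threads and 2 * n * sizeof(int) bytes of shared memory.
__global__ void hillisSteeleScan(const int* in, int* out, int n) {
    extern __shared__ int temp[];            // two buffers of n ints each
    int tid = threadIdx.x;
    int pout = 0, pin = 1;
    if (tid < n) temp[tid] = in[tid];        // buffer 0 holds the input
    __syncthreads();
    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;                     // swap read and write buffers
        pin = 1 - pout;
        if (tid < n)
            temp[pout * n + tid] = temp[pin * n + tid] +
                                   (tid >= offset ? temp[pin * n + tid - offset] : 0);
        __syncthreads();                     // step complete before the next one reads
    }
    if (tid < n) out[tid] = temp[pout * n + tid];
}

// Exclusive scan; launch with n / 2 threads and n * sizeof(int) bytes of shared memory.
__global__ void blellochScan(const int* in, int* out, int n) {
    extern __shared__ int temp[];
    int tid = threadIdx.x;
    temp[2 * tid] = in[2 * tid];             // each thread loads two elements
    temp[2 * tid + 1] = in[2 * tid + 1];
    int offset = 1;
    for (int d = n >> 1; d > 0; d >>= 1) {   // up-sweep (reduce)
        __syncthreads();
        if (tid < d) {
            int ai = offset * (2 * tid + 1) - 1;
            int bi = offset * (2 * tid + 2) - 1;
            temp[bi] += temp[ai];
        }
        offset *= 2;
    }
    if (tid == 0) temp[n - 1] = 0;           // identity at the root
    for (int d = 1; d < n; d *= 2) {         // down-sweep
        offset >>= 1;
        __syncthreads();
        if (tid < d) {
            int ai = offset * (2 * tid + 1) - 1;
            int bi = offset * (2 * tid + 2) - 1;
            int t = temp[ai];
            temp[ai] = temp[bi];
            temp[bi] += t;
        }
    }
    __syncthreads();
    out[2 * tid] = temp[2 * tid];
    out[2 * tid + 1] = temp[2 * tid + 1];
}

// Hypothetical host helper: blelloch == false runs Hillis and Steele (inclusive),
// blelloch == true runs Blelloch (exclusive).
std::vector<int> runScan(const std::vector<int>& h, bool blelloch) {
    int n = static_cast<int>(h.size());
    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (blelloch)
        blellochScan<<<1, n / 2, n * sizeof(int)>>>(dIn, dOut, n);
    else
        hillisSteeleScan<<<1, n, 2 * n * sizeof(int)>>>(dIn, dOut, n);
    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Scanning arrays larger than one block requires an extra level (scan each block, scan the block totals, then add them back), which these sketches leave out.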

Pitfalls I have run into: GPU

computeMode

computeMode is the compute mode that the device is currently in.

Available modes are as follows:

cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use cudaSetDevice() with this device.

cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use cudaSetDevice() with this device.

cudaComputeModeProhibited: Compute-prohibited mode - No threads can use cudaSetDevice() with this device. Any errors from calling cudaSetDevice() with an exclusive (and occupied) or prohibited device will only show up after a non-device management runtime function is called. At that time, cudaErrorNoDevice will be returned.
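The current mode is stored in the computeMode field of cudaDeviceProp. The sketch below, assuming the CUDA runtime API and at least one GPU, prints the mode of every device and wraps the pitfall described above: it checks cudaSetDevice() and then forces a non-device-management call (cudaFree(nullptr), which triggers context creation) so that a delayed error is actually observed. Both calls are checked because newer CUDA releases may already report the error from cudaSetDevice() itself, and the exact error code can vary between versions. The cudaComputeModeExclusiveProcess case is a mode defined by newer CUDA versions. computeModeName, printComputeModes and trySetDevice are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: readable name for a cudaComputeMode value.
const char* computeModeName(int mode) {
    switch (mode) {
        case cudaComputeModeDefault:          return "Default";
        case cudaComputeModeExclusive:        return "Exclusive";
        case cudaComputeModeProhibited:       return "Prohibited";
        case cudaComputeModeExclusiveProcess: return "Exclusive process";
        default:                              return "Unknown";
    }
}

// Hypothetical helper: prints each device's computeMode; returns the device count or -1.
int printComputeModes() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("device %d (%s): computeMode = %s\n",
                    dev, prop.name, computeModeName(prop.computeMode));
    }
    return count;
}

// Hypothetical helper: selects `dev` and surfaces any delayed error immediately.
cudaError_t trySetDevice(int dev) {
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) return err;
    return cudaFree(nullptr);  // non-device-management call: forces context creation
}
```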

NVCC

-gencode:

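arch specifies the virtual architecture (for example compute_70). It is the minimum compute architecture the application requires, and the PTX generated for it can be JIT-compiled for any device of that architecture or newer. The JIT compilation is done at load time by the CUDA driver, not by NVCC.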

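code lists the real architectures (for example sm_70) for which NVCC fully compiles the application to binary code, so no JIT compilation is needed on those devices. If code also names a virtual architecture (for example compute_70), the corresponding PTX is embedded so that newer GPUs can JIT-compile it.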
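One way to see which compiled image actually runs is to read the `__CUDA_ARCH__` macro, which is defined only while compiling device code and whose value comes from the virtual architecture (700 for compute_70). The sketch assumes a CUDA-capable GPU; the build line in the comment, targeting compute capability 7.0, is only an example, and reportArch, loadedArch and arch_demo.cu are hypothetical names. Whatever image is loaded, its `__CUDA_ARCH__` value can never exceed the device's compute capability, which reflects arch being a minimum requirement.

```cuda
// Example build (assumed target: compute capability 7.0):
//   nvcc -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 arch_demo.cu
// The first -gencode embeds sm_70 binary code; the second embeds compute_70 PTX
// that the driver can JIT-compile on newer GPUs.
#include <cuda_runtime.h>

__global__ void reportArch(int* out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;  // virtual architecture of the image that is running
#endif
}

// Hypothetical helper: returns the loaded image's __CUDA_ARCH__ (0 if the launch failed)
// and stores the device's compute capability as major * 100 + minor * 10 in *deviceArch.
int loadedArch(int* deviceArch) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    *deviceArch = prop.major * 100 + prop.minor * 10;
    int* d = nullptr;
    cudaMalloc(&d, sizeof(int));
    cudaMemset(d, 0, sizeof(int));
    reportArch<<<1, 1>>>(d);
    int h = 0;
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```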