What is the simplest way to test for the presence of a CUDA-capable GPU from CMake?

Problem description:

We have some nightly build machines that have the CUDA libraries installed, but that do not have a CUDA-capable GPU. These machines are capable of building CUDA-enabled programs, but they cannot run those programs. What is the simplest way to test from CMake for the presence of a CUDA-capable GPU?

In our automated nightly build process, our CMake scripts use the command

find_package(CUDA)

to determine whether the CUDA software is installed. This sets the CMake variable CUDA_FOUND on platforms that have the CUDA software installed. This is great, and it works perfectly: when CUDA_FOUND is set, CUDA-enabled programs can be built, even if the machine has no CUDA-capable GPU.

But the CUDA-using test programs naturally fail on the non-GPU CUDA machines, causing our nightly dashboards to look "dirty". So I want CMake to avoid running those tests on such machines. But I still want the CUDA software to be built on those machines.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get CMake to test for the presence of a CUDA-capable GPU?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That's why we use CMake in the first place.)

Edit: There are a couple of good-looking suggestions for how to write a program to test for the presence of a GPU. What is still missing is a way to get CMake to compile and run that program at configure time. I suspect that the TRY_RUN command in CMake will be important here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be the much harder question. Perhaps I should have asked these as two separate questions...

The answer to this question consists of two parts:

  1. A program to detect the presence of a CUDA-capable GPU.
  2. CMake code to compile, run, and interpret the result of that program at configure time.

For part 1, the GPU sniffing program, I started with the answer provided by fabrizioM because it is so compact. I soon discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c:

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount, device;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;
    cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
    if (cudaResultCode != cudaSuccess)
        deviceCount = 0;
    /* machines with no GPUs can still report one emulation device */
    for (device = 0; device < deviceCount; ++device) {
        cudaGetDeviceProperties(&properties, device);
        if (properties.major != 9999) /* 9999 means emulation only */
            ++gpuDeviceCount;
    }
    printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);

    /* don't just return the number of gpus, because other runtime cuda
       errors can also yield non-zero return values */
    if (gpuDeviceCount > 0)
        return 0; /* success */
    else
        return 1; /* failure */
}

Notice that the return code is zero in the case where a CUDA-capable GPU is found. This is because on one of my has-CUDA-but-no-GPU machines, this program generates a runtime error with a non-zero exit code. So any non-zero exit code is interpreted as "CUDA does not work on this machine".

You might ask why I don't just use CUDA emulation mode on the non-GPU machines. That's because emulation mode is buggy. I only want to debug my code and work around bugs in the CUDA GPU code. I don't have time to debug the emulator.

The second part of the problem is the CMake code that uses this test program. After some struggle, I figured it out. The following block is part of a larger CMakeLists.txt file:

find_package(CUDA)
if(CUDA_FOUND)
    try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
        ${CMAKE_BINARY_DIR}
        ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
        CMAKE_FLAGS
            -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
            -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
        COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
        RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
    message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
    # COMPILE_RESULT_VAR is TRUE when compile succeeds
    # RUN_RESULT_VAR is zero when a GPU is found
    if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
        set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
    else()
        set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
    endif()
endif(CUDA_FOUND)

This sets a CUDA_HAVE_GPU boolean cache variable that can subsequently be used to trigger conditional operations in the CMake code.
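For example, a minimal sketch of using it to keep GPU tests off the GPU-less nightly machines (the test name and target here are hypothetical):

if(CUDA_HAVE_GPU)
    # Register the GPU-dependent test only when a CUDA-capable GPU was detected,
    # so the nightly dashboard stays clean on the CUDA-without-GPU machines.
    add_test(NAME cuda_smoke_test COMMAND my_cuda_test_program)
endif()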

It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS section, and what the syntax should be. The try_run documentation is very light, but there is more information in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.

Another tricky but important detail is the third argument to try_run, the "bindir". You should always set it to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is the location where those subdirectories seem to naturally reside.


You can avoid maintaining and compiling a separate program by using CMake to run a tool that is installed with the CUDA runtime, such as nvidia-smi. See my answer. – mabraham 2017-01-10 16:27:05

If CUDA is found, you can compile a small GPU query program. Here is a simple one you can adapt to your needs:

#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    int ct = 0, dev;
    cudaError_t code;
    struct cudaDeviceProp prop;

    cudaGetDeviceCount(&ct);
    code = cudaGetLastError();
    if (code) printf("%s\n", cudaGetErrorString(code));

    if (ct == 0) {
        printf("Cuda device not found.\n");
        exit(0);
    }
    printf("Found %i Cuda device(s).\n", ct);

    for (dev = 0; dev < ct; ++dev) {
        printf("Cuda device %i\n", dev);

        cudaGetDeviceProperties(&prop, dev);
        printf("\tname : %s\n", prop.name);
        /* size_t fields are cast to unsigned long for portable printing */
        printf("\ttotalGlobalMem: %lu\n", (unsigned long)prop.totalGlobalMem);
        printf("\tsharedMemPerBlock: %lu\n", (unsigned long)prop.sharedMemPerBlock);
        printf("\tregsPerBlock: %i\n", prop.regsPerBlock);
        printf("\twarpSize: %i\n", prop.warpSize);
        printf("\tmemPitch: %lu\n", (unsigned long)prop.memPitch);
        printf("\tmaxThreadsPerBlock: %i\n", prop.maxThreadsPerBlock);
        printf("\tmaxThreadsDim: %i, %i, %i\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("\tmaxGridSize: %i, %i, %i\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        printf("\tclockRate: %i\n", prop.clockRate);
        printf("\ttotalConstMem: %lu\n", (unsigned long)prop.totalConstMem);
        printf("\tmajor: %i\n", prop.major);
        printf("\tminor: %i\n", prop.minor);
        printf("\ttextureAlignment: %lu\n", (unsigned long)prop.textureAlignment);
        printf("\tdeviceOverlap: %i\n", prop.deviceOverlap);
        printf("\tmultiProcessorCount: %i\n", prop.multiProcessorCount);
    }
    return 0;
}

+1 This is a good start on the GPU-sniffing part. But I am hesitant to accept this answer without the CMake part. – 2010-02-19 02:07:41


@Christopher No problem, unfortunately I don't know CMake (I use automake). http://www.gnu.org/software/hello/manual/autoconf/Runtime.html is the relevant part for autoconf. Perhaps it will help you find the corresponding CMake functionality. – Anycorn 2010-02-19 02:59:46

Write a simple program like

#include <cuda_runtime.h>

int main(){ 
    int deviceCount; 
    cudaError_t e = cudaGetDeviceCount(&deviceCount); 
    return e == cudaSuccess ? deviceCount : -1; 
} 

and check its return value.
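To interpret that return value at configure time, a try_run sketch along the lines of the accepted answer could work (assuming the file above is saved as check_cuda.c; the variable names here are hypothetical):

try_run(GPU_RUN_RESULT GPU_COMPILE_RESULT
    ${CMAKE_BINARY_DIR}
    ${CMAKE_CURRENT_SOURCE_DIR}/check_cuda.c
    CMAKE_FLAGS
        -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
        -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY})
# The program exits with the device count on success and -1 on failure; -1
# shows up as exit status 255, so accept only counts between 1 and 254.
if(GPU_COMPILE_RESULT AND GPU_RUN_RESULT GREATER 0 AND GPU_RUN_RESULT LESS 255)
    set(CUDA_GPU_FOUND TRUE)
endif()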


+1 This answer, together with unknown's answer, gave me a good start toward solving the problem. – 2010-02-19 16:31:58

I just wrote a pure Python script that does some of the things you seem to need (I took most of this from the pystream project). It's basically just a wrapper around some functions in the CUDA runtime library (it uses ctypes). Look at the main() function for example usage. Also, note that I just wrote it, so it's likely to contain bugs. Use with caution.

#!/usr/bin/env python

import sys 
import platform 
import ctypes 

""" 
cudart.py: used to access parts of the CUDA runtime library.
Most of this code was lifted from the pystream project (it's BSD licensed): 
http://code.google.com/p/pystream 

Note that this is likely to only work with CUDA 2.3 
To extend to other versions, you may need to edit the DeviceProp Class 
""" 

cudaSuccess = 0 
errorDict = { 
    1: 'MissingConfigurationError', 
    2: 'MemoryAllocationError', 
    3: 'InitializationError', 
    4: 'LaunchFailureError', 
    5: 'PriorLaunchFailureError', 
    6: 'LaunchTimeoutError', 
    7: 'LaunchOutOfResourcesError', 
    8: 'InvalidDeviceFunctionError', 
    9: 'InvalidConfigurationError', 
    10: 'InvalidDeviceError', 
    11: 'InvalidValueError', 
    12: 'InvalidPitchValueError', 
    13: 'InvalidSymbolError', 
    14: 'MapBufferObjectFailedError', 
    15: 'UnmapBufferObjectFailedError', 
    16: 'InvalidHostPointerError', 
    17: 'InvalidDevicePointerError', 
    18: 'InvalidTextureError', 
    19: 'InvalidTextureBindingError', 
    20: 'InvalidChannelDescriptorError', 
    21: 'InvalidMemcpyDirectionError', 
    22: 'AddressOfConstantError', 
    23: 'TextureFetchFailedError', 
    24: 'TextureNotBoundError', 
    25: 'SynchronizationError', 
    26: 'InvalidFilterSettingError', 
    27: 'InvalidNormSettingError', 
    28: 'MixedDeviceExecutionError', 
    29: 'CudartUnloadingError', 
    30: 'UnknownError', 
    31: 'NotYetImplementedError', 
    32: 'MemoryValueTooLargeError', 
    33: 'InvalidResourceHandleError', 
    34: 'NotReadyError', 
    0x7f: 'StartupFailureError', 
    10000: 'ApiFailureBaseError'}

# Create an exception class for each entry in errorDict so that
# _checkCudaStatus can look them up in globals() and raise them.
for _error_name in errorDict.values():
    globals()[_error_name] = type(_error_name, (Exception,), {})


try: 
    if platform.system() == "Microsoft": 
     _libcudart = ctypes.windll.LoadLibrary('cudart.dll') 
    elif platform.system()=="Darwin": 
     _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib') 
    else: 
     _libcudart = ctypes.cdll.LoadLibrary('libcudart.so') 
    _libcudart_error = None 
except OSError, e: 
    _libcudart_error = e 
    _libcudart = None 

def _checkCudaStatus(status): 
    if status != cudaSuccess: 
     eClassString = errorDict[status] 
     # Get the class by name from the top level of this module 
     eClass = globals()[eClassString] 
     raise eClass() 

def _checkDeviceNumber(device): 
    assert isinstance(device, int), "device number must be an int" 
    assert device >= 0, "device number must not be negative"
    assert device < 2**8-1, "device number must be < 255" 


# cudaDeviceProp 
class DeviceProp(ctypes.Structure): 
    _fields_ = [ 
     ("name", 256*ctypes.c_char), # < ASCII string identifying device 
     ("totalGlobalMem", ctypes.c_size_t), # < Global memory available on device in bytes 
     ("sharedMemPerBlock", ctypes.c_size_t), # < Shared memory available per block in bytes 
     ("regsPerBlock", ctypes.c_int), # < 32-bit registers available per block 
     ("warpSize", ctypes.c_int), # < Warp size in threads 
     ("memPitch", ctypes.c_size_t), # < Maximum pitch in bytes allowed by memory copies 
     ("maxThreadsPerBlock", ctypes.c_int), # < Maximum number of threads per block 
     ("maxThreadsDim", 3*ctypes.c_int), # < Maximum size of each dimension of a block 
     ("maxGridSize", 3*ctypes.c_int), # < Maximum size of each dimension of a grid 
     ("clockRate", ctypes.c_int), # < Clock frequency in kilohertz 
     ("totalConstMem", ctypes.c_size_t), # < Constant memory available on device in bytes 
     ("major", ctypes.c_int), # < Major compute capability 
     ("minor", ctypes.c_int), # < Minor compute capability 
     ("textureAlignment", ctypes.c_size_t), # < Alignment requirement for textures 
     ("deviceOverlap", ctypes.c_int), # < Device can concurrently copy memory and execute a kernel 
     ("multiProcessorCount", ctypes.c_int), # < Number of multiprocessors on device 
     ("kernelExecTimeoutEnabled", ctypes.c_int), # < Specified whether there is a run time limit on kernels 
     ("integrated", ctypes.c_int), # < Device is integrated as opposed to discrete 
     ("canMapHostMemory", ctypes.c_int), # < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer 
     ("computeMode", ctypes.c_int), # < Compute mode (See ::cudaComputeMode) 
     ("__cudaReserved", 36*ctypes.c_int), 
] 

    def __str__(self): 
     return """NVidia GPU Specifications: 
    Name: %s 
    Total global mem: %i 
    Shared mem per block: %i 
    Registers per block: %i 
    Warp size: %i 
    Mem pitch: %i 
    Max threads per block: %i 
    Max threads dim: (%i, %i, %i) 
    Max grid size: (%i, %i, %i) 
    Total const mem: %i 
    Compute capability: %i.%i 
    Clock Rate (GHz): %f 
    Texture alignment: %i 
""" % (self.name, self.totalGlobalMem, self.sharedMemPerBlock, 
     self.regsPerBlock, self.warpSize, self.memPitch, 
     self.maxThreadsPerBlock, 
     self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2], 
     self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2], 
     self.totalConstMem, self.major, self.minor, 
     float(self.clockRate)/1.0e6, self.textureAlignment) 

def cudaGetDeviceCount(): 
    if _libcudart is None: return 0 
    deviceCount = ctypes.c_int() 
    status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount)) 
    _checkCudaStatus(status) 
    return deviceCount.value 

def getDeviceProperties(device): 
    if _libcudart is None: return None 
    _checkDeviceNumber(device) 
    props = DeviceProp() 
    status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device) 
    _checkCudaStatus(status) 
    return props 

def getDriverVersion(): 
    if _libcudart is None: return None 
    version = ctypes.c_int() 
    _libcudart.cudaDriverGetVersion(ctypes.byref(version)) 
    v = "%d.%d" % (version.value//1000, 
        version.value%100) 
    return v 

def getRuntimeVersion(): 
    if _libcudart is None: return None 
    version = ctypes.c_int() 
    _libcudart.cudaRuntimeGetVersion(ctypes.byref(version)) 
    v = "%d.%d" % (version.value//1000, 
        version.value%100) 
    return v 

def getGpuCount(): 
    count=0 
    for ii in range(cudaGetDeviceCount()): 
     props = getDeviceProperties(ii) 
     if props.major!=9999: count+=1 
    return count 

def getLoadError(): 
    return _libcudart_error 


version = getDriverVersion() 
if version is not None and not version.startswith('2.3'): 
    sys.stdout.write("WARNING: Driver version %s may not work with %s\n" % 
        (version, sys.argv[0])) 

version = getRuntimeVersion() 
if version is not None and not version.startswith('2.3'): 
    sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" % 
        (version, sys.argv[0])) 


def main(): 

    sys.stdout.write("Driver version: %s\n" % getDriverVersion()) 
    sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion()) 

    nn = cudaGetDeviceCount() 
    sys.stdout.write("Device count: %s\n" % nn) 

    for ii in range(nn): 
     props = getDeviceProperties(ii) 
     sys.stdout.write("\nDevice %d:\n" % ii) 
     #sys.stdout.write("%s" % props) 
     for f_name, f_type in props._fields_: 
      attr = props.__getattribute__(f_name) 
      sys.stdout.write(" %s: %s\n" % (f_name, attr)) 

    gpuCount = getGpuCount() 
    if gpuCount > 0: 
     sys.stdout.write("\n") 
    sys.stdout.write("GPU count: %d\n" % getGpuCount()) 
    e = getLoadError() 
    if e is not None: 
     sys.stdout.write("There was an error loading a library:\n%s\n\n" % e) 

if __name__=="__main__": 
    main() 

That's an interesting idea, using Python. That way the CMake part might consist of FIND_PACKAGE(PythonInterp) and EXECUTE_PROCESS(...), which seems like it could be simpler. On the other hand, I am worried that the Python script is long, and looks like it might depend on aspects of the CUDA API that could change. – 2010-02-21 17:15:54
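A sketch of that idea (assuming the script above is saved as cudart.py in the source tree, and keying off the "GPU count: N" line the script prints):

find_package(PythonInterp)
if(PYTHONINTERP_FOUND)
    execute_process(
        COMMAND ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/cudart.py
        OUTPUT_VARIABLE _cudart_out
        RESULT_VARIABLE _cudart_ret)
    # Treat any non-zero "GPU count" reported by the script as a GPU hit.
    if(_cudart_ret EQUAL 0 AND _cudart_out MATCHES "GPU count: [1-9]")
        set(CUDA_HAVE_GPU TRUE)
    endif()
endif()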


Agreed. The DeviceProp class would probably need updating with each new CUDA runtime version. – 2010-02-22 02:12:21


I get an error: except OSError, e: [SyntaxError: invalid syntax] in Python 3.5 – programmer 2017-06-09 13:24:49

A useful approach is to run programs that come with the CUDA installation, such as nvidia-smi, and look at what they return.

# (exec_program is an older CMake command; execute_process is its modern replacement)
find_program(_nvidia_smi "nvidia-smi")
if (_nvidia_smi)
    set(DETECT_GPU_COUNT_NVIDIA_SMI 0)
    # execute nvidia-smi -L to get a short list of GPUs available
    exec_program(${_nvidia_smi} ARGS -L
        OUTPUT_VARIABLE _nvidia_smi_out
        RETURN_VALUE _nvidia_smi_ret)
    # process the stdout of nvidia-smi
    if (_nvidia_smi_ret EQUAL 0)
        # convert string with newlines to list of strings
        string(REGEX REPLACE "\n" ";" _nvidia_smi_out "${_nvidia_smi_out}")
        foreach(_line ${_nvidia_smi_out})
            if (_line MATCHES "^GPU [0-9]+:")
                math(EXPR DETECT_GPU_COUNT_NVIDIA_SMI "${DETECT_GPU_COUNT_NVIDIA_SMI}+1")
                # the UUID is not very useful for the user, remove it
                string(REGEX REPLACE " \\(UUID:.*\\)" "" _gpu_info "${_line}")
                if (NOT _gpu_info STREQUAL "")
                    list(APPEND DETECT_GPU_INFO "${_gpu_info}")
                endif()
            endif()
        endforeach()

        # check_num_gpu_info is a helper defined in the GROMACS file linked below
        check_num_gpu_info(${DETECT_GPU_COUNT_NVIDIA_SMI} DETECT_GPU_INFO)
        set(DETECT_GPU_COUNT ${DETECT_GPU_COUNT_NVIDIA_SMI})
    endif()
endif()

One can also query Linux's /proc or run lspci. See a fully working CMake example at https://github.com/gromacs/gromacs/blob/master/cmake/gmxDetectGpu.cmake
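For the lspci route, a hedged Linux-only sketch (the variable names are made up, and a match only proves NVIDIA hardware is present, not that the driver and runtime actually work):

find_program(_lspci lspci)
if(_lspci)
    execute_process(COMMAND ${_lspci}
        OUTPUT_VARIABLE _lspci_out
        RESULT_VARIABLE _lspci_ret
        ERROR_QUIET)
    # lspci prints one line per PCI device; NVIDIA GPUs mention "NVIDIA" in the description.
    if(_lspci_ret EQUAL 0 AND _lspci_out MATCHES "NVIDIA")
        set(DETECT_GPU_VENDOR_NVIDIA TRUE)
    endif()
endif()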