Passing the CUDA random generator state by reference
Question:
Is the following code correct when the random generator state (CUDA Toolkit 3.2, curand.lib) is passed by reference to the functions CalculateValue(curandState *localStat) and GetExponential(curandState *localState)?
Thanks.
__device__ double GetExponential(curandState *localState) {
    double u1 = curand_uniform_double(localState);
}

__device__ double CalculateValue(curandState *localStat) {
    double x = GetExponential(localState);
    return x;
}

__global__ void RunMonteCarloKernel(curandState *state, double *results) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    /* Copy state to local memory for efficiency */
    curandState localState = state[threadIdx.x + blockIdx.x * blockDim.x];
    results[i] = CalculateValue(&localState);
    /* Copy state back to global memory */
    state[threadIdx.x + blockIdx.x * blockDim.x] = localState;
}

__global__ void setup_kernel(curandState *state) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    /* Each thread gets different seed, a different sequence number, no offset */
    curand_init(i, i, 0, &state[i]);
}

int main(void) {
    double *devResults;
    curandState *devStates;
    /* Allocate space for prng states on device */
    CUDA_CALL(cudaMalloc((void **)&devStates, totalThreads * sizeof(curandState)));
    /* Setup prng states */
    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates);
    for (int i = 0; i < 1000; i++)
    {
        RunMonteCarloKernel(devStates, devResults);
    }
}
Answer:
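Is something actually going wrong? It looks fine to me.
You might want to look at the EstimatePiInlineP example in the MonteCarloCURAND directory of the 3.2 SDK. It uses C++-style pass-by-reference to avoid taking the address of a local variable. You still need to store the state back to memory at the end of the kernel (as your code does).
Passing by C++ reference makes it clear to the compiler that the function can operate on the data directly in the original registers. Taking the address of a local array on the GPU can hurt performance if the compiler cannot work out that all threads treat the pointer the same way (that is, perform identical operations on it); in that case it spills the array to local memory. The code will still work, but it may be slower.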
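A minimal sketch of the reference-passing style described above, assuming a CUDA toolkit with the cuRAND device API (`curand_kernel.h`). It is not the code of the SDK's EstimatePiInlineP sample. The extra `n` bounds parameter and the use of `-log(u)` to turn a uniform draw into an exponential sample are assumptions for illustration, since the question never shows what `GetExponential` should return.

```cuda
#include <curand_kernel.h>

// Exponential(1) sample from a uniform draw. The generator state is taken by
// C++ reference, so the kernel never writes &localState itself.
__device__ double GetExponential(curandState &state)
{
    // cuRAND's device functions take a pointer, so the address is formed here;
    // once this small function is inlined, the compiler may still keep the
    // state in registers.
    double u = curand_uniform_double(&state);  // u is in (0.0, 1.0]
    return -log(u);                            // assumed transform: finite and >= 0
}

__device__ double CalculateValue(curandState &state)
{
    return GetExponential(state);
}

__global__ void RunMonteCarloKernel(curandState *state, double *results, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;                  // guard threads past the end (n is an assumption)

    curandState localState = state[i];   // copy the state out of global memory
    results[i] = CalculateValue(localState);
    state[i] = localState;               // store the advanced state back
}
```

The call site passes `localState` directly instead of `&localState`; the copy-in/copy-out of the state around the call is unchanged.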
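Try to make the question more specific; it helps to state exactly what your problem is. As for your question, I doubt this code even compiles, since there is no return statement in GetExponential. – asm 2010-11-09 18:07:02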
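For comparison, here is one way to make the question's program compile and run while keeping its pointer-passing style. It fixes the issues visible in the posted code: the missing `return` in `GetExponential`, the `localStat`/`localState` name mismatch, the missing `<<<grid, block>>>` configuration on the `RunMonteCarloKernel` launch, and the unallocated `devResults`. Assumptions, not taken from the question: the `CUDA_CALL` definition (the question uses it without showing it), the `n` bounds check, the `-log(u1)` transform, and the hypothetical `RunSimulation` helper that wraps the host code from the question's `main`.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <curand_kernel.h>

// Assumed definition: abort on any CUDA runtime error.
#define CUDA_CALL(x) do { cudaError_t err_ = (x); if (err_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error %s at %s:%d\n", cudaGetErrorString(err_), __FILE__, __LINE__); \
    exit(EXIT_FAILURE); } } while (0)

__device__ double GetExponential(curandState *localState)
{
    double u1 = curand_uniform_double(localState);  // u1 is in (0.0, 1.0]
    return -log(u1);  // the return the question was missing (assumed transform)
}

__device__ double CalculateValue(curandState *localState)  // name now matches its use
{
    double x = GetExponential(localState);
    return x;
}

__global__ void RunMonteCarloKernel(curandState *state, double *results, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    curandState localState = state[i];  // copy state to local memory
    results[i] = CalculateValue(&localState);
    state[i] = localState;              // copy state back to global memory
}

__global__ void setup_kernel(curandState *state, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    // Each thread gets a different seed, a different sequence number, no offset.
    curand_init(i, i, 0, &state[i]);
}

// Hypothetical helper: runs `iterations` launches and copies the results of the
// last launch into hostResults, which must hold totalBlocks * threadsPerBlock doubles.
void RunSimulation(int totalBlocks, int threadsPerBlock, int iterations, double *hostResults)
{
    const int totalThreads = totalBlocks * threadsPerBlock;
    curandState *devStates;
    double *devResults;
    CUDA_CALL(cudaMalloc((void **)&devStates, totalThreads * sizeof(curandState)));
    CUDA_CALL(cudaMalloc((void **)&devResults, totalThreads * sizeof(double)));

    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates, totalThreads);
    CUDA_CALL(cudaGetLastError());
    for (int it = 0; it < iterations; ++it) {
        // A __global__ function must be launched with an execution configuration.
        RunMonteCarloKernel<<<totalBlocks, threadsPerBlock>>>(devStates, devResults, totalThreads);
        CUDA_CALL(cudaGetLastError());
    }
    CUDA_CALL(cudaMemcpy(hostResults, devResults, totalThreads * sizeof(double),
                         cudaMemcpyDeviceToHost));
    CUDA_CALL(cudaFree(devResults));
    CUDA_CALL(cudaFree(devStates));
}
```

To reproduce the question's loop, call `RunSimulation(totalBlocks, threadsPerBlock, 1000, buffer)` from `main` with a host buffer of `totalBlocks * threadsPerBlock` doubles.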