程序员达达

CUDA

Kernel launch overhead on GPU

Reference: I followed all the experiments in the reference link. The only difference is the system configuration.

CUDA runtime: 4.0
CUDA driver: 4.0

On “CUDA2”:

Tesla C2050/C2070:
Kernel configuration: 512 blocks, 512 threads per block
Iterations: 100000
Time(Kernel+Sync): 3.321298
Time(Kernel+ASync): 0.765866
Time(Sync): 1.647582

Tesla C2050/C2070:
Kernel configuration: 512 blocks, 512 threads per block
Iterations: 100000
Time(Kernel+Sync): 3.321298…
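For readers who want to try a similar measurement, here is a minimal sketch of how the three timings could be taken. It is not the code from the referenced experiments: the meaning given to “Kernel+Sync”, “Kernel+ASync” and “Sync” is my assumption, and the names emptyKernel, secondsSince and perIterationMicros are made up for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: whatever time we measure is launch/sync overhead, not compute.
__global__ void emptyKernel() {}

// Hypothetical helper: average cost of one iteration in microseconds,
// given a total time in seconds.
static double perIterationMicros(double totalSeconds, int iterations) {
    return totalSeconds * 1e6 / iterations;
}

// Hypothetical helper: wall-clock seconds elapsed since `start`.
static double secondsSince(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
}

int main() {
    const int blocks = 512, threads = 512, iterations = 100000;

    // Warm-up launch so context creation is not counted in the timings.
    emptyKernel<<<blocks, threads>>>();
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "warm-up failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Assumed "Kernel+Sync": wait for every launch before issuing the next.
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        emptyKernel<<<blocks, threads>>>();
        cudaDeviceSynchronize();
    }
    double kernelSync = secondsSince(start);

    // Assumed "Kernel+ASync": queue all launches, synchronize once at the end.
    start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        emptyKernel<<<blocks, threads>>>();
    }
    cudaDeviceSynchronize();
    double kernelAsync = secondsSince(start);

    // Assumed "Sync": the cost of the synchronize call alone.
    start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        cudaDeviceSynchronize();
    }
    double syncOnly = secondsSince(start);

    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Time(Kernel+Sync):  %f s (%.2f us per iteration)\n",
           kernelSync, perIterationMicros(kernelSync, iterations));
    printf("Time(Kernel+ASync): %f s (%.2f us per iteration)\n",
           kernelAsync, perIterationMicros(kernelAsync, iterations));
    printf("Time(Sync):         %f s (%.2f us per iteration)\n",
           syncOnly, perIterationMicros(syncOnly, iterations));
    return 0;
}
```

Something like `nvcc -o launch_overhead launch_overhead.cu` should build it (the file name is arbitrary). The host-side clock is used on purpose, because the overhead of interest is what the CPU thread sees when it issues launches.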

cudaGetDeviceCount returned 38

Running “deviceQuery” can be frustrating when it fails with the error: cudaGetDeviceCount returned 38 -> no CUDA-capable device is detected. This is probably because none of the NVIDIA cards is used as the display, so the /dev/nvidia* device nodes either do not exist or have the wrong permissions. Running “deviceQuery” as root usually fixes this, since that lets the driver create the device nodes. Then check “/etc/udev/rules.d/50-udev.rules”…
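To reproduce the check outside of “deviceQuery”, a small program along these lines should be enough. It is only a sketch: hasUsableDevice is a made-up helper, and the numeric value printed for the “no device” error may differ between CUDA toolkit versions, so the message from cudaGetErrorString is the more reliable part.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true only if the call succeeded and found a device.
static bool hasUsableDevice(cudaError_t err, int count) {
    return err == cudaSuccess && count > 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);

    if (!hasUsableDevice(err, count)) {
        // Same shape as the deviceQuery message: numeric code plus description.
        fprintf(stderr, "cudaGetDeviceCount returned %d -> %s (%d device(s) found)\n",
                (int)err, cudaGetErrorString(err), count);
        return 1;
    }

    // List the devices the runtime can see.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            printf("Device %d: %s\n", i, prop.name);
        }
    }
    return 0;
}
```

If this fails for a normal user but works as root, that points at the permissions on /dev/nvidia* rather than at the driver installation itself.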