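CUDA Programming Model

CPU vs GPU

Kernels

A kernel is a C++ function marked `__global__`. When it is launched with the `<<<...>>>` execution configuration, it is executed N times in parallel by N different CUDA threads. Each thread finds its own index through the built-in variable `threadIdx`.

```cuda
// Kernel definition: each thread adds one pair of elements
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;   // index of this thread within its block
    C[i] = A[i] + B[i];
}

int main()
{
    // ... allocate and initialize A, B, C (N floats each) ...

    // Kernel invocation: 1 block of N threads
    VecAdd<<<1, N>>>(A, B, C);

    // ... wait for the kernel, use C, free memory ...
}
```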
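The snippet above leaves the host side out. Below is a minimal sketch of one way to complete it so it compiles with `nvcc` and checks its own results. It is not the original author's code and makes several assumptions: it uses unified memory (`cudaMallocManaged`) instead of explicit host/device copies, adds a bounds parameter `n` to the kernel, and wraps the launch in a hypothetical helper `runVecAdd` that returns the number of wrong results (or -1 on a CUDA error).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
// The extra parameter n is an assumption added as a bounds guard.
__global__ void VecAdd(const float* A, const float* B, float* C, int n)
{
    int i = threadIdx.x;          // thread index within the single block
    if (i < n) {                  // guard in case more threads than elements
        C[i] = A[i] + B[i];
    }
}

// Hypothetical helper: runs VecAdd on n elements and returns the number of
// mismatched results (0 on success), or -1 if a CUDA call fails.
int runVecAdd(int n)
{
    float *A = nullptr, *B = nullptr, *C = nullptr;
    size_t bytes = n * sizeof(float);

    // Unified memory: the same pointers are usable on host and device.
    if (cudaMallocManaged(&A, bytes) != cudaSuccess ||
        cudaMallocManaged(&B, bytes) != cudaSuccess ||
        cudaMallocManaged(&C, bytes) != cudaSuccess) {
        cudaFree(A); cudaFree(B); cudaFree(C);
        return -1;
    }

    // Initialize inputs on the host so that C[i] should equal 3 * i.
    for (int i = 0; i < n; ++i) {
        A[i] = static_cast<float>(i);
        B[i] = 2.0f * i;
    }

    // One block of n threads, matching VecAdd<<<1, N>>> in the text.
    VecAdd<<<1, n>>>(A, B, C, n);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors

    // Verify on the host after the kernel has finished.
    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (C[i] != 3.0f * i) ++mismatches;
        }
    }

    cudaFree(A); cudaFree(B); cudaFree(C);
    return err == cudaSuccess ? mismatches : -1;
}

int main()
{
    const int N = 256;  // must not exceed the per-block thread limit
    int result = runVecAdd(N);
    if (result == 0) {
        std::printf("VecAdd OK for %d elements\n", N);
    } else {
        std::printf("VecAdd failed (%d)\n", result);
    }
    return result == 0 ? 0 : 1;
}
```

Because the launch uses a single block, N is capped by the maximum number of threads per block (1024 on current GPUs). Larger vectors need several blocks and a global index such as `blockIdx.x * blockDim.x + threadIdx.x`.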