I've been wrestling with this for some time, but I can't figure it out. I'm trying to copy objects to CUDA device memory (and back again, but I'll cross that bridge when I come to it):
struct MyData {
    float *data;
    int dataLen;
};

// create a dummy object for copying
void copyToGPU() {
    int N = 10;
    MyData *h_items = new MyData[N];
    for (int i = 0; i < N; i++) {
        h_items[i].dataLen = 100;
        h_items[i].data = new float[100];
    }

    // copy objects to GPU
    MyData *d_items;
    int memSize = N * sizeof(MyData);
    cudaMalloc((void **) &d_items, memSize);
    cudaMemcpy(d_items, h_items, memSize, cudaMemcpyHostToDevice);

    // run kernel
    MyFunc<<<100, 100>>>(d_items);
}

__global__ static void MyFunc(MyData *data) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < data[idx].dataLen; i++) {
        // do something with data[idx].data[i]
    }
}

When I call MyFunc(d_items), I can access data[idx].dataLen just fine. However, data[idx].data has not been copied.
I can't use d_items[i].data in copyToGPU as the destination for the cudaMalloc/cudaMemcpy calls, because host code cannot dereference a device pointer.
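To make the failure mode concrete, this is the kind of attempt that doesn't work (a sketch using the names from the question):

```cuda
// BROKEN: d_items lives in device memory, so reading d_items[i].data
// here dereferences a device pointer from host code. At best this is
// undefined behaviour; it does not allocate the per-struct arrays.
for (int i = 0; i < N; i++) {
    cudaMalloc((void **) &d_items[i].data, 100 * sizeof(float));
    cudaMemcpy(d_items[i].data, h_items[i].data,
               100 * sizeof(float), cudaMemcpyHostToDevice);
}
```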
What to do?
- Allocate the device memory for the data of all the structures as a single flat array.
- Copy the flattened data to the GPU.
- Fix up each host struct's pointer so it points into the GPU array (then copy the structs over).
Example:
float *d_data;
cudaMalloc((void **) &d_data, N * 100 * sizeof(float));
for (...) {
    h_items[i].data = d_data + i * 100;
}
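Putting the three steps together, a complete copyToGPU might look like the sketch below. The flat staging buffer h_data and the LEN constant are additions for illustration, and the launch configuration is changed to one thread per struct (the question's <<<100, 100>>> would run threads whose idx exceeds N):

```cuda
#include <cuda_runtime.h>

struct MyData {
    float *data;
    int dataLen;
};

__global__ void MyFunc(MyData *data) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < data[idx].dataLen; i++) {
        // do something with data[idx].data[i]
    }
}

void copyToGPU() {
    const int N = 10;
    const int LEN = 100;

    // Host structs plus one flat staging buffer holding all float data
    MyData *h_items = new MyData[N];
    float *h_data = new float[N * LEN];

    // Step 1: one device allocation covers every struct's data
    float *d_data;
    cudaMalloc((void **) &d_data, N * LEN * sizeof(float));

    for (int i = 0; i < N; i++) {
        h_items[i].dataLen = LEN;
        // Step 3: the host-side struct stores a *device* address
        h_items[i].data = d_data + i * LEN;
        for (int j = 0; j < LEN; j++)
            h_data[i * LEN + j] = 0.0f;  // fill with real values here
    }

    // Step 2: copy the flat data, then the structs (whose pointers
    // are now valid on the device)
    cudaMemcpy(d_data, h_data, N * LEN * sizeof(float),
               cudaMemcpyHostToDevice);

    MyData *d_items;
    cudaMalloc((void **) &d_items, N * sizeof(MyData));
    cudaMemcpy(d_items, h_items, N * sizeof(MyData),
               cudaMemcpyHostToDevice);

    MyFunc<<<1, N>>>(d_items);
    cudaDeviceSynchronize();

    cudaFree(d_items);
    cudaFree(d_data);
    delete[] h_items;
    delete[] h_data;
}
```

Copying back works the same way in reverse: cudaMemcpy the flat d_data array into h_data with cudaMemcpyDeviceToHost; the pointer fix-up only ever happens on the host side.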