mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
|
Posted: Thu May 23, 2013 8:51 am Post subject: |
|
|
Hi Ben,
| Quote: |
| Is it that transferring half of the array still takes about the same time as transferring the entire array? |
If both threads each transfer half the array at about the same time, then yes, it typically takes about as long as one thread transferring the whole array. So your compute time should halve, but the overall data transfer time will stay about the same.
If you can interleave data and compute, then you might be able to maximize the data bandwidth. Though this is tough to do in an OpenMP context given there's typically a tighter synchronization between threads. Eventually, you'll also be able to use the OpenACC async clauses which might help in interleaving, but unfortunately, we don't quite have async working well enough within OpenMP (hence the PGI_ACC_SYNCHRONOUS variable). Async works fine in a serial and MPI context though.
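To make the interleaving idea concrete, here is a minimal sketch for the serial case, where async is said to work. It splits the array into chunks and alternates between two async queues so that one chunk's transfer can overlap another chunk's compute. The function name `scale_in_chunks`, the chunk count, and the doubling kernel are all made up for illustration, and whether the transfers actually overlap depends on the compiler and hardware. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): double every element of a[0..n-1],
   processing the array in nchunks pieces on alternating async queues. */
void scale_in_chunks(float *a, int n, int nchunks)
{
    if (a == NULL || n <= 0 || nchunks <= 0)
        return;

    int chunk = (n + nchunks - 1) / nchunks;   /* ceiling division */

    /* Allocate device space once; the data moves chunk by chunk below. */
    #pragma acc data create(a[0:n])
    {
        for (int c = 0; c < nchunks; ++c) {
            int start = c * chunk;
            if (start >= n)
                break;
            int len = (n - start < chunk) ? (n - start) : chunk;
            int q = c % 2;                     /* alternate between queues 0 and 1 */

            /* Copy this chunk to the device without blocking the host. */
            #pragma acc update device(a[start:len]) async(q)

            /* Compute on the same queue, so it runs after the copy above. */
            #pragma acc parallel loop async(q)
            for (int i = start; i < start + len; ++i)
                a[i] *= 2.0f;

            /* Copy the result back, again ordered on the same queue. */
            #pragma acc update host(a[start:len]) async(q)
        }
        /* Block until every queue has drained before leaving the data region. */
        #pragma acc wait
    }
}
```

Work on a single queue stays ordered, so each chunk's copy-in, compute, and copy-out happen in sequence, while chunks on different queues are free to overlap.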
- Mat |
|
brush
Joined: 26 Jun 2012 Posts: 30
|
Posted: Wed May 29, 2013 4:01 pm Post subject: |
|
|
Could you clarify the usage of acc_set_device_num(devicenum,devicetype):
For the device number, are the GPUs numbered 0, 1, 2, ... or 1, 2, 3, ...? I thought it was the former, but according to this link (http://www.catagle.com/26-23/pgi_accel_prog_model_1_2.htm), passing in 0 gives the default behavior, not the first GPU. Is the CUDA Device Number, as displayed by pgaccelinfo, the number I need to pass as my argument to get that device?
What does a devicetype of 0 or 1 do? (I didn't understand the documentation linked above).
Thanks,
Ben |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
|
Posted: Fri May 31, 2013 11:02 am Post subject: |
|
|
Hi Ben,
The GPUs are numbered 0-N.
For us, the default behavior is to use the lowest numbered device that the binary will run on. Typically this is device zero, though it could be something higher. The device information, including the numbering, can be found by running the "pgaccelinfo" utility.
For the devicetype, you should use the enumerated names such as ACC_DEVICE_NVIDIA, since the numbering may not be consistent between compilers. You can see the PGI list by viewing the header file "include/accel.h" (located in your PGI installation directory).
From 13.5's accel.h:
| Code: | typedef enum{
acc_device_none = 0,
acc_device_default = 1,
acc_device_host = 2,
acc_device_not_host = 3,
acc_device_nvidia = 4,
acc_device_radeon = 5,
acc_device_xeonphi = 6,
acc_device_pgi_opencl = 7,
acc_device_nvidia_opencl = 8,
acc_device_opencl = 9
}acc_device_t; |
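As a rough illustration of the advice above, the sketch below binds a thread to a GPU using the enumerated name rather than a raw integer. `acc_get_num_devices` and `acc_set_device_num` are standard OpenACC runtime routines. The helpers `device_for_thread` and `bind_thread_to_gpu` are hypothetical names, and the round-robin mapping is just one possible choice. The stand-in definitions in the `#else` branch are an assumption that exists only so the sketch compiles without an OpenACC compiler.

```c
#ifdef _OPENACC
#include <openacc.h>   /* real OpenACC runtime API */
#else
/* Stand-ins (assumption): let the sketch build without an OpenACC compiler.
   The value 4 mirrors the 13.5 header quoted above; do not rely on it. */
typedef enum { acc_device_none = 0, acc_device_nvidia = 4 } acc_device_t;
static int  acc_get_num_devices(acc_device_t type) { (void)type; return 0; }
static void acc_set_device_num(int num, acc_device_t type) { (void)num; (void)type; }
#endif

/* Hypothetical helper: map a zero-based thread id onto a zero-based device
   number, round-robin. Returns -1 if there are no devices or the id is negative. */
static int device_for_thread(int thread_id, int num_devices)
{
    if (num_devices <= 0 || thread_id < 0)
        return -1;
    return thread_id % num_devices;
}

/* Hypothetical helper: select an NVIDIA device for the calling thread.
   Returns the device number chosen, or -1 if none is available. */
static int bind_thread_to_gpu(int thread_id)
{
    int ndev = acc_get_num_devices(acc_device_nvidia);
    int dev  = device_for_thread(thread_id, ndev);
    if (dev >= 0)
        acc_set_device_num(dev, acc_device_nvidia);  /* name, not a raw number */
    return dev;
}
```

In an OpenMP program, each thread could call `bind_thread_to_gpu(omp_get_thread_num())` at the top of its parallel region, before its first accelerator region.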
Hope this helps,
Mat |
|