PGI User Forum


accelerate a single loop with mpi and gpu
mkcolg



Joined: 30 Jun 2004
Posts: 4995
Location: The Portland Group Inc.

Posted: Thu May 23, 2013 8:51 am

Hi Ben,

Quote:
Is it that transferring half of the array still takes about the same time as transferring the entire array?
If both threads each transfer half of the array at about the same time, then yes, it typically takes about as long as one thread transferring the whole array. So your compute time should halve, but the overall data transfer time will stay about the same.

If you can interleave data transfers and compute, then you might be able to maximize the data bandwidth. This is tough to do in an OpenMP context, though, since the threads are typically more tightly synchronized. Eventually, you'll also be able to use the OpenACC async clauses, which might help with interleaving, but unfortunately we don't quite have async working well enough within OpenMP (hence the PGI_ACC_SYNCHRONOUS variable). Async works fine in a serial and MPI context though.
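
As a rough sketch of the interleaving idea in a serial or MPI context (not something taken from Ben's code), the array can be split into chunks with each chunk placed on its own async queue, so that the transfer of one chunk may overlap the compute of another. The function name scale_chunked and the chunking scheme are assumptions made up for illustration; without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```c
#include <assert.h>

/* Hypothetical example: scale an array in nchunks pieces.
   Each piece is copied in, processed, and copied out on its own
   async queue, so transfers and compute from different chunks can
   overlap when the runtime supports it. */
void scale_chunked(float *a, int n, int nchunks, float factor)
{
    assert(nchunks > 0 && n % nchunks == 0); /* sketch assumes even chunks */
    int chunk = n / nchunks;

    for (int c = 0; c < nchunks; ++c) {
        int start = c * chunk;
        /* copy() moves only this chunk; async(c) queues the work
           without blocking the host. */
        #pragma acc parallel loop copy(a[start:chunk]) async(c)
        for (int i = start; i < start + chunk; ++i)
            a[i] *= factor;
    }

    /* Block until every queue has finished before using a[] on the host. */
    #pragma acc wait
}
```

Whether this actually beats a single synchronous transfer depends on the hardware and the chunk size, so it is worth timing both versions.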

- Mat
brush



Joined: 26 Jun 2012
Posts: 30

Posted: Wed May 29, 2013 4:01 pm

Could you clarify the usage of acc_set_device_num(devicenum,devicetype):

For the device number, are the GPUs numbered 0, 1, 2, ... or 1, 2, 3, ...? I thought it was the former, but according to this link (http://www.catagle.com/26-23/pgi_accel_prog_model_1_2.htm), passing in 0 gives the default behavior, not the first GPU. Is the CUDA Device Number, as displayed by pgaccelinfo, the number I need to pass as my argument to get that device?

What does a devicetype of 0 or 1 do? (I didn't understand the documentation linked above).

Thanks,
Ben
mkcolg



Joined: 30 Jun 2004
Posts: 4995
Location: The Portland Group Inc.

Posted: Fri May 31, 2013 11:02 am

Hi Ben,

The GPUs are numbered starting from 0.

For us, the default behavior is to use the lowest-numbered device that the binary will run on. Typically this is device zero, though it could be something higher. The device information, including the numbering, can be found by running the "pgaccelinfo" utility.

For the devicetype, you should use the enumerated names such as ACC_DEVICE_NVIDIA rather than a literal number, since the numbering may not be consistent between compilers. You can see the PGI list by viewing the header file "include/accel.h" (located in your PGI installation directory).

From 13.5's accel.h:
Code:
typedef enum{
        acc_device_none = 0,
        acc_device_default = 1,
        acc_device_host = 2,
        acc_device_not_host = 3,
        acc_device_nvidia = 4,
        acc_device_radeon = 5,
        acc_device_xeonphi = 6,
        acc_device_pgi_opencl = 7,
        acc_device_nvidia_opencl = 8,
        acc_device_opencl = 9
    }acc_device_t;
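
To tie this back to the MPI case, here is a minimal sketch of picking a device per rank. The helper names pick_device and bind_to_gpu and the round-robin mapping are assumptions for illustration only; acc_get_num_devices and acc_set_device_num are the standard OpenACC runtime routines, and the sketch includes the standard openacc.h rather than PGI's accel.h. The OpenACC calls are guarded by _OPENACC so the file still builds with a compiler that has OpenACC disabled.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: map a zero-based rank (or thread id) onto a
   zero-based device number, round-robin. */
int pick_device(int rank, int num_devices)
{
    assert(rank >= 0 && num_devices > 0);
    return rank % num_devices;
}

/* Hypothetical helper: bind the caller to an NVIDIA device.
   Returns the device number chosen, or -1 if OpenACC is disabled
   or no NVIDIA device is found. */
int bind_to_gpu(int rank)
{
#ifdef _OPENACC
    /* Use the enumerated name, not a literal 4: the value may differ
       between compilers. */
    int ndev = acc_get_num_devices(acc_device_nvidia);
    if (ndev <= 0)
        return -1;
    int dev = pick_device(rank, ndev);
    acc_set_device_num(dev, acc_device_nvidia);
    return dev;
#else
    (void)rank;
    return -1;
#endif
}
```

Each MPI process would call bind_to_gpu with its own rank (from MPI_Comm_rank) before its first accelerator region. On a multi-node job the node-local rank is usually the better input, since devices are numbered per node.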


Hope this helps,
Mat
All times are GMT - 7 Hours