PGI User Forum

mpi + pgi directives question
brush



Joined: 26 Jun 2012
Posts: 44

Posted: Thu Feb 21, 2013 8:24 am    Post subject: mpi + pgi directives question

Hi,

Two questions: 1. If I have part of an MPI code using CUDA and other parts using PGI directives, is this going to cause problems when I try to assign GPUs to an MPI process? For example, in the "5x in 5 hours" article (http://www.pgroup.com/lit/articles/insider/v4n1a3_pgi_accelerator.htm), in the "set up code" section, would this assign GPUs to each process just fine for both the CUDA and directives portions of the code?

2. With regard to the set up code mentioned above, when I add that code along with a call to setDevice like this:
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, npp, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)
nproc = npp
IDPROC = me

devnum = setDevice(nproc,IDPROC)

Then when I add a region
!$acc region
!$acc do private(rhoy,rhox)
loop
!$acc end region

I get the runtime error
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 5000
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 5000
--------------------------------------------------------------------------
mpirun has exited due to process rank 5 with PID 4764 on
node dirac47-ib exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


which I suspect is due to an error in the "set up" code I inserted. Are there any common problems that I might be running into here?

Thanks,
Ben
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Thu Feb 21, 2013 11:52 am

Hi Ben,

Quote:
1. If I have part of an MPI code using CUDA and other parts using PGI directives, is this going to cause problems when I try to assign GPUs to an MPI process? For example, in the "5x in 5 hours" article (http://www.pgroup.com/lit/articles/insider/v4n1a3_pgi_accelerator.htm), in the "set up code" section, would this assign GPUs to each process just fine for both the CUDA and directives portions of the code?
This should work, but with all the changes to our 2013 run time and the new CUDA versions, it is currently having some issues. The problem is that after you call cudaSetDevice in the CUDA C portion of the code, the device isn't getting initialized. Our engineer asked me to have you try adding any CUDA call (like cudaMalloc) right after the call to cudaSetDevice, to get the CUDA run time to initialize the device.
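Mat's suggestion is aimed at CUDA C. For a Fortran code, a rough CUDA Fortran equivalent might look like the sketch below: select the device, then immediately make a small device allocation so the run time has to initialize it. The subroutine name init_device and the scratch array are made up for illustration, and whether this actually avoids the 2013 run time issue described above is untested.

```fortran
! Hypothetical helper (not from the thread or the article).
! CUDA Fortran: build with pgfortran -Mcuda.
subroutine init_device(dev)
  use cudafor
  implicit none
  integer, intent(in) :: dev              ! 0-based CUDA device number
  integer :: istat
  real, device, allocatable :: scratch(:) ! throwaway device array

  ! Select the device for this process.
  istat = cudaSetDevice(dev)
  if (istat /= cudaSuccess) then
    print *, 'cudaSetDevice failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  ! Any real CUDA call would do here; a one-element device allocation
  ! stands in for the cudaMalloc that Mat mentions.
  allocate(scratch(1), stat=istat)
  if (istat /= 0) then
    print *, 'device allocation failed on device ', dev
    stop 1
  end if
  deallocate(scratch)
end subroutine init_device
```

Each MPI rank would call this once, after MPI_INIT and before its first kernel or accelerator region.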

Quote:
which I suspect is due to an error in the "set up" code I inserted. Are there any common problems that I might be running into here?
Possibly, but it could be due to other reasons as well. Try the above workaround and see if it fixes the problem.

- Mat
brush



Posted: Thu Feb 21, 2013 12:56 pm

Thanks Mat. I actually haven't implemented CUDA and the directives together, but I was considering doing so and was just wondering what complications might occur in the process.

What I'm currently playing with is just a Fortran MPI code, and I'm only trying to add directives at the moment. Does it matter where setDevice is called, as long as it's before the first accelerator region (and not sitting in a loop or something)? Right now I just have it in the subroutine that calls mpi_init. The accelerator regions are in a different subroutine, but I figure this doesn't matter.

Ben


Last edited by brush on Thu Feb 21, 2013 3:22 pm; edited 1 time in total
mkcolg



Posted: Thu Feb 21, 2013 3:19 pm

Hi Ben,

Quote:
Does it matter where setDevice is called, as long as it's before the first accelerator region (and not sitting in a loop or something)?
It should be fine there (unless you're using an old compiler like pre-10.6).
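For reference, a much-simplified stand-in for a setDevice-style helper is sketched below. It is not the article's code: it assumes MPI ranks are placed so that mod(myrank, numdev) spreads the ranks on a node across that node's GPUs, and it assumes 0-based device numbering. The routines acc_get_num_devices and acc_set_device_num come from the PGI accel_lib module.

```fortran
! Simplified sketch only; NOT the setDevice from the article.
! Build with the accelerator enabled, e.g. pgfortran -ta=nvidia.
integer function setDevice(nprocs, myrank)
  use accel_lib
  implicit none
  integer, intent(in) :: nprocs   ! total MPI ranks (unused in this sketch)
  integer, intent(in) :: myrank   ! this process's MPI rank
  integer :: numdev, dev

  ! How many NVIDIA devices does this node expose?
  numdev = acc_get_num_devices(acc_device_nvidia)
  if (numdev < 1) then
    print *, 'rank ', myrank, ': no NVIDIA devices found'
    setDevice = -1
    return
  end if

  ! Assumption: round-robin by global rank, 0-based device numbers.
  dev = mod(myrank, numdev)
  call acc_set_device_num(dev, acc_device_nvidia)
  setDevice = dev
end function setDevice
```

Because the device is selected once per process, calling this right after MPI_INIT, in a different subroutine from the accelerator regions, matches the placement Ben describes.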

Quote:
call to cuMemcpyDtoH returned error 700: Launch failed
This typically means that the kernel before the memcpy failed for some reason. Does the code run correctly without the directives enabled? (Be sure to guard the setDevice call with the _OPENACC or _ACCEL macro.)
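A minimal sketch of that guard, assuming the source is preprocessed (a .F90 file name or the -Mpreprocess flag) and that setDevice is an external integer function as in Ben's snippet. The program name guard_demo is made up for illustration.

```fortran
program guard_demo
  implicit none
  include 'mpif.h'
  integer :: ierr, nproc, idproc, devnum
#if defined(_OPENACC) || defined(_ACCEL)
  integer, external :: setDevice
#endif

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, idproc, ierr)

  ! devnum stays -1 in a host-only build, where no device is selected.
  devnum = -1
#if defined(_OPENACC) || defined(_ACCEL)
  ! Compiled in only when accelerator support is enabled.
  devnum = setDevice(nproc, idproc)
#endif

  call MPI_FINALIZE(ierr)
end program guard_demo
```

Built without accelerator flags, the same source compiles and links with no reference to setDevice or the accelerator run time, which makes the "does it run correctly without directives" check straightforward.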

- Mat
brush



Posted: Thu Feb 21, 2013 4:29 pm

It runs correctly without directives enabled.

When I run the code with the directives but remove the private clause, I don't get the cuMemcpyDtoH error; instead the code hangs at the same place where I would have gotten that error.

I am compiling with 12.3, but when I try to compile with 12.9 I get:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected flow graph (gem.f90: 503)

The code runs correctly after compiling with 12.9 with directives, but I don't know if it's actually being accelerated much, given the above message. The rest of the accelerator info from the compiler is:

ppush:
504, Accelerator scalar kernel generated
505, Loop is parallelizable
583, Loop is parallelizable
654, Sum reduction generated for mynopi


Ben
All times are GMT - 7 Hours
Page 1 of 2

 