brush
Joined: 26 Jun 2012 Posts: 30
Posted: Thu Feb 21, 2013 8:24 am    Post subject: mpi + pgi directives question
Hi,
Two questions:
1. If I were to have part of an MPI code using CUDA, and other parts using PGI directives, is this going to cause problems when I try to assign GPUs to an MPI process? For example, in the 5x in 5 hours article (http://www.pgroup.com/lit/articles/insider/v4n1a3_pgi_accelerator.htm), in the "set up code" section, would this assign GPUs to each process just fine for both the CUDA and directives portions of the code?
2. With regard to the set up code mentioned above, when I add that code and add a call to setDevice like so:
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, npp, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)
nproc = npp
IDPROC = me
devnum = setDevice(nproc,IDPROC)
Then when I add a region
!$acc region
!$acc do private(rhoy,rhox)
loop
!$acc end region
I get the runtime error
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 5000
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 5000
--------------------------------------------------------------------------
mpirun has exited due to process rank 5 with PID 4764 on
node dirac47-ib exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
which I suspect is due to an error in the "set up" code I inserted. Are there any common problems that I may be having here?
Thanks,
Ben
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 5000 Location: The Portland Group Inc.
Posted: Thu Feb 21, 2013 11:52 am    Post subject:
Hi Ben,
Quote:
"1. If I were to have part of an MPI code using CUDA, and other parts using PGI directives, is this going to cause problems when I try to assign GPUs to an MPI process? For example, in the 5x in 5 hours article (http://www.pgroup.com/lit/articles/insider/v4n1a3_pgi_accelerator.htm), in the "set up code" section, would this assign GPUs to each process just fine for both the CUDA and directives portions of the code?"

This should work, but with all the changes to our 2013 runtime and the new CUDA versions, there are some issues. The problem is that after you call cudaSetDevice in the CUDA C portion of the code, the device isn't getting initialized. Our engineers asked me to have you try adding any CUDA call (such as cudaMalloc) after the call to cudaSetDevice, to get the CUDA runtime to initialize the device.
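A minimal sketch of that workaround in CUDA Fortran (this is an illustration, not code from the article; the throwaway device allocation is an assumption — any call that touches the device should force the runtime to create a context):

```fortran
use cudafor
integer :: istat
real, device, allocatable :: dummy(:)

! Select the device for this MPI rank first...
istat = cudaSetDevice(devnum)

! ...then issue any CUDA call so the runtime actually
! initializes a context on that device before the first
! accelerator region runs. A 1-element device allocation
! is enough.
allocate(dummy(1))
deallocate(dummy)
```

The same idea in CUDA C would be a dummy cudaMalloc/cudaFree pair immediately after cudaSetDevice.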
Quote:
"which I suspect is due to an error in the "set up" code I inserted. Are there any common problems that I may be having here?"

Possibly, but it could be due to other reasons as well. Try the workaround above and see if it fixes the problem.
- Mat
brush
Joined: 26 Jun 2012 Posts: 30
Posted: Thu Feb 21, 2013 12:56 pm    Post subject:
Thanks Mat. I actually haven't implemented CUDA and the directives together, but I was considering doing so and was just wondering what complications might occur in the process.
What I'm currently playing with is just a Fortran MPI code, and I'm only trying to add directives at the moment. Does the location where setDevice is called matter, as long as it's before the first accelerator region (and not, say, sitting in a loop)? Right now I just have it sitting in the subroutine with mpi_init. The accelerator regions are in a different subroutine, but I figure this doesn't matter.
Ben
Last edited by brush on Thu Feb 21, 2013 3:22 pm; edited 1 time in total
mkcolg
Joined: 30 Jun 2004 Posts: 5000 Location: The Portland Group Inc.
Posted: Thu Feb 21, 2013 3:19 pm    Post subject:
Hi Ben,
Quote:
"Does the location where setDevice is called matter, as long as it's before the first accelerator region (and not, say, sitting in a loop)?"

It should be fine there (unless you're using an old compiler, pre-10.6).
Quote:
"call to cuMemcpyDtoH returned error 700: Launch failed"

This typically means that the kernel before the memcpy failed for some reason. Does the code run correctly without the directives enabled? (Be sure to guard the setDevice call with the _OPENACC or _ACCEL macro.)
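One way that guard might look (a sketch; it assumes the source file is run through the preprocessor, e.g. a .F90 extension or the -Mpreprocess flag, so that the macro test takes effect):

```fortran
! _OPENACC is predefined when OpenACC directives are enabled,
! _ACCEL when the PGI Accelerator model is enabled. With the
! guard in place, a non-accelerated build never calls setDevice.
#if defined(_OPENACC) || defined(_ACCEL)
      devnum = setDevice(nproc, IDPROC)
#endif
```

This lets the same source compile and run cleanly with and without the accelerator flags, which is exactly what's needed to test whether the host-only version runs correctly.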
- Mat
brush
Joined: 26 Jun 2012 Posts: 30
Posted: Thu Feb 21, 2013 4:29 pm    Post subject:
It runs correctly without directives enabled.
When I run the code with the directives but remove the private clause, I don't get the cuMemcpyDtoH error; instead, the code hangs/gets stuck at the same place where I would have gotten that error.
I am compiling with 12.3, but when I try to compile with 12.9 I get:
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected flow graph (gem.f90: 503)
The code runs correctly after compiling with 12.9 with directives, but I don't know if it's actually being accelerated much because of the above message. The rest of the accelerator info from the compiler is:
ppush:
504, Accelerator scalar kernel generated
505, Loop is parallelizable
583, Loop is parallelizable
654, Sum reduction generated for mynopi
Ben