PGI User Forum

MPI error

zsh



Joined: 11 Aug 2012
Posts: 9

Posted: Tue Sep 25, 2012 8:58 pm    Post subject: MPI error

I use PGI Accelerator Fortran 11.7, and I have run into a problem when combining MPI with CUDA Fortran. The Slurm output file, which contains the execution information, said: /home/bin/pgi/linux86-64/2011/mpi/mpich/bin/mpirun.ch_p4:line 243 3053 killed.

I also found that this problem does not always appear. Some jobs finish successfully, and some do not.

Also, for this test I ran the MPI job with one CPU and one GPU.

Thanks very much!
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Sep 26, 2012 8:56 am

Hi zsh,

This means that one of your MPI processes was killed or crashed unexpectedly. This could be caused by resource limits on your cluster, MPI configuration, program errors, etc. Basically, it could be any number of problems.
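As a rough way to check the resource-limit possibility, the per-process limits can be printed from inside the batch job, since a job may see different limits than a login shell. The sketch below uses Python's standard `resource` module. The choice of which limits to print is an assumption, not a list of what this cluster enforces.

```python
import resource

# Limits that often matter when a process is killed or crashes.
# This selection is illustrative; a cluster may enforce others.
LIMITS = {
    "address space (bytes)": resource.RLIMIT_AS,
    "stack size (bytes)": resource.RLIMIT_STACK,
    "core file size (bytes)": resource.RLIMIT_CORE,
    "CPU time (seconds)": resource.RLIMIT_CPU,
}


def format_limit(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for each limit listed in LIMITS."""
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

A small soft limit on stack or address space would be one thing to raise with the cluster administrators.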

I would start by running a single process (which you've done), then 2, 4, etc. until the crash occurs. Try limiting your program to a single node first, and then run again on multiple nodes. Since you're using CUDA Fortran, the problem may be with a particular GPU or with oversubscribing GPUs (until the K20 is out, each MPI process should have its own GPU). If you think it may be a program error, you can compile in emulation mode (-g -Mcuda=emu) and then run your program in the PGI debugger, pgdbg. PGDBG is able to run MPI processes. If you have a CDK license, you can even run pgdbg across multiple nodes.
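The scale-up procedure can be sketched as a small driver script. This is only an illustration under assumptions: the program name `./a.out`, the launcher name `mpirun` and its `-np` argument, and the helper names are all hypothetical and should be adapted to your site. Because the crash is intermittent, each process count is repeated several times before moving on.

```python
import subprocess
from typing import Callable, Iterable, Optional


def run_mpi(nprocs: int, program: str = "./a.out", launcher: str = "mpirun") -> bool:
    """Launch `program` on `nprocs` processes; True if the launcher exits with status 0.

    The defaults are assumptions; substitute your own launcher and executable.
    """
    result = subprocess.run([launcher, "-np", str(nprocs), program])
    return result.returncode == 0


def first_failing_count(counts: Iterable[int],
                        run: Callable[[int], bool] = run_mpi,
                        repeats: int = 1) -> Optional[int]:
    """Try each process count in order, `repeats` times each.

    Returns the first count for which any run fails, or None if every run succeeds.
    """
    for nprocs in counts:
        for _ in range(repeats):
            if not run(nprocs):
                return nprocs
    return None


def gpus_oversubscribed(ranks_on_node: int, gpus_on_node: int) -> bool:
    """True when a node hosts more MPI ranks than GPUs, so some ranks would share a GPU."""
    return ranks_on_node > gpus_on_node
```

For example, `first_failing_count([1, 2, 4, 8], repeats=5)` runs each count up to five times and reports the smallest count that failed. Passing `run` as a parameter lets the same loop be reused with a different launcher or with a single-node versus multi-node submission.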

Hope this helps,
Mat
zsh



Joined: 11 Aug 2012
Posts: 9

Posted: Wed Sep 26, 2012 7:14 pm

Thanks Mat, I will figure out which part caused this error!

Thanks a lot for your suggestion!
All times are GMT - 7 Hours