PGI User Forum

CUDA+MPI error on workstation

zsh



Joined: 11 Aug 2012
Posts: 9

Posted: Wed Dec 05, 2012 7:10 pm    Post subject: CUDA+MPI error on workstation

Hi,
I am using the 64-bit PGI Accelerator Fortran compiler, version 11.7, and I ran into a problem running a CUDA+MPI job on a 64-bit workstation.
It is a seismic migration code: it loops over many shots, and for every shot thousands of time steps have to be computed.
At first I tested the program with only 10 or a few hundred time steps, and it completed. But with a realistic number of time steps (about 6000), the run takes much longer and the following error occurs:
killed by signal 2
p0_31083: p4_error: net_recv read: probable EOF on socket: 1
p0_31083: (33208.406250) net_send: could not write to fd=4, errno = 32
*=============
The commands are:
pgfortran -Mcuda -Mmpi -o mpi mpi.f90
mpirun -np 3 mpi >a.dat&
*==============
The code (details elided) is:

program RTM
  use cudafor
  include 'mpif.h'
  ! ... parameter definitions (elided) ...
  call MPI_INIT(IEER)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, NUMPROCS, IEER)
  call MPI_COMM_RANK(MPI_COMM_WORLD, MYID, IEER)
  ! ... read input files (elided) ...
  ierr = cudaGetDeviceCount(numdev)
  ! bind each rank to a GPU; mod() keeps the device id valid
  ! when there are more MPI ranks than GPUs on the workstation
  ierr = cudaSetDevice(mod(myid, numdev))
  call cal(parameters)      ! argument list elided
  call MPI_FINALIZE(IEER)
end program RTM

subroutine cal(parameters)
  use cudafor
  include 'mpif.h'
  ! ... parameter calculations (elided) ...
  do ishots = 1 + myid, nshots, numprocs    ! shots dealt round-robin over ranks
    do it = 1, max_timesteps                ! time-step loop
      ! call the GPU subroutines
      ! host_array = device_array           ! device-to-host copy every step
    enddo
    ! write the result to disk
  enddo
end subroutine cal
*===========================
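The shot loop `do ishots=1+myid,nshots,numprocs` deals shots out round-robin, so when `nshots` is not a multiple of `numprocs` some ranks process one shot more than others. A minimal sketch that counts the shots a given rank receives, assuming 0-based ranks as returned by `MPI_COMM_RANK` and `numprocs >= 1`; the module and function names are hypothetical:

```fortran
! Minimal sketch: how many shots each rank gets from the round-robin loop.
module shot_schedule
  implicit none
  private
  public :: shots_for_rank
contains
  ! Number of shots processed by rank myid (0-based) when nshots shots are
  ! dealt out as "do ishots = 1 + myid, nshots, numprocs". Assumes numprocs >= 1.
  pure integer function shots_for_rank(myid, numprocs, nshots)
    integer, intent(in) :: myid, numprocs, nshots
    integer :: ishots
    shots_for_rank = 0
    do ishots = 1 + myid, nshots, numprocs
      shots_for_rank = shots_for_rank + 1
    end do
  end function shots_for_rank
end module shot_schedule
```

For example, with `nshots = 10` and 3 processes, rank 0 handles shots 1, 4, 7 and 10, while ranks 1 and 2 handle three shots each.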

Thanks in advance if anyone can help solve this problem.
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Thu Dec 06, 2012 2:07 pm

Hi zsh,

Is there another message above the "signal 2" line? That message only indicates that one of the MPI processes encountered a problem and was terminated with an interrupt signal. You'll need to do more digging to find the actual error; that should help narrow down the cause. From the information given, it could be anything.

- Mat
zsh



Joined: 11 Aug 2012
Posts: 9

Posted: Wed Dec 19, 2012 7:10 pm

Hi Mat,
Over the past few days I have tried several ways to find the cause, and it turned out to be memory. After upgrading the workstation from 32 GB to 64 GB of memory, my code completed. But this does not fundamentally solve the problem: when I increase the problem size or add more compute nodes, the error still happens.
So I monitored memory use during the calculation, and found that the used memory keeps growing over time, even though the code does not allocate or use that much memory.
I suspect a memory leak, but I am confused because I deallocate all the allocated arrays.
Can you give me some ideas on how to find which part of the code causes the leak?
Thanks!
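One way to see which rank grows, and whether it grows per time step or per shot, is to log each process's resident memory from inside the loops. A minimal sketch, assuming Linux (it parses the `VmRSS` line of `/proc/self/status`) and a compiler that supports the Fortran 2008 `newunit=` specifier (otherwise substitute a fixed, unused unit number); the module and function names are hypothetical:

```fortran
! Minimal sketch (Linux only): resident memory of the calling process.
module rss_monitor
  implicit none
  private
  integer, parameter, public :: i8 = selected_int_kind(18)
  public :: current_rss_kb
contains
  ! Returns the VmRSS value from /proc/self/status in kB, or -1 if the
  ! file cannot be opened or the line is not found (e.g. non-Linux systems).
  function current_rss_kb() result(rss_kb)
    integer(i8) :: rss_kb
    character(len=256) :: line
    integer :: unit, ios
    rss_kb = -1
    open(newunit=unit, file='/proc/self/status', status='old', &
         action='read', iostat=ios)
    if (ios /= 0) return
    do
      read(unit, '(a)', iostat=ios) line
      if (ios /= 0) exit
      if (line(1:6) == 'VmRSS:') then
        ! the line looks like "VmRSS:    123456 kB"; read the number only
        read(line(7:), *, iostat=ios) rss_kb
        if (ios /= 0) rss_kb = -1
        exit
      end if
    end do
    close(unit)
  end function current_rss_kb
end module rss_monitor
```

Every few hundred steps, the result could be stored in an `integer(i8)` variable and written together with `myid`, `ishots` and `it`. This measures host memory only; it says nothing about device memory.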
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Thu Dec 20, 2012 11:46 am

Does the problem still occur if you use 1 process? For memory issues like these, I typically use Valgrind (www.valgrind.org), but it has only limited support for multi-process MPI runs.

- Mat
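A lighter-weight complement to Valgrind is to record each rank's memory at a fixed interval of time steps (for example resident memory in kB) and fit a straight line to the readings. A slope that stays positive across many shots suggests a leak, while a flat trend after the first steps suggests a large but bounded working set. A minimal least-squares sketch, assuming equally spaced samples; the module and function names are hypothetical:

```fortran
! Minimal sketch: least-squares growth rate of equally spaced memory samples.
module rss_trend
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)
  public :: growth_per_sample
contains
  ! Slope of the best-fit line through (i, samples(i)), i = 1..n, in the
  ! units of the samples per sampling interval. Returns 0 for n < 2.
  pure function growth_per_sample(samples) result(slope)
    real(dp), intent(in) :: samples(:)
    real(dp) :: slope, xmean, ymean, sxx, sxy
    integer :: i, n
    n = size(samples)
    slope = 0.0_dp
    if (n < 2) return
    xmean = real(n + 1, dp) / 2.0_dp      ! mean of the indices 1..n
    ymean = sum(samples) / real(n, dp)
    sxx = 0.0_dp
    sxy = 0.0_dp
    do i = 1, n
      sxx = sxx + (real(i, dp) - xmean)**2
      sxy = sxy + (real(i, dp) - xmean) * (samples(i) - ymean)
    end do
    slope = sxy / sxx
  end function growth_per_sample
end module rss_trend
```

For samples of 100, 110, 120 and 130 kB the function returns 10 kB per sampling interval; a constant series returns 0.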
zsh



Joined: 11 Aug 2012
Posts: 9

Posted: Thu Dec 20, 2012 7:43 pm

With 1 process it runs without any problem; the error happens only when I use multiple processes. So I believe this analysis tool will be helpful.
Thanks for your reply!
All times are GMT - 7 Hours