PGI User Forum


MPI_WAIT error

 
Forum: Debugging and Profiling
eee
Joined: 23 Jan 2007
Posts: 3

Posted: Mon Jul 09, 2007 7:13 am    Post subject: MPI_WAIT error

I am trying to run a meteorological model (RAMS V.6) in parallel. It was compiled with PGI on a Dell cluster (9 processors) running Linux (SUSE Linux Enterprise Server 10).
I have run it successfully on other occasions. Now I am running a simulation with a larger dataset, and it runs fine for a while, but then I get the following MPI-related error messages:

radiation tendencies updated time = 10800.0 UTC TIME (HRS) = 3.0
rank 8 in job 13 n0_55134 caused collective abort of all ranks
exit status of rank 8: killed by signal 11
[cli_4]: aborting job:
Fatal error in MPI_Wait: Other MPI error, error stack:
MPI_Wait(140).............................: MPI_Wait(request=0x7fffe3a1132c, status0x7fffe3a11330) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=5,errno=104:Connection reset by peer)


Any ideas?
Thanks in advance,
Estibaliz
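The log contains two numeric codes worth decoding: rank 8 was "killed by signal 11", and the surviving rank reports "errno=104". A minimal C sketch, assuming a Linux/glibc system like the SUSE cluster described above, translates such codes with the standard strsignal() and strerror() calls; describe_codes is a hypothetical helper name, not part of MPI or PGI.

```c
#define _GNU_SOURCE /* make sure strsignal() is declared by <string.h> on glibc */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the textual meaning of a signal number
   and of an errno value copied from an MPI error log. */
void describe_codes(int sig, int err)
{
    printf("signal %d: %s\n", sig, strsignal(sig));
    printf("errno  %d: %s\n", err, strerror(err));
}
```

On Linux, describe_codes(11, 104) reports signal 11 as a segmentation fault (SIGSEGV) and errno 104 as "Connection reset by peer" (ECONNRESET). Read together, this suggests that rank 8 crashed on its own and that the MPI_Wait failure on the other rank is a side effect of losing its connection to rank 8, so the crash in rank 8 is what needs investigating.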
mkcolg
Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Jul 09, 2007 2:33 pm

Hi eee,

It sounds like a stack overflow, but you'll need to run the program with the debugger to be sure. Try setting the stack size to unlimited in your shell's start-up file to see if this works around the problem.

- Mat
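A larger dataset can turn into a stack overflow because arrays whose sizes follow the grid dimensions grow with it, and compilers typically place Fortran automatic arrays and some array temporaries on the stack, whose size is capped by the per-process limit set with ulimit -s. The C sketch below makes that comparison explicit. fits_on_stack is a hypothetical helper, and it is only a rough estimate: it ignores the stack already used by the rest of the call chain, which the reserve argument stands in for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: would a stack buffer of `bytes` bytes fit under the
   current soft stack limit while leaving `reserve` bytes of headroom? */
bool fits_on_stack(size_t bytes, size_t reserve)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return false;               /* limit unknown: be conservative */
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;                /* no soft limit, e.g. "ulimit -s unlimited" */
    if ((rlim_t)reserve > rl.rlim_cur)
        return false;               /* the headroom alone exceeds the limit */
    /* Subtract first so that bytes + reserve cannot overflow. */
    return (rlim_t)bytes <= rl.rlim_cur - (rlim_t)reserve;
}
```

For example, a single hypothetical 200 x 200 x 40 array of 8-byte reals needs 200 x 200 x 40 x 8 = 12,800,000 bytes (about 12.2 MiB), so fits_on_stack(12800000, 0) returns false under an 8 MiB soft limit, a common Linux default, and true once the limit is unlimited.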
eee
Joined: 23 Jan 2007
Posts: 3

Posted: Tue Jul 10, 2007 12:24 am

I don't know how to run the program with the debugger; I'm not very experienced.
The program was compiled with the following options:

MACH=PC_LINUX1
F_COMP=pgf90
F_OPTS=-Mvect=cachesize:524288 -Munroll -Mnoframe -O2 -pc 64
C_COMP=pgcc
C_OPTS= -O3 -DUNDERSCORE -DLITTLE
LOADER=pgf90
LOADER_OPTS=-v -lgcc_eh -lpthread
LIBS=-L/opt/pgi/linux86-64/6.2/lib -L/opt/pgi/linux86-64/6.2/libso
ARCHIVE=ar rs


I can post additional information if it would help track down the problem.
Any suggestions are greatly appreciated.
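Until a debugger session is available, a common way to locate a segmentation fault is to have the program print a stack trace when it receives SIGSEGV. The following is a minimal sketch for Linux/glibc using sigaltstack(), sigaction(), and glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. install_segv_handler is a hypothetical name; it must be called early in the program, which for a Fortran model such as RAMS would require a small C-interop call that is assumed here and not shown. The trace is most readable when the program is built with symbol information (for example -g).

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Separate stack for the handler (see the note after this block). */
static char alt_stack[64 * 1024];

static void segv_handler(int sig)
{
    static const char msg[] = "*** SIGSEGV caught, backtrace follows ***\n";
    void *frames[MAX_FRAMES];
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc; /* nothing useful to do if stderr is gone */

    /* backtrace_symbols_fd() writes straight to the descriptor, no malloc(). */
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND restored the default action; the re-raised signal is
       delivered when this handler returns and terminates the process. */
    raise(sig);
}

/* Hypothetical helper: returns 0 on success, -1 on error. */
int install_segv_handler(void)
{
    stack_t ss;
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc and allocate memory,
       so make it here rather than inside the signal handler. */
    backtrace(warmup, 1);

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK | SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL) == 0 ? 0 : -1;
}
```

The handler runs on a separate signal stack because, if the fault is itself a stack overflow, the normal stack has no room left to run it. After printing the trace it re-raises the signal, so the process still dies with signal 11 and the MPI launcher still reports the failure as before.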
mkcolg
Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Jul 10, 2007 11:31 am

Hi Estibaliz,

First, try increasing the available stack size to see if that corrects the problem. To do this, add "ulimit -s unlimited" to the ".bashrc" file in your home directory if you're using the bash shell, or add "limit stacksize unlimited" to your ".cshrc" file if you're using TCSH/CSH.

As for using the PGI debugger, please refer to the PGI Tools Guide for detailed information. Note that you must have the PGI CDK product to use the MPI debugging feature.

- Mat
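After changing the start-up file, it is worth confirming the stack limit that the MPI processes actually inherit: the value seen in an interactive login is not necessarily the one given to ranks started on other nodes. A minimal C sketch of such a check uses the standard getrlimit(RLIMIT_STACK, ...) call. report_stack_limit is a hypothetical helper that each rank (or a small test program started with the same MPI launcher) would call to print its own limits.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value in KiB, or "unlimited". */
static void print_limit(FILE *out, const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        fprintf(out, "%s: unlimited\n", label);
    else
        fprintf(out, "%s: %llu KiB\n", label,
                (unsigned long long)(value / 1024));
}

/* Hypothetical helper: report the soft and hard stack limits of the
   calling process. Returns 0 on success, -1 if getrlimit() fails. */
int report_stack_limit(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit(out, "stack soft limit", rl.rlim_cur);
    print_limit(out, "stack hard limit", rl.rlim_max);
    return 0;
}
```

Values are printed in KiB, the same 1024-byte units that bash's ulimit -s uses. In bash, ulimit -s without -H or -S sets both the soft and the hard limit, and an ordinary user cannot raise the hard limit, so "ulimit -s unlimited" fails if the hard limit is finite; the two printed lines show whether that is the case.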
All times are GMT - 7 Hours