eee
Joined: 23 Jan 2007 Posts: 3
|
Posted: Mon Jul 09, 2007 7:13 am Post subject: MPI_WAIT error |
|
|
I am trying to run a meteorological model (RAMSV.6) in parallel. It was compiled with PGI on a Dell cluster (9 processors) in a Linux environment (SUSE Linux Enterprise Server 10).
I have run it successfully on other occasions. Now I am running a meteorological simulation with a larger dataset. It runs OK for a while, but then I get the following MPI-related error messages:
radiation tendencies updated time = 10800.0 UTC TIME (HRS) = 3.0
rank 8 in job 13 n0_55134 caused collective abort of all ranks
exit status of rank 8: killed by signal 11
[cli_4]: aborting job:
Fatal error in MPI_Wait: Other MPI error, error stack:
MPI_Wait(140).............................: MPI_Wait(request=0x7fffe3a1132c, status0x7fffe3a11330) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=5,errno=104:Connection reset by peer)
Any idea?
Thanks in advance,
Estibaliz
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Mon Jul 09, 2007 2:33 pm Post subject: |
|
|
Hi eee,
It sounds like a stack overflow, but you'll need to run the program with the debugger to be sure. Try setting the stack size to unlimited in your shell's start-up file to see if this works around the problem.
- Mat
eee
Joined: 23 Jan 2007 Posts: 3
|
Posted: Tue Jul 10, 2007 12:24 am Post subject: |
|
|
I don't know how to run the program with the debugger; I'm not very experienced.
The program was compiled with the following options:
MACH=PC_LINUX1
F_COMP=pgf90
F_OPTS=-Mvect=cachesize:524288 -Munroll -Mnoframe -O2 -pc 64
C_COMP=pgcc
C_OPTS= -O3 -DUNDERSCORE -DLITTLE
LOADER=pgf90
LOADER_OPTS=-v -lgcc_eh -lpthread
LIBS=-L/opt/pgi/linux86-64/6.2/lib -L/opt/pgi/linux86-64/6.2/libso
ARCHIVE=ar rs
I can post additional information if that will help track down the problem.
Any suggestions are greatly appreciated.
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Tue Jul 10, 2007 11:31 am Post subject: |
|
|
Hi Estibaliz,
First, try increasing the available stack size to see if it corrects the problem. To do this, add "ulimit -s unlimited" to the ".bashrc" file in your home directory if you're using the bash shell, or "limit stacksize unlimited" to your ".cshrc" file if you're using tcsh/csh.
As for using the PGI debugger, please refer to the PGI Tools Guide for detailed information. Note that you must have the PGI CDK product to use the MPI debugging feature.
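As a rough sketch of the bash variant of this suggestion, the lines below print the current stack limits and then try to raise them. The fallback to the hard limit and the `stack_soft` variable are illustrative additions, not part of the original advice; they only cover the case where the system does not permit "unlimited".

```shell
# Show the current soft and hard stack limits (in kilobytes, or "unlimited").
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# Try to remove the stack limit. If the hard limit forbids that,
# fall back to raising the soft limit as far as the hard limit allows.
ulimit -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"

# Confirm the setting that child processes of this shell will inherit.
stack_soft=$(ulimit -S -s)
echo "soft stack limit is now: $stack_soft"
```

The setting applies only to the shell that runs it and to processes started from that shell, which is presumably why the start-up file is suggested: an MPI job starts processes on several nodes, and each of them needs the larger limit.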
- Mat