PGI User Forum

Seg-fault during profiling of MPI+OpenMP application

 
filippo.spiga



Joined: 02 May 2008
Posts: 5
Location: Milan

Posted: Sun Jan 18, 2009 4:01 am    Post subject: Seg-fault during profiling of MPI+OpenMP application

Hi all,
This is my first time on this forum. I have a problem with my application when I enable both profiling and multithreading. The application uses both MPI and OpenMP and is written in Fortran 90. I have followed all the instructions and workarounds given in the documentation, but the segmentation fault still occurs on every run.

I use the LSF batch system to submit my application:

export OMP_NUM_THREADS=2
export OMP_STACK_SIZE=256M
mpirun -np 4 [...] ./myapp.x

The error is similar to the following:
[node0009:11998] *** Process received signal ***
[node0009:11998] Signal: Segmentation fault (11)
[node0009:11998] Signal code: Address not mapped (1)
[node0009:11998] Failing at address: 0xffffffffffffff18
[node0009:11998] *** End of error message ***

$ ldd myapp.x
libmpi_f90.so.0 => /opt/openmpi/1.2.8/pgi--8.0-2--binary/lib/libmpi_f90.so.0 (0x00002aaaaacc6000)
libmpi_f77.so.0 => /opt/openmpi/1.2.8/pgi--8.0-2--binary/lib/libmpi_f77.so.0 (0x00002aaaaaf22000)
libmpi.so.0 => /opt/openmpi/1.2.8/pgi--8.0-2--binary/lib/libmpi.so.0 (0x00002aaaab152000)
libopen-rte.so.0 => /opt/openmpi/1.2.8/pgi--8.0-2--binary/lib/libopen-rte.so.0 (0x00002aaaab4a1000)
libopen-pal.so.0 => /opt/openmpi/1.2.8/pgi--8.0-2--binary/lib/libopen-pal.so.0 (0x00002aaaab76f000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002aaaab9eb000)
librt.so.1 => /lib64/librt.so.1 (0x00002aaaabbf7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaabe00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaac004000)
libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaac21d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaac420000)
libpgbind.so => /opt/pgi/linux86-64/8.0-2/lib/libpgbind.so (0x00002aaaac63a000)
libnuma.so => /opt/pgi/linux86-64/8.0-2/lib/libnuma.so (0x00002aaaac73c000)
libm.so.6 => /lib64/libm.so.6 (0x00002aaaac83d000)
libc.so.6 => /lib64/libc.so.6 (0x00002aaaacac0000)
libpgf90.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90.so (0x00002aaaace11000)
libpgf90_rpm1.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90_rpm1.so (0x00002aaaad1cc000)
libpgf902.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf902.so (0x00002aaaad2ce000)
libpgf90rtl.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90rtl.so (0x00002aaaad3e1000)
libpgftnrtl.so => /opt/pgi/linux86-64/8.0-2/libso/libpgftnrtl.so (0x00002aaaad504000)
libpgc.so => /opt/pgi/linux86-64/8.0-2/libso/libpgc.so (0x00002aaaad632000)
/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

Compile/linker flags are:
MPIF90 = mpif90
CC = pgcc
F77 = pgf77
CFLAGS = -mp -O1 -W0,-profile,lines -Mprof=lines
F90FLAGS = -mp -O1 -r8 -W0,-profile,lines -Mprof=lines
LD = mpif90 -mp -lpgnod_prof_openmpi -W0,-profile,lines -Mprof=lines

If I remove the "-mp" flag, the application runs without faults!
I'm using the latest available version of the PGI compiler (8.0-2).

How can I resolve this problem?

Thank you very much in advance!
mkcolg



Joined: 30 Jun 2004
Posts: 6141
Location: The Portland Group Inc.

Posted: Mon Jan 19, 2009 9:18 am

Hi filippo.spiga,

The most likely cause is a stack overflow. I see that you've tried increasing the stack size, but accidentally misspelled the environment variable. The OpenMP 3.0 variable is "OMP_STACKSIZE". If removing the second underscore doesn't work, try increasing the size to 512M.
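
For reference, the environment setup from the original post with the variable name corrected might look roughly like this. It is only a sketch: the guard that warns about the misspelled name is an illustrative addition, not something from the original job script, and the launch line is left commented out with its elided options untouched.

```shell
# Corrected spelling: the OpenMP 3.0 variable has no second underscore.
export OMP_NUM_THREADS=2
export OMP_STACKSIZE=256M   # raise to 512M if the fault persists

# Illustrative guard (an assumption, not part of the original script):
# warn and clean up if the misspelled name is still set in the environment.
if [ -n "${OMP_STACK_SIZE:-}" ]; then
    echo "warning: OMP_STACK_SIZE is set, but the OpenMP 3.0 name is OMP_STACKSIZE" >&2
    unset OMP_STACK_SIZE
fi

# Show what the job will actually see before launching.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_STACKSIZE=$OMP_STACKSIZE"

# Launch line as given in the original post (elided options kept as-is):
# mpirun -np 4 [...] ./myapp.x
```

Depending on how the job is launched, the variable may also need to be forwarded to ranks on other nodes; Open MPI's mpirun has an "-x" option for exporting environment variables, which may be worth checking if the setting appears to have no effect.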

If it still seg faults, then try running your code in the PGI debugger, PGDBG, to get a better understanding of the error. You'll need to use the MPI libraries that accompany your compilers. If you have the PGI CDK product you'll be able to debug your program using either MPI or MPI-2 on your cluster. Otherwise, you'll need to use MPI-1 (mpich) and will be limited to debugging on a single node.

- Mat
All times are GMT - 7 Hours