PGI User Forum

HPL Segfault Problem
Forum: Performance and Benchmarking
tmishima



Joined: 24 Nov 2008
Posts: 9

Posted: Thu Jan 14, 2010 4:39 pm    Post subject: HPL Segfault Problem

Hi,

I have a segfault problem with HPL-2.0 built with PGI 8.0-6, Open MPI 1.4, and ACML 4.3.0.
Smaller matrices run fine, but HPL segfaults on larger ones that need more than 2 GB per process.
Is this a limitation of HPL-2.0?
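Why 2 GB per process might matter: 2 GB is 2^31 bytes, one more than the largest value a signed 32-bit int can hold. The sketch below is a rough estimate under stated assumptions: it counts only the N x N double-precision matrix split evenly over a P x Q process grid and ignores HPL's panels and workspace. The function names are hypothetical, not taken from HPL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lower-bound estimate of the bytes one MPI process
 * needs for the HPL coefficient matrix, assuming an N x N matrix of
 * 8-byte doubles divided evenly over a P x Q process grid. HPL also
 * allocates panels and workspace, so the real footprint is larger. */
static uint64_t hpl_matrix_bytes_per_process(uint64_t n, uint64_t p, uint64_t q)
{
    assert(p > 0 && q > 0);
    return (n * n * sizeof(double)) / (p * q);
}

/* Hypothetical helper: largest N whose per-process matrix still fits in
 * INT32_MAX bytes, i.e. stays below the 2 GiB (2^31 bytes) mark where a
 * byte count stops fitting in a signed 32-bit int. Simple linear search. */
static uint64_t hpl_max_n_below_2gib(uint64_t p, uint64_t q)
{
    uint64_t n = 0;
    while (hpl_matrix_bytes_per_process(n + 1, p, q) <= (uint64_t)INT32_MAX)
        ++n;
    return n;
}
```

On a single process, N = 16384 already needs 16384² × 8 = 2,147,483,648 bytes for the matrix alone, one byte past INT32_MAX, so the largest "safe" N there is 16383; on a 2 x 2 grid it is 32767.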

Anyway, please help me solve this problem. I can send my Make.arch and HPL.dat to an email address of your choosing.

Thank you in advance.

Regards,
tmishima

P.S. For your information:

The error is:
[node09:21940] *** Process received signal ***
[node09:21940] Signal: Segmentation fault (11)
[node09:21940] Signal code: Address not mapped (1)
[node09:21940] Failing at address: 0x2aaa35217088
[node09:21940] *** End of error message ***

The ldd output is:
libmpi_f90.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi_f90.so.0 (0x00002b0dd4944000)
libmpi_f77.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi_f77.so.0 (0x00002b0dd4b47000)
libmpi.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi.so.0 (0x00002b0dd4d77000)
libopen-rte.so.0 => /home/mishima/app/openmpi-pgi/lib/libopen-rte.so.0 (0x00002b0dd5031000)
libopen-pal.so.0 => /home/mishima/app/openmpi-pgi/lib/libopen-pal.so.0 (0x00002b0dd527c000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003eeb400000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003ef3800000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003ef8a00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003eeb800000)
libpgmp.so => /opt/pgi/linux86-64/8.0-6/libso/libpgmp.so (0x00002b0dd551a000)
libpgbind.so => /opt/pgi/linux86-64/8.0-6/libso/libpgbind.so (0x00002b0dd5644000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003eea000000)
libpgf90.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90.so (0x00002b0dd5746000)
libpgf90_rpm1.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90_rpm1.so (0x00002b0dd5b02000)
libpgf902.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf902.so (0x00002b0dd5c04000)
libpgf90rtl.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90rtl.so (0x00002b0dd5d17000)
libpgftnrtl.so => /opt/pgi/linux86-64/8.0-6/libso/libpgftnrtl.so (0x00002b0dd5e3a000)
libpgc.so => /opt/pgi/linux86-64/8.0-6/libso/libpgc.so (0x00002b0dd5f68000)
librt.so.1 => /lib64/librt.so.1 (0x0000003eef400000)
libm.so.6 => /lib64/libm.so.6 (0x0000003eeb000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003eeac00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003ee9c00000)
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Jan 15, 2010 11:17 am

Hi tmishima,

Sorry, I don't know enough about HPL to know why this would occur.

You can try adding the "-Mlarge_arrays" flag in case it's a loop indexing size issue. Otherwise, please contact the authors of HPL for help.

- Mat
tmishima



Joined: 24 Nov 2008
Posts: 9

Posted: Mon Jan 18, 2010 1:21 am

Hi Mat,

Thank you for your advice.

The "-Mlarge_arrays" flag doesn't help. I'm going to ask the authors of HPL for further help.

But please let me confirm one thing: is this caused by my own environment, or is it an issue with the combination of the PGI compiler and HPL-2.0?

tmishima
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Tue Jan 19, 2010 10:06 am

Hi tmishima,

While I don't know the cause, my best guess is that HPL is using the MPI-1 "MPI_Address" function, whose Fortran binding returns a default (typically 32-bit) INTEGER, rather than the MPI-2 "MPI_Get_address" function, which returns an address-sized integer, or that you are encountering some other kind of 32-bit integer overflow.
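The 32-bit overflow hypothesis can be illustrated with a hypothetical allocation-size calculation; the helper names are illustrative and not HPL's actual code. If the byte count for a local lda x nq block of doubles is computed in 32-bit arithmetic, the product wraps once it exceeds the 32-bit range, so a program can allocate far less memory than it later touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from HPL: the byte size of an
 * lda x nq column-major block of doubles. */

/* Safe version: widen each operand to size_t before multiplying, so the
 * product is computed in 64 bits on LP64 systems such as x86-64 Linux. */
static size_t block_bytes_wide(int32_t lda, int32_t nq)
{
    assert(lda >= 0 && nq >= 0);
    return (size_t)lda * (size_t)nq * sizeof(double);
}

/* Buggy pattern, modelled with unsigned 32-bit arithmetic so that the
 * wrap-around is well defined: the product is reduced modulo 2^32, so a
 * large request silently shrinks. (With signed int operands the
 * overflow would be undefined behaviour instead.) */
static uint32_t block_bytes_32bit(int32_t lda, int32_t nq)
{
    assert(lda >= 0 && nq >= 0);
    return (uint32_t)lda * (uint32_t)nq * (uint32_t)sizeof(double);
}
```

For example, a 24000 x 24000 block needs 4,608,000,000 bytes, but the 32-bit version yields 313,032,704, and a 32768 x 32768 block wraps all the way to 0. Code that allocates the wrapped size and then writes the full block runs past the end of its buffer, which can show up as an "Address not mapped" segfault.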

- Mat
tmishima



Joined: 24 Nov 2008
Posts: 9

Posted: Tue Jan 19, 2010 5:09 pm

Hi Mat,

Thank you for your suggestion.

In the end, I gave up on the PGI C compiler and changed Make.arch to use gcc.
Now everything works fine, with no segfault even for larger matrices.
What's the difference between gcc and pgcc?

The main part of the modified Make.arch is as follows:

ARCH = Linux_ompi
#
MPdir = /home/mishima/app/openmpi-pgi
MPinc = -I$(MPdir)/include
MPlib =
#
# shared library does not work well....
LAdir = /data/app/acml4.3.0/pgi64_mp/lib
LAinc =
LAlib = $(LAdir)/libacml_mp.a
#
HPL_OPTS = -m64
#
CC = gcc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -O3 -fomit-frame-pointer
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = mpif90
LINKFLAGS = -Mnomain -mp

Regards,
tmishima
All times are GMT - 7 Hours