tmishima
Joined: 24 Nov 2008 Posts: 9
|
Posted: Thu Jan 14, 2010 4:39 pm Post subject: HPL Segfault Problem |
|
|
Hi,
I have a segfault problem with HPL-2.0 built with PGI 8.0-6, openmpi 1.4 and acml4.3.0.
Smaller matrices run fine, but HPL segfaults for larger ones that use more than 2GB per process.
Is this a limitation of HPL-2.0?
In any case, please help me solve this problem. I can send my Make.arch and HPL.dat later to an e-mail address you specify.
Thank you in advance.
Regards,
tmishima
P.S. for your information
The error is:
[node09:21940] *** Process received signal ***
[node09:21940] Signal: Segmentation fault (11)
[node09:21940] Signal code: Address not mapped (1)
[node09:21940] Failing at address: 0x2aaa35217088
[node09:21940] *** End of error message ***
ldd output is:
libmpi_f90.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi_f90.so.0 (0x00002b0dd4944000)
libmpi_f77.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi_f77.so.0 (0x00002b0dd4b47000)
libmpi.so.0 => /home/mishima/app/openmpi-pgi/lib/libmpi.so.0 (0x00002b0dd4d77000)
libopen-rte.so.0 => /home/mishima/app/openmpi-pgi/lib/libopen-rte.so.0 (0x00002b0dd5031000)
libopen-pal.so.0 => /home/mishima/app/openmpi-pgi/lib/libopen-pal.so.0 (0x00002b0dd527c000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003eeb400000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003ef3800000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003ef8a00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003eeb800000)
libpgmp.so => /opt/pgi/linux86-64/8.0-6/libso/libpgmp.so (0x00002b0dd551a000)
libpgbind.so => /opt/pgi/linux86-64/8.0-6/libso/libpgbind.so (0x00002b0dd5644000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003eea000000)
libpgf90.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90.so (0x00002b0dd5746000)
libpgf90_rpm1.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90_rpm1.so (0x00002b0dd5b02000)
libpgf902.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf902.so (0x00002b0dd5c04000)
libpgf90rtl.so => /opt/pgi/linux86-64/8.0-6/libso/libpgf90rtl.so (0x00002b0dd5d17000)
libpgftnrtl.so => /opt/pgi/linux86-64/8.0-6/libso/libpgftnrtl.so (0x00002b0dd5e3a000)
libpgc.so => /opt/pgi/linux86-64/8.0-6/libso/libpgc.so (0x00002b0dd5f68000)
librt.so.1 => /lib64/librt.so.1 (0x0000003eef400000)
libm.so.6 => /lib64/libm.so.6 (0x0000003eeb000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003eeac00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003ee9c00000) |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Fri Jan 15, 2010 11:17 am Post subject: |
|
|
Hi tmishima,
Sorry, I don't know enough about HPL to know why this would occur.
You can try adding the "-Mlarge_arrays" flag in case it's a loop indexing size issue. Otherwise, please contact the authors of HPL for help.
- Mat |
|
|
 |
tmishima
Joined: 24 Nov 2008 Posts: 9
|
Posted: Mon Jan 18, 2010 1:21 am Post subject: |
|
|
Hi Mat,
Thank you for your advice.
The "-Mlarge_arrays" flag did not help. I'm going to ask the authors of HPL for further help.
But please let me confirm one thing. Is this due to my own environment, or is it some kind of issue with the combination of the PGI compiler and HPL-2.0?
tmishima |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Tue Jan 19, 2010 10:06 am Post subject: |
|
|
Hi tmishima,
While I don't know what the cause is, my best guess is that HPL is using the MPI-1 "GetAddress" function (32-bit pointers) rather than the MPI-2 "GetAddress64" function (64-bit pointers), or that you are encountering some other type of 32-bit integer overflow error.
- Mat |
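The kind of 32-bit overflow suggested above can be illustrated with a small C sketch. It is only an illustration under stated assumptions: the helper names `local_bytes` and `fits_in_int32` are hypothetical and come from neither HPL nor any compiler runtime, and nothing here shows that HPL or pgcc actually computes sizes this way. It only shows why a per-process block of doubles larger than about 2GB cannot have its byte count or byte offset held in a signed 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for a local block of rows x cols
 * doubles, computed in 64-bit arithmetic so the product cannot wrap. */
static int64_t local_bytes(int64_t rows, int64_t cols)
{
    return rows * cols * (int64_t)sizeof(double);
}

/* Hypothetical helper: returns 1 if the value can be stored in a signed
 * 32-bit integer without loss, 0 otherwise. */
static int fits_in_int32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

With 8-byte doubles, a 16000 x 16000 local block needs 2,048,000,000 bytes and still fits in a signed 32-bit integer, while a 17000 x 17000 block needs 2,312,000,000 bytes and does not. Any byte count or offset for the larger block has to be held in a 64-bit type such as `size_t` or `int64_t`.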
|
|
 |
tmishima
Joined: 24 Nov 2008 Posts: 9
|
Posted: Tue Jan 19, 2010 5:09 pm Post subject: |
|
|
Hi, Mat
Thank you for your suggestion.
In the end, I gave up on the PGI C compiler and changed Make.arch to use gcc.
Now everything works fine, with no segfault even for larger matrices.
What's the difference between gcc and pgcc?
The main part of the modified Make.arch is as follows:
ARCH = Linux_ompi
#
MPdir = /home/mishima/app/openmpi-pgi
MPinc = -I$(MPdir)/include
MPlib =
#
# shared library does not work well....
LAdir = /data/app/acml4.3.0/pgi64_mp/lib
LAinc =
LAlib = $(LAdir)/libacml_mp.a
#
HPL_OPTS = -m64
#
CC = gcc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -O3 -fomit-frame-pointer
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = mpif90
LINKFLAGS = -Mnomain -mp
Regards,
tmishima |
|
|
 |
|