PGI User Forum
Building MVAPICH2 with PGI 2010
Forum index: Licenses and Installation (page 3 of 4)
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Wed Jul 28, 2010 6:05 am

I just remembered something. To even get mpirun_rsh to be *this* successful, I had to set MALLOC_CHECK_=0. If I don't:
Code:
> env | grep MALL
> ~/mvapich2/bin/mpirun_rsh -np 2 -hostfile host_file_name ./hellow
*** glibc detected *** ./hellow: double free or corruption (fasttop): 0x0000000006f2a090 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3984a7230f]
/lib64/libc.so.6(cfree+0x4b)[0x3984a7276b]
./hellow[0x429ddd]
======= Memory map: ========
00400000-0045d000 r-xp 00000000 08:02 27166098                           /home/username/MPIExamples/hellow
0065c000-0069e000 rwxp 0005c000 08:02 27166098                           /home/username/MPIExamples/hellow
0069e000-006aa000 rwxp 0069e000 00:00 0
06f27000-06f48000 rwxp 06f27000 00:00 0                                  [heap]
3984600000-398461c000 r-xp 00000000 08:01 1184266                        /lib64/ld-2.5.so
398481b000-398481c000 r-xp 0001b000 08:01 1184266                        /lib64/ld-2.5.so
398481c000-398481d000 rwxp 0001c000 08:01 1184266                        /lib64/ld-2.5.so
3984a00000-3984b4e000 r-xp 00000000 08:01 1184267                        /lib64/libc-2.5.so
3984b4e000-3984d4d000 ---p 0014e000 08:01 1184267                        /lib64/libc-2.5.so
3984d4d000-3984d51000 r-xp 0014d000 08:01 1184267                        /lib64/libc-2.5.so
3984d51000-3984d52000 rwxp 00151000 08:01 1184267                        /lib64/libc-2.5.so
3984d52000-3984d57000 rwxp 3984d52000 00:00 0
3984e00000-3984e82000 r-xp 00000000 08:01 1184296                        /lib64/libm-2.5.so
3984e82000-3985081000 ---p 00082000 08:01 1184296                        /lib64/libm-2.5.so
3985081000-3985082000 r-xp 00081000 08:01 1184296                        /lib64/libm-2.5.so
3985082000-3985083000 rwxp 00082000 08:01 1184296                        /lib64/libm-2.5.so
3985600000-3985616000 r-xp 00000000 08:01 1184275                        /lib64/libpthread-2.5.so
3985616000-3985815000 ---p 00016000 08:01 1184275                        /lib64/libpthread-2.5.so
3985815000-3985816000 r-xp 00015000 08:01 1184275                        /lib64/libpthread-2.5.so
3985816000-3985817000 rwxp 00016000 08:01 1184275                        /lib64/libpthread-2.5.so
3985817000-398581b000 rwxp 3985817000 00:00 0
3985e00000-3985e07000 r-xp 00000000 08:01 1184276                        /lib64/librt-2.5.so
3985e07000-3986007000 ---p 00007000 08:01 1184276                        /lib64/librt-2.5.so
3986007000-3986008000 r-xp 00007000 08:01 1184276                        /lib64/librt-2.5.so
3986008000-3986009000 rwxp 00008000 08:01 1184276                        /lib64/librt-2.5.so
398fc00000-398fc11000 r-xp 00000000 08:01 1184309                        /lib64/libresolv-2.5.so
398fc11000-398fe11000 ---p 00011000 08:01 1184309                        /lib64/libresolv-2.5.so
398fe11000-398fe12000 r-xp 00011000 08:01 1184309                        /lib64/libresolv-2.5.so
398fe12000-398fe13000 rwxp 00012000 08:01 1184309                        /lib64/libresolv-2.5.so
398fe13000-398fe15000 rwxp 398fe13000 00:00 0
3995800000-399580d000 r-xp 00000000 08:01 1184319                        /lib64/libgcc_s-4.1.2-20080825.so.1
399580d000-3995a0d000 ---p 0000d000 08:01 1184319                        /lib64/libgcc_s-4.1.2-20080825.so.1
3995a0d000-3995a0e000 rwxp 0000d000 08:01 1184319                        /lib64/libgcc_s-4.1.2-20080825.so.1
2aacf1030000-2aacf1032000 rwxp 2aacf1030000 00:00 0
2aacf104e000-2aacf1050000 rwxp 2aacf104e000 00:00 0
2aacf1050000-2aacf105a000 r-xp 00000000 08:01 1184248                    /lib64/libnss_files-2.5.so
2aacf105a000-2aacf1259000 ---p 0000a000 08:01 1184248                    /lib64/libnss_files-2.5.so
2aacf1259000-2aacf125a000 r-xp 00009000 08:01 1184248                    /lib64/libnss_files-2.5.so
2aacf125a000-2aacf125b000 rwxp 0000a000 08:01 1184248                    /lib64/libnss_files-2.5.so
2aacf125b000-2aacf125f000 r-xp 00000000 08:01 1184246                    /lib64/libnss_dns-2.5.so
2aacf125f000-2aacf145e000 ---p 00004000 08:01 1184246                    /lib64/libnss_dns-2.5.so
2aacf145e000-2aacf145f000 r-xp 00003000 08:01 1184246                    /lib64/libnss_dns-2.5.so
2aacf145f000-2aacf1460000 rwxp 00004000 08:01 1184246                    /lib64/libnss_dns-2.5.so
7fffeb4ce000-7fffeb4e3000 rwxp 7ffffffea000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
MPI process (rank: 0) terminated unexpectedly on hostname
Exit code -5 signaled from hostname

Now, experimenting with MALLOC_CHECK_. First, with it set to zero:
Code:
> MALLOC_CHECK_=0 ~/mvapich2/bin/mpirun_rsh -np 2 -hostfile host_file_name ./hellow
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
[cli_1]: aborting job:
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
[cli_0]: aborting job:
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
MPI process (rank: 1) terminated unexpectedly on hostname
Exit code -5 signaled from hostname
and if MALLOC_CHECK_=1:
Code:
> MALLOC_CHECK_=1 ~/mvapich2/bin/mpirun_rsh -np 2 -hostfile host_file_name ./hellow
malloc: using debugging hooks
malloc: using debugging hooks
malloc: using debugging hooks
malloc: using debugging hooks
malloc: using debugging hooks
malloc: using debugging hooks
*** glibc detected *** ./hellow: free(): invalid pointer: 0x000000001db02180 ***
*** glibc detected *** ./hellow: free(): invalid pointer: 0x000000001db02180 ***
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
[cli_0]: aborting job:
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
*** glibc detected *** ./hellow: free(): invalid pointer: 0x0000000006103180 ***
MPI process (rank: 0) terminated unexpectedly on hostname
*** glibc detected *** ./hellow: free(): invalid pointer: 0x0000000006103180 ***
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
[cli_1]: aborting job:
Fatal error in MPI_Init: Invalid buffer pointer, error stack:
MPIR_Init_thread(411): Initialization failed
(unknown)(): Invalid buffer pointer
Exit code -5 signaled from hostname
malloc: using debugging hooks

Finally, for your edification:
Code:
> ~/mvapich2/bin/mpirun_rsh -show -np 2 -hostfile host_file_name ./hellow

/bin/bash -c cd /home/username/MPIExamples; /usr/bin/env LD_LIBRARY_PATH=/usr/mvapich/lib/shared:/home/username/mvapich2/lib:/home/username/lib:/opt/pgi/linux86-64/2010/cuda/lib:/opt/pgi/linux86-64/2010/cuda/open64/lib:/opt/pgi/linux86-64/2010/lib:/opt/pgi/linux86-64/2010/libso:/opt/cuda/lib64: MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=hostname MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=52278 MPISPAWN_MPIRUN_PORT=52278 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=6397 MPISPAWN_ARGC=1 MPDMAN_KVS_TEMPLATE=kvs_332_hostname_6397 MPISPAWN_LOCAL_NPROCS=2 MPISPAWN_ARGV_0=./hellow MPISPAWN_GENERIC_ENV_COUNT=1  MPISPAWN_GENERIC_NAME_0=MV2_XRC_FILE MPISPAWN_GENERIC_VALUE_0=mv2_xrc_226_hostname_6397 MPISPAWN_ID=0 MPISPAWN_WORKING_DIR=/home/username/MPIExamples MPISPAWN_MPIRUN_RANK_0=0 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1 MPISPAWN_MPIRUN_RANK_1=1 MPISPAWN_VIADEV_DEFAULT_PORT_1=-1  /home/username/mvapich2-sock/bin/mpispawn 0
hongyon



Joined: 19 Jul 2004
Posts: 551

Posted: Wed Jul 28, 2010 9:40 am

We will get RHEL 5.5 installed here ASAP. We have tested with RHEL 5.3 and had no problem.


Hongyon
hongyon



Joined: 19 Jul 2004
Posts: 551

Posted: Thu Jul 29, 2010 10:12 am

Hi,

We tried with RHEL 5.5 and still see no problem with 64-bit 10.6. Which version of PGI do you use? 32-bit or 64-bit, 10.5 or 10.6? I guess we need to start from the beginning.

These are exactly the steps I follow in csh:

1) Install PGI in my home directory(anywhere should be fine).

2) setenv PGI /home/my_home_dir/pgi

3) setenv PATH /home/my_home_dir/pgi/linux86-64/2010/bin:$PATH

4) cd to_mvapich2_dir

5) env CC=pgcc FC=pgfortran F77=pgfortran CXX=pgcpp CFLAGS=-fast FCFLAGS=-fast FFLAGS=-fast \
CXXFLAGS=-fast ./configure --prefix=/home/my_home_dir/mvapich/mympich2 --with-device=ch3:sock >& configure.log

6) make

7) make install

8) cd ~/mytest_dir

9) check the hostme file

rhel55% more hostme
rhel55
rhel55% hostname
rhel55

10) compile and run

rhel55% /home/hongyon/mvapich/mympich2/bin/mpicc hello_mpi.c
rhel55% /home/hongyon/mvapich/mympich2/bin/mpirun_rsh -np 2 -hostfile hostme ./a.out
Hello world from process 0 of 2
Hello world from process 1 of 2
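The hello_mpi.c used in step 10 is never shown in the thread; a minimal program along the usual MPI hello-world lines (my sketch, an assumption, not the file hongyon actually used) would produce the output above:

```shell
# Sketch of a minimal hello_mpi.c matching the "Hello world from
# process N of 2" output above. The file contents are an assumption;
# the thread never shows them.
cat > hello_mpi.c <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile and launch with the freshly built wrappers (paths from step 10):
# /home/hongyon/mvapich/mympich2/bin/mpicc hello_mpi.c
# /home/hongyon/mvapich/mympich2/bin/mpirun_rsh -np 2 -hostfile hostme ./a.out
```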

Please try those steps. If there is still a problem, then I am really at my wit's end.

You might want to ask the MPICH2 folks. Also check out:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-580009.3.4

I am not sure if this is related to the problem:

9.3.4 Creation of CQ or QP failure

A possible reason could be inability to pin the memory required. Make sure the following steps are taken.

1. In /etc/security/limits.conf add the following
* soft memlock phys_mem_in_KB

2. After this, add the following to /etc/init.d/sshd
ulimit -l phys_mem_in_KB

3. Restart sshd

With some distros, we’ve found that adding the ulimit -l line to the sshd init script is no longer necessary. For instance, the following steps work for our rhel5 systems.

1. Add the following lines to /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited

2. Restart sshd
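Before (or after) editing limits.conf, it is worth confirming what the locked-memory limit actually is in the shell that launches the job; a quick check using standard shell built-ins (my sketch, not from the guide):

```shell
# Show the current locked-memory (memlock) limit for this shell.
# MVAPICH2 needs to pin memory when creating CQs/QPs; "unlimited"
# (or at least physical RAM in KB) is what the guide above recommends.
ulimit -l

# If limits.conf has been edited as described, the memlock entries
# should show up here (-s suppresses the error if the file is absent):
grep -s memlock /etc/security/limits.conf || echo "no memlock entries found"
```

If `ulimit -l` still reports a small value over ssh after the edit, the sshd restart step above has likely not taken effect.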

Hongyon
hongyon



Joined: 19 Jul 2004
Posts: 551

Posted: Thu Jul 29, 2010 1:22 pm

Can you please try with -O2 instead of -fast?

Hongyon
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Mon Aug 02, 2010 3:56 am

hongyon wrote:
Can you please try with -O2 instead of -fast?
I tried it with no flags at all, still no joy even following your example. Likewise for unlimiting locked memory, still no luck.

Looks like I'll need to move to the MVAPICH mailing list for help with this. If I ever solve this issue, I'll add a reply here. Sooner or later, I'll hopefully be back asking questions on linking to MVAPICH2. That'll be nice.