PGI User Forum


Execution problem when using mpiexec and PGI 6.0.x

 
fbissey



Joined: 20 Nov 2005
Posts: 3

Posted: Sun Nov 20, 2005 8:03 pm    Post subject: Execution problem when using mpiexec and PGI 6.0.x

Hi,

I am using the pghpf compiler to compile my programs.
I am working on a cluster with a PBS queuing system.
I didn't have any problems with version 5.x of the compiler.
We recently upgraded to version 6, and that's where my
problem began.
My executable is submitted to the queue with the following script:
Code:
#!/bin/tcsh
#PBS -q q2w1n
#PBS -j oe -k oe
#PBS -l nodes=1:ppn=2
#PBS -v BeginCFG,EndCFG

#cd /home/fgcao/runVacuumResp
cd /home/fgcao/fbissey/SandBox

# Set run parameters
set ncpus=2

setenv LD_LIBRARY_PATH /opt/pgi601/linux86/6.0/lib

###########################################################
#  Script Name must be 15 characters or less
#  To run:
#          qsub -v BeginCFG=001,EndCFG=001 lyplan2.csh
#
###########################################################

# Deal with file names
#
set exeFlags   = "-n $ncpus"
set beta = "b460"
set size = "s16t32"
set imp  = "IMP"
set basedir = "/home/fgcao/Configurations/"$size"/"
set dir  = "su3"$beta$size$imp
set baseConfig = $dir"c"
set yorn = ".true."
set smear3d = 30
set prefix = "./results/"
set exeName = "./VacuumRespLYplan"$size

  set thisReport = "RunStatusLYplan"$size"-"$ncpus"c"$BeginCFG"-"$EndCFG
  echo `date`
  pwd

# Run the parallel program
  echo  "mpiexec -verbose $exeFlags $exeName -pghpf -np $ncpus > $thisReport"
  mpiexec -verbose $exeFlags $exeName -pghpf -np $ncpus > $thisReport << ....END
$basedir
$baseConfig
$BeginCFG
$EndCFG
$prefix
3  three-loop improved fMuNu
$smear3d
1  1: action and topological charge, 2: electric and magnetic fields
$yorn
....END

It is submitted using qsub. Programs compiled with pghpf
version 5 work fine; with version 6 they don't.
In one case I get the following message:
Code:
mpiexec -verbose -n 2 ./VacuumRespLYplans16t32 -pghpf -np 2 > RunStatusLYplans16t32-2c192-192
0 - MPI_SEND : Invalid rank 1
[0]  Aborting program !
[0] Aborting program!
0 - MPI_SEND : Invalid rank 1
[0]  Aborting program !
[0] Aborting program!
mpiexec: Warning: tasks 0-1 exited with status 1.

and if I remove "-pghpf -np 2" from the script it becomes:
Code:
PGFIO-F-217/formatted read/unit=5/attempt to read past end of file.
 File name = stdin     formatted, sequential access   record = 1
 In source file VacuumRespLY_plan.f, at line number 119
[0] MPI Abort by user Aborting program !
[0] Aborting program!

In this case the program cannot read its input. I also get this
last behavior on an amd64 cluster, without removing the
"-pghpf -np 2" argument.
Running the program interactively or changing the script to
execute outside of the queue (and on one processor) works.
Only when I try to run it with mpiexec in the queuing system
do I have problems.
What has changed to cause this behavior? And what can I do
apart from hardwiring my input?
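One possible workaround, sketched below on the assumption that the program reads its parameters from stdin (unit 5): write the here-document to a real file and redirect it on the run line, so nothing depends on mpiexec forwarding stdin to rank 0. The file name lyplan.in and the POSIX-sh syntax (rather than the job script's tcsh) are illustrative only; the values shown are taken from the job script above.

```shell
#!/bin/sh
# Sketch of a workaround, not from the thread: some mpiexec wrappers do
# not forward the submitting shell's here-document to rank 0's stdin.
# Writing the input to a real file and redirecting it sidesteps that.
basedir=/home/fgcao/Configurations/s16t32/
baseConfig=su3b460s16t32IMPc
BeginCFG=001
EndCFG=001

# Materialize the here-document as a file before launching MPI:
cat > lyplan.in <<EOF
$basedir
$baseConfig
$BeginCFG
$EndCFG
EOF

# The run line would then become (pghpf/mpiexec assumed available):
#   mpiexec -verbose -n 2 ./VacuumRespLYplans16t32 -pghpf -np 2 \
#       < lyplan.in > $thisReport
```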
brentl



Joined: 20 Jul 2004
Posts: 132

Posted: Mon Nov 21, 2005 12:24 pm

This might be pretty hard to track down... First things:

1. Make sure that no code compiled with 5.2 is being mixed with 6.0 (any libs, etc.).

2. $PGI/linux86/6.0/src/mpi/mpi.c
should be compiled with your version of the mpi headers, and mpi.o should be linked ahead of the pgi libs.
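Sketched out, step 2 might look like the following; the MPICH install prefix, the program name, and the exact library names are assumptions for illustration, not taken from the thread:

```shell
# Hypothetical sketch of step 2 above; paths are assumptions.
PGI=/opt/pgi601                 # from LD_LIBRARY_PATH in the job script
MPICH=/usr/local/mpich          # wherever the cluster's MPICH lives

# Compile PGI's MPI interface against the cluster's own MPI headers:
#   gcc -ansi -c -I$MPICH/include $PGI/linux86/6.0/src/mpi/mpi.c -o mpi.o
# Then link mpi.o ahead of the PGI runtime libraries so its symbols
# resolve first:
#   pghpf -o myprog *.o mpi.o -L$MPICH/lib -lfmpich -lmpich
```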
fbissey



Joined: 20 Nov 2005
Posts: 3

Posted: Mon Nov 21, 2005 2:55 pm

brentl wrote:
This might be pretty hard to track down... First things:

1. Make sure that no code compiled with 5.2 is being mixed with 6.0 (any libs, etc.).


My own programs and libs are clean in that respect, thanks
to a "make clean". Now, I am not doing the admin on the
cluster and I didn't install MPICH myself. Does it need to
be recompiled against the new compiler or something?
Which brings up your point #2, I guess.

brentl wrote:

2. $PGI/linux86/6.0/src/mpi/mpi.c
should be compiled with your version of the mpi headers, and mpi.o should be linked ahead of the pgi libs.


I read the README file in that directory. You suggest that I
replace the standard MPI library (in this case I link
against libfmpich.a from the MPICH distribution) with
the object generated from this file.
I will give it a go, I guess.
fbissey



Joined: 20 Nov 2005
Posts: 3

Posted: Mon Nov 21, 2005 5:13 pm

I tried compiling with mpi.o. The linker still requires
libfmpich.a, which I think is where the problem may lie.
Anyway, using the mpi.o produced by "gcc -ansi -c mpi.c"
gives a linking error:
Code:
mpi.o(.text+0x21): In function `__hpf_ISEND':
: undefined reference to `lam_mpi_byte'
mpi.o(.text+0x55): In function `__hpf_IRECV':
: undefined reference to `lam_mpi_byte'
mpi.o(.text+0x9e): In function `__hpf_SEND':
: undefined reference to `lam_mpi_byte'
mpi.o(.text+0xcb): In function `__hpf_RECV':
: undefined reference to `lam_mpi_byte'
mpi.o(.text+0xe5): In function `__hpf_Abort':
: undefined reference to `lam_mpi_comm_world'
mpi.o(.text+0x115): In function `__hpf_Init':
: undefined reference to `lam_mpi_comm_world'
mpi.o(.text+0x11f): In function `__hpf_Init':
: undefined reference to `lam_mpi_comm_world'

So it doesn't work anyway.
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Wed Nov 23, 2005 9:58 am

Hi fbissey,

As Brent indicated, this is a tough one to track down since there are a lot of pieces involved. The most likely cause is that your MPICH Fortran interface needs to be rebuilt using the 6.0 version of the compilers. However, given the undefined references in your last post and that you're using mpiexec, it appears that you're actually using LAM/MPI, not MPICH. In either case, try compiling and linking with the MPICH libraries that were included with the 6.0 CDK release (in the "libs" directory). Then run your application using the "mpirun" script found in the PGI bin directory.

If this works, then you should just need to recompile your MPI Fortran interface. If it still fails, please send a report along with the code to trs@pgroup.com, since it could be a compiler issue.
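As a rough sketch of the suggestion above, the build and run might look like this; the exact library names and paths under the CDK install are assumptions for illustration, not confirmed from the thread:

```shell
# Hypothetical sketch; paths and library names are assumptions.
PGI=/opt/pgi601                 # from LD_LIBRARY_PATH in the job script

# Link against the MPICH libraries bundled with the 6.0 CDK release:
#   pghpf -o myprog myprog.hpf -L$PGI/linux86/6.0/lib -lfmpich -lmpich
# Run with the mpirun script shipped in the PGI bin directory, instead
# of the cluster's own mpiexec:
#   $PGI/linux86/6.0/bin/mpirun -np 2 ./myprog
```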

Thanks,
Mat