PGI User Forum -> Programming and Compiling

PG compilation/execution problems, works fine in UNIX system

Jack_17
Joined: 17 Feb 2006
Posts: 6

Posted: Fri Mar 03, 2006 3:18 pm    Post subject: PG compilation/execution problems, works fine in UNIX system

I am extremely new to the whole Linux/UNIX world and have run into a problem with an old code. I obtained an old FORTRAN 77 code that has been in use for many years and was asked to load it onto our system at school and run some calculations. The code compiled just fine with pgf77, but it would bomb out with a segmentation fault when executing (after reading in an input file and starting to step through the calculation). I ran the debugger and found where the problem occurred, but everything there seemed fine. On a hunch, I moved the code over to a UNIX-based system, compiled and executed it there, and everything ran nicely from beginning to end.

The code uses a lot of the old "COMMON", "DATA", "DIMENSION", and "MEMORY" statements from the era before dynamic memory allocation, and I was wondering whether there is some inherent difference between Linux and UNIX that is screwing me up (since there were no errors on the UNIX side), and if so, whether there is an "easy" (or not so easy) but standard fix for this type of problem. Any thoughts are greatly appreciated; sorry for not knowing more.

~Jack Galloway

mkcolg
Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Mar 06, 2006 3:06 pm

Hi Jack,

Let's try something easy first and see if you're getting a stack overflow. Try setting your stack size to "unlimited". The exact syntax depends on your shell.

For Bash use "ulimit -s unlimited"
For Tcsh use "unlimit" or "limit stacksize unlimited"
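For anyone checking whether the new limit actually took effect: "ulimit -s" and "limit stacksize" adjust the per-process stack resource limit (RLIMIT_STACK), which child processes inherit, so it has to be set in the same shell that then launches the program. The Python sketch below is only an illustration, not a PGI tool, and the function name `describe_stack_limit` is hypothetical; it reads the current soft and hard stack limits through the standard `resource` module on a Unix-like system. Note that an unprivileged user can raise the soft limit only up to the hard limit, so if the hard limit is finite, "ulimit -s unlimited" will fail.

```python
import resource


def describe_stack_limit():
    """Return a readable summary of this process's stack size limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit -s unlimited" sets the limit to.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        # The limit is in bytes; bash's "ulimit -s" reports 1024-byte units.
        return f"{value // 1024} KiB"

    return f"stack soft limit: {fmt(soft)}, hard limit: {fmt(hard)}"


if __name__ == "__main__":
    print(describe_stack_limit())
```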

Another thing to try is to compile with "-Mbounds" to see if you're writing past the end of an array. Different systems lay out memory differently, so writing past the end of an array may go unnoticed on one system but fail on another.

If these don't work, then you'll need to do some more investigation. Compile your code with "-g" and then run the executable under the PGI debugger, pgdbg. Once you find where the seg fault occurs, set a breakpoint before the offending code and step through the program line by line to see if you can determine what's causing the error.

- Mat

Jack_17
Joined: 17 Feb 2006
Posts: 6

Posted: Mon Mar 06, 2006 9:33 pm

Mat, thanks for the reply. I tried both things you suggested, and it returned an error that said "PGFTN-F-Subscript out of range for array a..." along with the corresponding line number.

I had actually already run the debugger and found the line where the error occurs. Interestingly, the variable it bombs out on is calculated by dividing two numbers. Both numbers are reals and are defined (I added write statements just before the crash to confirm they hold actual values), which seems to indicate that the variable the result is assigned to is fouled up in memory or something to that effect? I also tried to track where this variable comes from through the various subroutines where it is used, and it seems to originate in a *.libd file, which is a binary file containing a lot of nuclide information (this is a code that performs nuclear calculations).

Two questions. First, if this array is out of range, what does that mean? Second, if this binary *.libd file was generated on a UNIX system, could it be in the wrong format for a Beowulf cluster running Linux? (I'm just tossing out guesses.) I tracked the same variables on the UNIX side, where the code executes fine, and at some point the values diverge: the Linux numbers become extremely large, whereas on the UNIX side they are much smaller. Although I'm new, I really get the feeling something is fouled up in memory between the two platforms. Thanks again for the help on this.

~Jack

mkcolg
Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Mar 08, 2006 11:06 am

Hi Jack,

While "-Mbounds" can give false positives, you should first investigate what's happening with this array. The check doesn't always work when the dimensions of the array are unknown, so if the lower or upper bound printed in the error message is zero, it's most likely a false positive. If you are actually writing past the end of an array, you might be lucky and write to memory that is not used (such as padding), or you could be overwriting the value of an important variable such as a pointer. If the overwritten variable were a pointer, this could cause a seg fault later in the program. Memory layout differs between systems and compilers, which would explain why you're seeing the problem on one system but not another.
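To make the "overwriting an important variable" scenario concrete, here is a toy Python model. It assumes a COMMON-block-like layout in which a 5-element array A is stored immediately before a scalar X; real compilers may order or pad storage differently. Python lists are bounds-checked themselves, so the "memory" is a flat list indexed by hand. The names A, X, `store_unchecked` and `store_checked` are hypothetical: the unchecked store mimics code built without bounds checking, and the checked store roughly mimics the run-time subscript test that "-Mbounds" adds.

```python
# Toy model of a COMMON block: array A(5) followed directly by a scalar X.
# Hypothetical layout; real compilers may order or pad storage differently.
memory = [0.0] * 6
A_LEN = 5      # A(1)..A(5) occupy memory[0]..memory[4]
X_SLOT = 5     # X sits right after the last element of A


def store_unchecked(i, value):
    """Store A(i) (1-based, as in Fortran) with no bounds check."""
    memory[i - 1] = value


def store_checked(i, value):
    """Store A(i) after checking the subscript, roughly what -Mbounds adds."""
    if not 1 <= i <= A_LEN:
        raise IndexError(f"subscript out of range for array a: {i} not in 1..{A_LEN}")
    memory[i - 1] = value


memory[X_SLOT] = 100.0          # X holds an important value
store_unchecked(6, 3.5)         # A(6) does not exist...
print(memory[X_SLOT])           # ...so X was silently overwritten: prints 3.5

try:
    store_checked(6, 3.5)       # the checked version stops at the bad subscript
except IndexError as err:
    print(err)
```

In the unchecked case nothing fails at the bad store itself; the damage only shows up later, when X is used, which matches the pattern of a crash appearing somewhere that "looks fine" in the debugger.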

I'm not familiar with the file extension ".libd". Is this simply a binary file containing data that your program reads, or is it a library containing executable code? If it's just a data file, what is the endianness (the order in which the bytes of multi-byte data types are stored) of the UNIX system it was generated on? If it's an IBM mainframe or a Sun SPARC system, it's most likely big endian. PCs use little endian, so reading in a file containing big-endian values can cause unexpected results. If this is the case, try recompiling with "-Mbyteswapio" to tell the compiler that you're using a big-endian data file.
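A short Python illustration of why byte order matters, assuming the data file holds 4-byte integers written by a big-endian machine. In Python's standard `struct` module, the format `">i"` means a big-endian 4-byte integer and `"<i"` a little-endian one. Reading big-endian bytes as little-endian turns 100 into a huge number, which is consistent with (though not proof of) the "extremely large" values seen on the Linux side; REAL values get scrambled the same way. Reversing each value's bytes, the kind of swap "-Mbyteswapio" asks the compiler to perform on unformatted I/O, recovers the original.

```python
import struct
import sys

# Bytes a big-endian machine (e.g. a Sun SPARC) would write for INTEGER*4 100.
raw = struct.pack(">i", 100)          # b'\x00\x00\x00d'

# Reading those bytes as little-endian, the native order on x86 PCs,
# gives a huge number instead of 100.
wrong = struct.unpack("<i", raw)[0]

# Reversing the bytes of each value (a byte swap) recovers the original.
fixed = struct.unpack("<i", raw[::-1])[0]

print(sys.byteorder)   # byte order of the machine running this script
print(wrong)           # 1677721600
print(fixed)           # 100
```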

- Mat

Jack_17
Joined: 17 Feb 2006
Posts: 6

Posted: Wed Mar 08, 2006 7:59 pm

Mat,

You hit the nail on the head with the endianness comment. I didn't know much about this before, but I found that the Sun UNIX platform was likely big endian and the Linux system little endian, so I added this specifier to the program

CONVERT='BIG_ENDIAN'

when opening the file, since it was just a binary data file, with the .libd extension telling the user that it is a library file. But you're saying I can get the same effect at the compiler level with the "-Mbyteswapio" option, which is probably simpler and better than modifying existing code. Thanks so much for your help on this. You guys' compilers are sweet.
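One detail worth knowing when converting such files: assuming the .libd file was written as a Fortran unformatted sequential file (the thread does not say), most compilers commonly wrap each record in 4-byte record-length markers, and those markers are stored in the writing machine's byte order as well, so a conversion has to handle both the markers and the data. The Python sketch below reads records under that assumption; `read_records` is a hypothetical helper, and the file contents are synthetic rather than real nuclide data. A byte-order mismatch typically shows up immediately as an absurd record length.

```python
import io
import struct


def read_records(stream, big_endian=True):
    """Yield each record's payload from a Fortran unformatted sequential file.

    Assumes 4-byte record-length markers before and after every record,
    stored in the same byte order as the data.
    """
    order = ">" if big_endian else "<"
    while True:
        head = stream.read(4)
        if not head:
            return                                   # clean end of file
        if len(head) != 4:
            raise ValueError("truncated record marker")
        (length,) = struct.unpack(order + "i", head)
        payload = stream.read(length) if length >= 0 else b""
        tail = stream.read(4)
        if (length < 0 or len(payload) != length or len(tail) != 4
                or struct.unpack(order + "i", tail)[0] != length):
            raise ValueError("record markers do not match: wrong byte order?")
        yield payload


# One synthetic big-endian record holding three INTEGER*4 values, laid out
# the way WRITE (10) 7, 8, 9 would be on a big-endian machine (assumed).
data = struct.pack(">3i", 7, 8, 9)
marker = struct.pack(">i", len(data))
fake_file = io.BytesIO(marker + data + marker)

for record in read_records(fake_file, big_endian=True):
    print(struct.unpack(">3i", record))   # (7, 8, 9)
```

Reading the same bytes with `big_endian=False` makes the leading marker decode as 201326592 instead of 12, and the reader raises a ValueError rather than returning garbage.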

~Jack

All times are GMT - 7 Hours