mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Mon Apr 09, 2007 12:22 pm Post subject: |
|
|
Hi Tiago,
It could be a UMR (uninitialized memory read) or a precision issue, but I'm not entirely sure. One thing you could do is compile each of your source files at "-O2", then recompile and relink one file at a time at "-O0" until the NaNs go away. This will help narrow down which source file is causing the NaNs.
Next, separate each of the routines in this file into its own file. Repeat compiling at "-O2" and "-O0" until you're able to determine which routine is causing the problem.
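A minimal sketch of one trial of that search, for illustration only. It just prints the commands it would run: every source is compiled at "-O2" except a single suspect, which gets "-O0". The FC variable, the trial.x name, and the function name trial_commands are assumptions, not part of any PGI tooling, and the sources must be listed in whatever order your module dependencies require.

```shell
#!/bin/sh
# Sketch: print the compile and link commands for one trial of the search.
# Every source is built at -O2 except the one suspect, which is built at -O0.
# FC, trial.x and the file list are assumptions; adjust them to your build.
FC=${FC:-pgf90}

trial_commands() {
    suspect=$1; shift
    objs=""
    for src in "$@"; do
        if [ "$src" = "$suspect" ]; then opt="-O0"; else opt="-O2"; fi
        echo "$FC $opt -c $src"
        objs="$objs ${src%.f90}.o"   # object name: strip .f90, add .o
    done
    echo "$FC -o trial.x$objs"
}

# Example: make surfmod.f90 the suspect for this trial.
trial_commands surfmod.f90 plasim.f90 surfmod.f90
```

Run the resulting executable after each trial. If the NaNs disappear, the file built at "-O0" in that trial is the one to split up next.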
Now recompile all files at "-g" and then create a second executable using "-O2 -gopt". Run the two executables side by side, each in its own PGDBG session, breaking at the problem subroutine. Step through each line, comparing the values of the variables until they diverge.
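As a rough sketch of that last step, the two builds could look like the commands printed below. The executable names ref.x and opt.x, the FC variable, and the function name build_pair_commands are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: print the commands that build two executables from the same sources,
# one at -g (reference) and one at -O2 -gopt (optimized, with debug info).
# ref.x and opt.x are hypothetical names chosen for this example.
FC=${FC:-pgf90}

build_pair_commands() {
    echo "$FC -g -o ref.x $*"
    echo "$FC -O2 -gopt -o opt.x $*"
}

# Example with the two source files named in the valgrind trace.
build_pair_commands plasim.f90 surfmod.f90
```

Load each executable in its own debugger session, set a breakpoint at the same routine in both, and compare variable values line by line.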
- Mat |
|
Tiago
Joined: 08 Mar 2007 Posts: 4
|
Posted: Tue Apr 10, 2007 2:28 pm Post subject: the read issue again |
|
|
Hi again,
I'm still getting a seg fault, but valgrind has detected some uninitialised values that take me back to the problem with the read command that I mentioned in the first post. Once more, the code looks correct to me. Valgrind says (abridged):
Conditional jump or move depends on uninitialised value(s)
at 0x4F089F: fr_readnum (in /(...)/most_plasim_t21_l10_p1.x)
by 0x4F02FE: fr_read (in /(...)/most_plasim_t21_l10_p1.x)
by 0x4EFD6B: __f90io_fmt_read (in /(...)/most_plasim_t21_l10_p1.x)
by 0x503788: hpfio_read (in /(...)/most_plasim_t21_l10_p1.x)
by 0x5035B9: __hpfio_loop (in /(...)/most_plasim_t21_l10_p1.x)
by 0x503909: __hpfio_main (in /(...)/most_plasim_t21_l10_p1.x)
by 0x4EE0AA: pghpfio_fmt_read (in /(...)/most_plasim_t21_l10_p1.x)
by 0x49BEAF: surface_ini_ (surfmod.f90:307)
by 0x402249: prolog_ (plasim.f90:92)
by 0x401CBA: MAIN_ (plasim.f90:56)
by 0x401C4D: main (in /(...)/most_plasim_t21_l10_p1.x)
--25634-- REDIR: 0x3CD886FBE0 (strnlen) redirected to 0x4906C80 (strnlen)
==25634==
Process terminating with default action of signal 11 (SIGSEGV): dumping core
Bad permissions for mapped region at address 0x95ADFC
at 0x95ADFC: ???
by 0x40245D: prolog_ (plasim.f90:116)
by 0x401CBA: MAIN_ (plasim.f90:56)
by 0x401C4D: main (in /(...)/most_plasim_t21_l10_p1.x)
In the code, surfmod.f90:307 corresponds to:
integer :: ih(:)
(...)
read (nsurunit,'(8I10)',IOSTAT=iostat) ih(:)
I have tried to play it safe with
integer :: ih(1:8) = 0
(...)
read (nsurunit,'(8I10)',IOSTAT=iostat) ih(1:8)
but nothing changes.
The ASCII file that is being read also looks fine. Could this be a bug in pgf90 (6.0), and if so, is there a workaround?
I wasn't able to try your suggestion of compiling different objects at different optimization levels because it always crashes, no matter the compiler flags.
Thanks,
tiago |
|
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Tue Apr 10, 2007 3:06 pm Post subject: |
|
|
Hi Tiago,
The binary search method I mentioned above should be used to help find the NaNs (aka Problem #1). As I mentioned before, for your seg fault (aka Problem #2) we need a way to reproduce the error here in order to determine if it is indeed a compiler issue. Please send a report to PGI Customer Service at trs@pgroup.com with example code.
Thanks,
Mat |
|