PGI User Forum

Strange Segmentation fault

 
ofuhrer



Joined: 18 Feb 2008
Posts: 16

Posted: Fri Jan 16, 2009 12:49 am    Post subject: Strange Segmentation fault

Hi all,

I have a strange error and I am starting to wonder whether it could be related to a compiler issue... I run a rather complex code (a weather model) and get a segmentation fault upon calling a routine quite some time into the execution of the code. The routine is called organize_output... I've tried to reduce it as much as possible while still getting the error, and now it looks like this...

Code:

SUBROUTINE organize_output

REAL (KIND=irealgrib) :: zprocarray_grib(ie_max,je_max,num_compute)
REAL (KIND=ireals) :: zvarlev(ie,je,0:MAX(ke+1,nlevels)), &
 zprocarray_real(ie_max,je_max,num_compute), slev(0:MAX(ke+1,nlevels))
REAL (KIND=ireals) :: zenith_t (ie,je), zenith_w (ie,je), zenith_h (ie,je), &
 zcape_mu (ie,je), zcin_mu (ie,je), zcape_ml (ie,je), zcin_ml (ie,je), &
 zcape_3km(ie,je), zlcl_ml (ie,je), zlfc_ml (ie,je), zbrn (ie,je,ke)

  print *,'*** beginning of subroutine organize_output'
  print *,zbrn(1,1,1)
  print *,'gugu'
  print *,zprocarray_grib(1,1,1)
  print *,zvarlev(1,1,0)
  print *,zprocarray_real(1,1,1)
  print *,slev(0)
  print *,zenith_t(1,1)
  print *,zenith_w(1,1)
  print *,zenith_h(1,1)
  print *,zcape_mu(1,1)
  print *,zcin_mu(1,1)
  print *,zcape_ml(1,1)
  print *,zcin_ml(1,1)
  print *,zcape_3km(1,1)
  print *,zlcl_ml(1,1)
  print *,zlfc_ml(1,1)
  print *,'*** end of subroutine organize_output'

END SUBROUTINE organize_output


Upon execution the output is as follows...

Code:

 *** before_call_to_organize_output
 num_compute=            1
 nlevels=           40
 ie,je=           41           51
 ie_max,je_max=           41           51
 nzmxid=          130
 *** calling
 *** beginning of subroutine organize_output
Segmentation fault (core dumped)


Sometimes (depending on the details of the lines still remaining in the subroutine) the error message is also...

Code:

 *** before_call_to_organize_output
 num_compute=            1
 nlevels=           40
 ie,je=           41           51
 ie_max,je_max=           41           51
 nzmxid=          130
 *** calling
0: ALLOCATE: 18446744071899487520 bytes requested; not enough memory


Upon access to the zbrn array, the code segfaults. I've tried "unlimit; setenv MPSTKZ 40000000" with no effect. The code is VERY sensitive to any changes in what remains in the routine... If I remove one line (either in the declarations or the print statements) the behaviour can change to run smoothly without any error.
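
A side note on the ALLOCATE message above: the requested byte count is just below 2^64, which is what a negative signed 64-bit value looks like when printed as unsigned. The arithmetic below is only a reading of the number in the message, not something confirmed by the compiler vendor:

$$
18446744071899487520 - 2^{64} = 18446744071899487520 - 18446744073709551616 = -1810064096
$$

Since $|-1810064096| < 2^{31} = 2147483648$, the value would also fit in a signed 32-bit integer. That would be consistent with an array size that was computed as negative (or from bounds not yet set) before the allocation, although the message alone does not show where that happened.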

My compilation options are...

pgf90 -c -I. -I/nfs/xt3-homes/users/olifu/src/lm_4.7_dwd/src -I/opt/xt-mpt/default/mpich2-64/P2/include -I/apps/netcdf/linux/include -Mfree -Mpreprocess -Kieee -Mbyteswapio -O0 -C -g -gopt -Mbounds -Mchkfpstk -Ktrap=fp -o src_output.o /nfs/xt3-homes/users/olifu/src/lm_4.7_dwd/src/src_output.f90

My machine is a Cray XT-4 and I am running on the service nodes for debugging purposes...

> uname -a
Linux buin2 2.6.5-7.283-ss #4 SMP Fri Sep 28 13:24:48 PDT 2007 x86_64 x86_64 x86_64 GNU/Linux

The version of pgf90 I use is...

> pgf90 -V
pgf90 7.2-4 64-bit target on x86-64 Linux -tp k8-64e

Can anyone give me an idea of what might cause this type of behaviour?

I would be very grateful for any suggestions,
Oliver
mkcolg



Joined: 30 Jun 2004
Posts: 5871
Location: The Portland Group Inc.

Posted: Fri Jan 16, 2009 11:25 am    Post subject: Re: Strange Segmentation fault

Hi Oliver,

This does seem more likely to be a compiler issue related to automatic array allocation, but I'm not sure. We did have an issue with ECHAM (TPR#15414) where the size of an automatic array was being calculated after it was allocated, but this involved passing in an array and then using its size (via the "SIZE" intrinsic) in the declaration of a second automatic array. Your issue is different enough that I'm not positive they are related.
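
For readers unfamiliar with the pattern described for TPR#15414, a minimal sketch of that kind of declaration might look like the following. This is a hypothetical illustration only: it is not the ECHAM source, the names `work`, `first`, `second` and `field` are invented here, and it is not known to trigger the bug.

```fortran
! Hypothetical illustration of the declaration pattern; not the ECHAM code.
program size_auto_demo
  implicit none
  real :: field(41, 51)

  field = 1.0
  call work(field, 3)

contains

  subroutine work(a, n)
    real, intent(in)    :: a(:, :)   ! array passed in (assumed shape)
    integer, intent(in) :: n
    real :: first(n)                 ! first automatic array, sized from a scalar
    real :: second(size(a, 1))       ! second automatic array, sized via SIZE of the dummy

    first  = 0.0
    second = a(:, 1)
    print *, 'size(first)  =', size(first)
    print *, 'size(second) =', size(second)
    print *, 'sum(second)  =', sum(second)
  end subroutine work

end program size_auto_demo
```

With a correct compiler the two extents reported should be 3 and 41. If the extent of `second` were evaluated after the storage had been allocated, the allocation size would be garbage, which is the kind of failure described above.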

TPR#15414 was reported in the 7.2-5 compiler but may have also been present in 7.2-4. It was fixed in the 8.0-2 release, so you may want to try the latest release to see if it fixes the problem. If not, please send a report to PGI Customer Service at trs@pgroup.com. We will most likely need the full code, or an example which illustrates the problem.
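
As a rough idea of what a standalone example could look like, the sketch below wraps a shortened version of the reduced routine in a small driver. Everything outside the routine is assumed: the module name `demo_dims`, the kind definitions for `ireals` and `irealgrib`, and the value of `ke` (not given in the post) are placeholders. The other dimensions are taken from the printed output in the first post. It is not known whether this reproduces the failure.

```fortran
module demo_dims
  implicit none
  ! Assumed kinds; the real model defines these elsewhere.
  integer, parameter :: ireals    = selected_real_kind(12, 200)
  integer, parameter :: irealgrib = selected_real_kind(6, 37)
  ! Values taken from the output printed before the call.
  integer :: ie = 41, je = 51, ie_max = 41, je_max = 51
  integer :: nlevels = 40, num_compute = 1
  integer :: ke = 40   ! assumed placeholder; not shown in the post
end module demo_dims

program repro
  use demo_dims
  implicit none

  print *, '*** calling'
  call organize_output
  print *, '*** returned'

contains

  subroutine organize_output
    ! Automatic arrays sized from module variables, as in the reduced routine.
    real (kind=irealgrib) :: zprocarray_grib(ie_max, je_max, num_compute)
    real (kind=ireals)    :: zvarlev(ie, je, 0:max(ke+1, nlevels))
    real (kind=ireals)    :: slev(0:max(ke+1, nlevels))
    real (kind=ireals)    :: zbrn(ie, je, ke)

    ! Initialize before printing so the output is well defined.
    zprocarray_grib = 0.0_irealgrib
    zvarlev = 0.0_ireals
    slev    = 0.0_ireals
    zbrn    = 0.0_ireals

    print *, 'size(zbrn)    =', size(zbrn)
    print *, 'size(zvarlev) =', size(zvarlev)
    print *, zbrn(1, 1, 1), zprocarray_grib(1, 1, 1), zvarlev(1, 1, 0), slev(0)
  end subroutine organize_output

end program repro
```

Building this with the same options as the failing file (for example `-O0 -g -Mbounds`) and comparing the behaviour against the full model could help show whether the declarations alone are enough to trigger the problem.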

Thanks,
Mat
All times are GMT - 7 Hours