PGI User Forum


Unassigned in compiler-generated code for array function cal

Keith Refson



Joined: 10 Jul 2006
Posts: 14

Posted: Tue Sep 17, 2013 4:04 am    Post subject: Unassigned in compiler-generated code for array function cal

I have spent some time tracking down a long-standing bug in our electronic structure code (CASTEP). It shows up as a run-time abort whose occurrence depends on the machine, the optimization level and other unknown factors, with the message

0: ALLOCATE: 13219868032 bytes requested; not enough memory

This does not always happen. The identical executable and input files produce the error on one Dell/Linux box (a Penryn) but not on another of slightly different spec (a Nehalem). Furthermore, it occurs when the code is compiled at a low optimization level but NOT with -fast. And the failure is activated or deactivated by unrelated modifications to unrelated parts of the code.

This rather suggests that the size being passed to an internal allocate depends on an unassigned variable.

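Now, every ALLOCATE statement in the code has a STAT= specifier, so a failed allocation requested by the source would be caught by the code rather than aborting inside the runtime. In addition, none of the Sun, Intel, Pathscale, NAG or GNU compilers shows this problem. I can therefore be sure this is not simply a mistakenly large memory request by the code itself that depends on an unassigned variable.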
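
For readers unfamiliar with the idiom, the status specifier on a Fortran ALLOCATE is spelled STAT=. The following is only a minimal sketch of that kind of check; the module, the helper `checked_allocate` and the array `work` are hypothetical names for illustration and are not taken from CASTEP.

```fortran
module alloc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper, not from CASTEP: allocate work(n) and hand the
  ! status back to the caller rather than letting the runtime abort.
  subroutine checked_allocate(work, n, ierr)
    real(kind=dp), allocatable, intent(out) :: work(:)
    integer, intent(in)  :: n
    integer, intent(out) :: ierr
    allocate(work(n), stat=ierr)   ! ierr is 0 on success, nonzero on failure
  end subroutine checked_allocate
end module alloc_sketch

program alloc_stat_demo
  use alloc_sketch
  implicit none
  real(kind=dp), allocatable :: work(:)
  integer :: ierr

  call checked_allocate(work, 1000, ierr)
  if (ierr /= 0) then
     write(*,*) 'allocation failed, stat = ', ierr
     stop 1
  end if
  write(*,*) 'allocated ', size(work), ' elements'
  deallocate(work)
end program alloc_stat_demo
```

An ALLOCATE written this way returns control to the program on failure, which is why an abort message coming straight from the runtime points at an allocation that is not in the source.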

Fortunately VALGRIND helps:
Code:

==21644== Use of uninitialised value of size 8
==21644==    at 0x77FBBAB: _itoa_word (in /lib64/libc-2.14.1.so)
==21644==    by 0x77FE7C8: vfprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x781E9A3: vsprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x78062A6: sprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x1A515FF: __alloc04 (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x1A503A6: pgf90_alloc04 (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x62C4D5: geometry_geom_constrain_strain_ (geometry.f90:7503)
==21644==    by 0x1DB8223: ??? (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x1DB824F: ??? (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==
==21644== Conditional jump or move depends on uninitialised value(s)
==21644==    at 0x77FBBB5: _itoa_word (in /lib64/libc-2.14.1.so)
==21644==    by 0x77FE7C8: vfprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x781E9A3: vsprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x78062A6: sprintf (in /lib64/libc-2.14.1.so)
==21644==    by 0x1A515FF: __alloc04 (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x1A503A6: pgf90_alloc04 (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x62C4D5: geometry_geom_constrain_strain_ (geometry.f90:7503)
==21644==    by 0x1DB8223: ??? (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)
==21644==    by 0x1DB824F: ??? (in /user/buildbot/Slaves/Rahman/linux_x86_64_portland-13_3-0-warnings/build/obj/linux_x86_64_pgf90/castep.serial)

The line in question (geometry.f90:7503) reads:
Code:

      if(on_root) f_vec=geom_apply_vec(mdl,delta_vec,'n')

where geom_apply_vec is an array-valued function and f_vec is an array:
Code:

     function geom_apply_vec(mdl,f_vec,inv) result(p)
      real(kind=dp), dimension(ndim)                :: p

and ndim is a module integer variable which is most definitely assigned at this point.
Code:

      integer, save :: ndim  !the number of dimensions in the search space


The valgrind uninitialised-value check is triggered as a warning on every invocation of this function, including many before the ALLOCATE failure. My interpretation is that the compiler allocates a temporary array to hold the function result before the assignment, that its computation of the required size is buggy and somehow relies on unassigned data, and that only in this particular instance is the unassigned value large enough to make the allocation fail. I reiterate that at the source code level the module variable ndim is assigned, and its value is printed out by the code.
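
The pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the module `geom_sketch`, the function `apply_vec` and its one-argument signature are hypothetical simplifications of geom_apply_vec, and nothing here is known to reproduce the failure.

```fortran
module geom_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, save :: ndim = 0   ! assigned at run time before any call
contains
  ! Hypothetical stand-in for geom_apply_vec: an array-valued function
  ! whose result extent is taken from the module variable ndim.
  function apply_vec(v) result(p)
    real(kind=dp), dimension(ndim), intent(in) :: v
    real(kind=dp), dimension(ndim)             :: p
    p = 2.0_dp*v
  end function apply_vec
end module geom_sketch

program repro_sketch
  use geom_sketch
  implicit none
  real(kind=dp), allocatable :: f_vec(:), delta_vec(:)
  logical :: on_root

  ndim = 6
  allocate(f_vec(ndim), delta_vec(ndim))
  delta_vec = 1.0_dp
  f_vec = 0.0_dp
  on_root = .true.

  ! Same shape as geometry.f90:7503: a conditional array assignment
  ! from an array-valued function sized by ndim. A compiler may create
  ! a temporary for the function result here.
  if (on_root) f_vec = apply_vec(delta_vec)

  write(*,*) 'ndim =', ndim, ' sum(f_vec) =', sum(f_vec)
end program repro_sketch
```

Running a sketch like this under valgrind at low optimization would show whether the warning follows the pattern itself or needs more of the surrounding code.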

This interpretation is supported by the disappearance of the abort at high optimization level, where presumably the compiler has determined that it can assign directly to the result array and that an intermediate allocation is unnecessary.

This occurs with pgf90 v 11.9 and 13.3.0, and probably other versions of PGI Fortran.

Is there anything in the internal bug database which looks like this? I would like to know before embarking on the tedious, wearying and possibly unproductive effort of boiling down a large code to a small testcase.

System details:




$ uname -a
Linux kohn.nd.rl.ac.uk 3.8.13.4-server-1.mga3 #1 SMP Thu Jul 4 14:04:54 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

ABORT occurs on processor

pgf90 13.3-0 64-bit target on x86-64 Linux -tp penryn

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8600 @ 3.33GHz
stepping : 10
microcode : 0xa07
cpu MHz : 2000.000
cache size : 6144 KB


Run does NOT abort on

pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU W3565 @ 3.20GHz
stepping : 5
microcode : 0x11
cpu MHz : 3200.093
cache size : 8192 KB
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Tue Sep 17, 2013 9:38 am

Hi Keith,

Quote:
Is there anything in the internal bug database which looks like this?
No, sorry. It sounds like a very specific problem, though, so it most likely doesn't show up in other codes.

My guess is if you tried to distill this issue, it wouldn't reproduce. Would it be possible to get CASTEP so we can diagnose the cause of this issue?

Thanks,
Mat
Keith Refson



Joined: 10 Jul 2006
Posts: 14

Posted: Wed Sep 18, 2013 3:15 am

I agree with your guess that this would be difficult to distil. I imagine that if it failed for every array function assignment it would have been noticed by now, and I have no idea what specifically triggers the failure.

Let's discuss offline how to get you a copy of the code.