PGI User Forum

CAM and PGI 7.1-5

 
elvedin



Joined: 26 Sep 2008
Posts: 3

Posted: Fri Sep 26, 2008 8:26 am    Post subject: CAM and PGI 7.1-5

We're running cam3-1.1.p1 (8 OpenMP threads over OpenMPI 1.2.5) on 10 nodes of 8-core Intel/Linux systems, and I found a "memory leak" consuming about 140MB per hour on each compute node. I ran valgrind with full checks over the cam process and it found no faults in cam itself, only quite a few errors while it was opening libs at the very beginning.
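
A growth figure like the 140MB per hour quoted above can be measured by sampling the process's resident set size at intervals and fitting a line through the samples. The following is a minimal Python sketch, not something from the original post: it assumes a Linux /proc filesystem, and the names `read_rss_kb`, `sample_rss`, and `growth_mb_per_hour` are hypothetical helpers made up for illustration.

```python
import time


def read_rss_kb(pid: int) -> int:
    """Return the resident set size of `pid` in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line has the form "VmRSS:    123456 kB".
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry found for pid {pid}")


def sample_rss(pid: int, interval_s: float, count: int) -> list[tuple[float, int]]:
    """Collect `count` samples of (elapsed seconds, RSS in kB) for `pid`."""
    samples = []
    start = time.monotonic()
    for i in range(count):
        samples.append((time.monotonic() - start, read_rss_kb(pid)))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


def growth_mb_per_hour(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of (seconds, rss_kb) samples, converted to MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    # Slope is in kB per second: multiply by 3600 s/hour, divide by 1024 kB/MB.
    return (num / den) * 3600 / 1024
```

For example, `growth_mb_per_hour(sample_rss(pid, 60, 60))` would sample one rank once a minute for about an hour. A slope that stays positive over several hours points to steady growth, not a one-time allocation at startup.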

Parts of cam are built at the default optimization level, -O (O2?), and that is where we are finding the leak. When I tested with -O1 the leak was still there, or at least I think I tested with -O1.

Build script options -

## If an executable doesn't exist, build one.
if ( ! -x $blddir/cam ) then
cd $blddir || echo "cd $blddir failed" && exit 1
$cfgdir/configure -spmd -smp -fopt '-O1' -nc_lib /soft/local/netcdf/netcdf-3.6.2/lib \
-nc_inc /soft/local/netcdf/netcdf-3.6.2/include \
-cc pgcc -fc pgf90 -res 128x256 \
-mpi_inc /soft/local/openmpi/openmpi-1.2.5/include \
-mpi_lib /soft/local/openmpi/openmpi-1.2.5/lib


Are there any known issues with this setup? In my experience, higher optimization levels have caused segmentation faults, but never a memory leak.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Fri Sep 26, 2008 10:11 am

Hi Elvedin,

Although I have not seen a situation where a particular optimization causes a memory leak, it is a possibility. I have CAM3-1.1.p1 and valgrind here and will look into it today or Monday.

Thanks for the report,
Mat
elvedin



Joined: 26 Sep 2008
Posts: 3

Posted: Fri Sep 26, 2008 12:39 pm

Thanks, I'll keep looking into it as well.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Tue Sep 30, 2008 2:37 pm

Hi Elvedin,

I've spent a few days looking at this and think I have an understanding of what you're seeing. For reference, I used PGI 7.1-5 and CAM v3.0, which is slightly different than your version. I built CAM using "-g", "-O2 -gopt", and "-fast -gopt" and then compared the resulting valgrind outputs.

At "-g", there are no compiler optimizations, so the few reported errors were strictly from CAM. It appears that there are a few uninitialized variables, but no huge problems. At "-O2", I saw little difference in the valgrind logs versus "-g", and valgrind did not report any memory leaks. However, at "-fast", I saw thousands of valgrind errors, which caused it to abort early since it stops reporting once the number of errors gets too high.

Example Valgrind error when CAM is compiled at "-fast":
Code:
==2851== Conditional jump or move depends on uninitialised value(s)
==2851==    at 0x482A3E: cldnrh_ (/tmp/cam1/models/atm/cam/src/physics/cam1/cldnrh.F90:101)
==2851==    by 0x1CCBB587: ???
==2851==    by 0x44A429F: ???
==2851==    by 0x44A441F: ???
==2851==    by 0xD267DF: ???
==2851==    by 0x44BE55F: ???
==2851==    by 0x1CCC0407: ???
==2851==    by 0x44A15DF: ???
==2851==    by 0x44A459F: ???
==2851==    by 0xC0487EF: ???
==2851==    by 0xBFD85BF: ???
==2851==    by 0xD2AA1F: ???

As you can see, Valgrind is fairly confused by the optimized code. I think it's unable to follow where the compiler has stored variables in registers, so it prints thousands of the uninitialized conditional jump messages.
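
When comparing logs from several builds, it can help to reduce each log to a count per error kind instead of reading thousands of repeated messages. Here is a rough Python sketch, assuming Memcheck's usual "==PID== message" line format as in the excerpt above. The function `tally_errors` and the `KINDS` list are hypothetical, and the list covers only a few common Memcheck messages.

```python
import re
from collections import Counter

# Every Memcheck log line starts with "==PID==". Stack frames ("at"/"by")
# follow the same prefix but do not contain any of the headline texts below.
_PREFIX = re.compile(r"^==\d+==\s*(.*)$")

# A partial, illustrative list of Memcheck headline fragments.
KINDS = (
    "Conditional jump or move depends on uninitialised value(s)",
    "Use of uninitialised value",
    "Invalid read",
    "Invalid write",
    "are definitely lost",
)


def tally_errors(log_text: str) -> Counter:
    """Count how often each known error kind appears in a Valgrind log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _PREFIX.match(line)
        if not match:
            continue  # program output or other non-Valgrind text
        text = match.group(1)
        for kind in KINDS:
            if kind in text:
                counts[kind] += 1
                break
    return counts
```

Running this over the "-g", "-O2", and "-fast" logs and comparing the counters side by side would show whether the extra errors are new kinds or the same message repeated. No "are definitely lost" entries in any of them would agree with Valgrind reporting no leaks.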

At this point I'm more inclined to believe that the errors you're seeing are due to Valgrind's reading of the optimized code rather than the compiler creating a memory leak at high optimization. Of course, I didn't repeat your exact experiment, so please let me know if your interpretation is different and I can pursue the issue further.
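
For anyone repeating the experiment: Valgrind normally stops reporting once it has seen too many errors, and its `--error-limit=no` option lifts that cap, while `--leak-check=full` asks Memcheck for per-allocation leak details. The Python sketch below only assembles such a command line. The helper `valgrind_command`, the `./cam` path, and the `mpirun -np 10` launcher are placeholder assumptions, and the options are taken from current Valgrind documentation, not from the version used in this thread.

```python
import shlex


def valgrind_command(executable, launcher=(), log_prefix="valgrind"):
    """Build an argv list that runs `executable` under Memcheck.

    `launcher` is an optional prefix such as ("mpirun", "-np", "10").
    """
    return [
        *launcher,
        "valgrind",
        "--tool=memcheck",
        "--leak-check=full",                # detailed leak records at exit
        "--error-limit=no",                 # keep reporting past the error cap
        f"--log-file={log_prefix}.%p.log",  # %p expands to the process ID
        executable,
    ]


if __name__ == "__main__":
    # Placeholder values: adjust the executable path and rank count.
    cmd = valgrind_command("./cam", launcher=("mpirun", "-np", "10"))
    print(shlex.join(cmd))
```

Writing one log per process ID keeps the output of the MPI ranks separate, which makes it easier to compare a single rank across the different optimization levels.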

- Mat
elvedin



Joined: 26 Sep 2008
Posts: 3

Posted: Tue Sep 30, 2008 10:39 pm

Valgrind found no faults with CAM; the problem is the memory increase (>=140MB per hour) that we're seeing. Under -g debugging, we get no such memory increase. You should be able to reproduce our setup by building with the default optimizations.
All times are GMT - 7 Hours