|
| View previous topic :: View next topic |
| Author |
Message |
zvanaz
Joined: 22 Feb 2007 Posts: 2
|
Posted: Fri Mar 09, 2007 6:35 pm Post subject: profiler inconsistency? |
|
|
Hi all,
I'm trying to profile a few subroutines within a fairly large f90 code, so I compiled and linked with -Mprof=func, and ran several tests. At the same time within the code I used etime and dtime to have time estimates for the same subroutines and I found two types of inconsistency:
a) for a subroutine which only reads in data, dtime tells me it took 61 s (user+sys), while the profiler, for the same run and subroutine says 359. The call to the subroutine is preceded and followed by dtime.
b) subroutine A has a few simple statements and small do while loop, then two nested do loops from which two subroutines (B and C) are called alternatively. at each pass.
The self times and costs cumulative and per call, of B and C are constants within accuracy, but the parent subroutine A has different self times during different tests: 246 s with 1 print statement, and 509 without!!! everything else being the same.
None of the mentioned subroutines makes system calls, or outputs anything with the exception of the one print statement.
these are all the options I'm using:
-Mvect=cachesize:524288 -Munroll -Mnoframe -O2 -pc 64 -Mfree -Mprof=func
any idea?
thanks a lot!
Giovanni |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Tue Mar 13, 2007 4:07 pm Post subject: |
|
|
Hi Giovanni,
I passed this on to our Tools Group but they're not sure. Can you please send an example and a description on how to recreate the problem?
Thanks,
Mat
Last edited by mkcolg on Sat Mar 17, 2007 8:03 am; edited 1 time in total |
|
| Back to top |
|
 |
zvanaz
Joined: 22 Feb 2007 Posts: 2
|
Posted: Thu Mar 15, 2007 6:21 pm Post subject: |
|
|
Hi Mat,
| mkcolg wrote: | Hi Giovanni,
I passd this on to our Tools Group but they're not sure. Can you please send an example and a description on how to recreate the problem?
|
I looked into it, but sending you the whole thing with data and all you need to run would be about 1Gb, and taking that specific piece of code offline would require quite some work.
I was able to determine that part of the inconsistencies were due to hardware problems of the specific node I was using on a red hat linux cluster. Once I switched node results did improve a lot, but the problem b) changed slightly, even after I got rid of the print statements and compiled with no optimization at all:
subroutine A has two nested do loops, for each pass it executes alternatively subroutine B xor C, depending on a dynamic condition and on a second condition determined through a configuration file once per run. If conditions are such that C is never executed the self time of A is 0.5 s. If C is executed (~ 70% of the times), the self time of A is 44 s, and note that C is at least 10 times faster B. Also what is done in A, outside B and C is always the same.
I'm not sure what's going on,
Thanks!
Giovanni |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Sat Mar 17, 2007 8:24 am Post subject: |
|
|
Hi Giovanni,
A couple of things to note. Adding print statements can effect what optimizations can be performed and might account for some of the difference. Try various optimization levels, "-O1', "-O2", "-fast", and "-fastsse", to see how your code is effected.
Also, "-Mprof=func" instruments your code which is a fairly intrusive profiling method. You might consider using a hardware counter sampling method instead as described in Chapter 2 of the PGI Tool's Guide.
- Mat |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2002 phpBB Group
|