PGI User Forum

Polyhedron benchmark
Michal Kvasnicka



Joined: 28 Apr 2010
Posts: 23

Posted: Fri Feb 18, 2011 1:25 am

OK, release 11.2 is out. What is the current status regarding the Polyhedron Fortran benchmark?

Are there still such large performance gaps?

Michal
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Fri Feb 18, 2011 3:26 pm

Hi Michal,

We currently have an application engineer dedicated to investigating general auto-parallelization performance. Induct is one of several applications he is investigating. The process takes time, since our concern is not just a single benchmark but how optimizations affect a wide variety of customer codes.

That said, customer input is important to us in prioritizing tasks. In your opinion, how important is it that PGI be able to auto-parallelize the Induct benchmark?

Thanks,
Mat
Michal Kvasnicka



Joined: 28 Apr 2010
Posts: 23

Posted: Sat Feb 19, 2011 11:01 am

Mat,

I am always looking for the best compiler suite (C/C++, Fortran, profiler, debugger, etc.), and one of the most important features is that the compiler produces the fastest possible binaries. From this point of view, the PGI Fortran compiler is currently #3 (after Absoft and Intel). I have run several more or less comprehensive benchmarks (Polyhedron is only one of them), and a few of them used my most important Fortran codes. PGI is a robust and reliable compiler, but it does not produce binaries that fully exploit the computing power of recent CPUs.

In general, the current PGI Fortran compiler produces binaries (on Intel CPUs + Linux) that are typically slower than those produced by the Intel or Absoft compilers. Sometimes the performance gap is very significant (30-250%).

Auto-parallelization is an extremely important compiler feature for legacy codes, because there is no reason to expect that their authors will be able to rewrite these codes for current parallel architectures.

Michal
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Feb 23, 2011 6:02 pm

Hi Michal,

I spent some time recreating the posted Polyhedron results. It happens that I have a Dell XPS with an Intel Core i7 920, which is very similar to what Polyhedron used. I got slightly better results than they did, but the difference is within run-to-run variance, especially given the short run time of these benchmarks. As with most codes, PGI will be faster on some while Intel will be faster on others. Overall, when auto-parallelization is also used for PGI, Intel is only ~3% faster. The only outlier is Induct, which we are investigating.

Polyhedron is an important metric of perceived performance. However, the problem we face is how to balance our priorities between this perceived performance and the actual performance of our customers' codes. If a benchmark is sped up by a 'creative' optimization that has no effect on all but a very few customer codes, should we spend our engineering time implementing it? Unfortunately, the answer is yes, because not doing the optimization is detrimental to the perceived performance of the compiler, though not the actual performance. We just put a lower priority on these types of optimizations.

What does concern me is when you say that PGI is consistently slower than Intel. Is this with your own code, or is it based solely on benchmarks? If it is your own code, would it be possible to send us a representative example so we can better understand where the performance difference occurs?

Note that PGI is typically more conservative with regard to numerical accuracy and stays within 1 ULP even with "-fast". Intel's "-Ofast" flag is roughly equivalent to PGI's "-fast -Mipa=fast,inline -Mfprelaxed", where "-Mfprelaxed" uses less precise (up to 3 ULPs off) floating-point operations. It's very possible that the performance difference you are seeing is simply due to the optimizations being used.
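
To make the ULP figures above concrete, here is a minimal Python sketch of how an error can be expressed in units in the last place (ULPs) for double precision. It is only an illustration, not how PGI or Intel measure accuracy. The helper name `ulp_error` is hypothetical, and `math.ulp` and `math.nextafter` require Python 3.9 or later.

```python
import math


def ulp_error(approx: float, reference: float) -> float:
    """Return |approx - reference| measured in ULPs of the reference value."""
    return abs(approx - reference) / math.ulp(reference)


# Build values that sit exactly 1 and 3 representable steps above 1.0.
one_step = math.nextafter(1.0, 2.0)
three_steps = 1.0
for _ in range(3):
    three_steps = math.nextafter(three_steps, 2.0)

print(ulp_error(one_step, 1.0))     # a result that is one step away from the reference
print(ulp_error(three_steps, 1.0))  # a result that is three steps away from the reference
```

Comparing a compiler's output against a higher-precision reference in this way would show whether a relaxed floating-point mode matters for a given code.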

Thanks,
Mat

Apologies for the poor formatting
    Benchmark   PGI Serial  PGI Parallel  Speed-Up
    ac               10.18         10.42    -2.30%
    aermod            16.2         16.51    -1.88%
    air               5.42          3.53    53.54%
    capacita         29.72         31.17    -4.65%
    channel           2.25          1.52    48.03%
    doduc            24.21         25.66    -5.65%
    fatigue           6.11          5.93     3.04%
    gas_dyn           3.55          2.18    62.84%
    induct            27.1         28.07    -3.46%
    linpk             7.82          6.51    20.12%
    mdbx             12.31         10.13    21.52%
    nf               11.06         10.03    10.27%
    protein          36.13         37.69    -4.14%
    rnflow           24.18         17.88    35.23%
    test_fpu          6.07          5.16    17.64%
    tfft              2.13          2.23    -4.48%
    Geo Mean         10.01          8.83    13.37%


    Benchmark   PGI Parallel  Intel Parallel  Difference
    ac                 10.42            9.81      -5.85%
    aermod             16.51           13.96     -15.45%
    air                 3.53            2.83     -19.83%
    capacita           31.17           28.05     -10.01%
    channel             1.52            1.82      19.74%
    doduc              25.66           25.88       0.86%
    fatigue             5.93           11.54      94.60%
    gas_dyn             2.18            2.57      17.89%
    induct             28.07            8.69     -69.04%
    linpk               6.51            8.13      24.88%
    mdbx               10.13           10.11      -0.20%
    nf                 10.03            9.91      -1.20%
    protein            37.69           30.85     -18.15%
    rnflow             17.88           18.03       0.84%
    test_fpu            5.16            5.69      10.27%
    tfft                2.23            2.23       0.00%
    Geo Mean            8.83            8.51      -3.64%

Times are in seconds.

Flags:
PGI Serial: -Bstatic -V -fastsse -Munroll=n:4 -Mipa=fast,inline
PGI Parallel: -Bstatic -V -fastsse -Munroll=n:4 -Mipa=fast,inline -Mconcur=innermost
Intel Parallel: -O3 -fast -parallel -ipo -no-prec-div

OMP_NUM_THREADS set to 4.
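
For readers who want to check the arithmetic, the following Python sketch recomputes the Speed-Up column and the geometric means from the PGI timings in the first table. The formula `baseline / other - 1` is an assumption inferred from the posted numbers, not something stated in the post, and the function names `percent_faster` and `geo_mean` are made up for this illustration.

```python
import math

# (benchmark, PGI serial seconds, PGI parallel seconds), copied from the first table.
TIMES = [
    ("ac", 10.18, 10.42), ("aermod", 16.2, 16.51), ("air", 5.42, 3.53),
    ("capacita", 29.72, 31.17), ("channel", 2.25, 1.52), ("doduc", 24.21, 25.66),
    ("fatigue", 6.11, 5.93), ("gas_dyn", 3.55, 2.18), ("induct", 27.1, 28.07),
    ("linpk", 7.82, 6.51), ("mdbx", 12.31, 10.13), ("nf", 11.06, 10.03),
    ("protein", 36.13, 37.69), ("rnflow", 24.18, 17.88), ("test_fpu", 6.07, 5.16),
    ("tfft", 2.13, 2.23),
]


def percent_faster(baseline: float, other: float) -> float:
    """Percentage by which `other` beats `baseline`; negative means slower."""
    return (baseline / other - 1.0) * 100.0


def geo_mean(values) -> float:
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-benchmark speed-up of the auto-parallel build over the serial build.
for name, serial, parallel in TIMES:
    gain = percent_faster(serial, parallel)
    print(f"{name:<10}{serial:>8.2f}{parallel:>8.2f}{gain:>9.2f}%")

serial_mean = geo_mean(t[1] for t in TIMES)
parallel_mean = geo_mean(t[2] for t in TIMES)
print(f"{'Geo Mean':<10}{serial_mean:>8.2f}{parallel_mean:>8.2f}")
```

Under the same assumption, the Difference column of the second table corresponds to `percent_faster(intel_time, pgi_time)`, so a positive value there means the PGI binary was faster. The geometric mean is used instead of the arithmetic mean so that long-running benchmarks such as protein do not dominate the summary.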
Michal Kvasnicka



Joined: 28 Apr 2010
Posts: 23

Posted: Thu Sep 22, 2011 9:21 am

Hi guys!

Have you seen the recent update of the Polyhedron benchmark?
http://www.polyhedron.com/compare0html

Of course, every benchmark is more or less specific, and its results are not valid in general. But I think PGI should start working on improving its code optimization, because otherwise the differences are going to get bigger.