PGI User Forum

poor pgi openmp performance??
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Wed Jun 20, 2012 7:59 am

I would recommend starting with the following compilation flags:

-mp -fast -Mipa=fast,inline

There are a number of things that can affect the runtime performance.

*) Thread to processor core binding. Depending on what type of system you are running
on, and the number of OpenMP threads, the placement of threads on cores can make
a significant difference. This binding can be controlled using the PGI environment
variables, MP_BIND and MP_BLIST.

*) Is the system you are running on a "NUMA" system? Does it have multiple processor
sockets? If so, then you also have to consider NUMA effects: memory attached to the
other socket is slower to access than memory local to the thread's own socket.
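
To illustrate the two points above, here is a minimal sketch in C (the thread does not say what language the CFD code is written in, so C is an assumption). It assumes the operating system places each memory page on the NUMA node of the thread that first writes to it, which is the default policy on Linux. The function names `alloc_first_touch` and `scaled_sum` are hypothetical. The idea is to initialize an array with the same static loop schedule that the compute loop uses, so that each thread later works mostly on memory local to its socket. This only helps if threads stay where they are, for example by setting MP_BIND in the PGI runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and zero them in parallel. Under a first-touch
   policy (assumed here), each page ends up on the NUMA node of the
   thread that writes it first. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = 0.0;
    return a;
}

/* Compute loop with the same static schedule, so each thread revisits
   the index range it initialized. Fills a[i] = s * i and returns the sum. */
double scaled_sum(double *a, size_t n, double s)
{
    double sum = 0.0;
    assert(a != NULL);
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        a[i] = s * (double)i;
        sum += a[i];
    }
    return sum;
}
```

Without an OpenMP flag the pragmas are ignored and the code runs serially, so the same source can be used to compare serial and threaded runs.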

As Mat suggested, without a detailed performance analysis it's very difficult to know exactly what the cause is.
steve.xu



Joined: 20 Feb 2012
Posts: 25

Posted: Wed Jun 20, 2012 8:29 pm

Thanks toepfer.
Our system is NUMA. Each node contains 2 processor sockets and each socket contains a Xeon 5670.

I simply compiled our CFD code with both the Intel and PGI compilers, without thread-to-core binding for either compiler.
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Thu Jun 21, 2012 10:11 am

When you run on one of these nodes, how many MPI processes do you run with? Do you set the OpenMP environment variable OMP_NUM_THREADS?
steve.xu



Joined: 20 Feb 2012
Posts: 25

Posted: Tue Jun 26, 2012 7:35 pm

I use just one MPI process with 12 OpenMP threads. I use the -O2 flag for PGI and the -O3 flag for Intel, and of course I set OMP_NUM_THREADS to 12 for both. In the end, I find that Intel OpenMP is almost 2 times faster than PGI with 12 threads. For the sequential version of our code, I see only a 30%-50% performance gap between Intel and PGI.
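
A rough way to read these numbers is to divide the threaded gap by the serial gap, which estimates how much of the difference appears only when running in parallel. The sketch below assumes that "30%-50% gap" means the PGI serial run takes 1.3 to 1.5 times as long as the Intel one, and that "almost 2 times faster" means a run-time ratio of about 2.0. Both readings are assumptions, and the function name `residual_parallel_ratio` is hypothetical.

```c
#include <assert.h>

/* Run-time ratio (slower / faster) that remains in the threaded run
   after dividing out the ratio already present in the serial run. */
double residual_parallel_ratio(double serial_ratio, double threaded_ratio)
{
    assert(serial_ratio > 0.0);
    return threaded_ratio / serial_ratio;
}
```

Under those assumptions, `residual_parallel_ratio(1.5, 2.0)` is about 1.33 and `residual_parallel_ratio(1.3, 2.0)` is about 1.54, so part of the gap would come from parallel execution itself (thread placement, NUMA, runtime overhead) and not only from serial code generation.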
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Thu Jun 28, 2012 8:27 am

Using just the -O2 flag for PGI does not enable automatic vectorization, whereas using the -O3 flag for the Intel compiler does. A better flag to use for PGI instead of -O2 is -fast. This enables automatic vectorization as well as other optimizations, and more closely matches Intel's -O3 flag.
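
As a small illustration, here is a loop of the kind an auto-vectorizer can usually handle, written in C (an assumption, since the language of the CFD code is not stated). The function name `axpy` and the file name in the comments are hypothetical. The `restrict` qualifiers tell the compiler that `x` and `y` do not overlap, which often makes vectorization easier. The compile lines in the comment are a sketch: `-Minfo=vect,mp` asks the PGI compiler to report what it vectorized and parallelized, and the Intel OpenMP flag shown is the spelling used by compilers of that period.

```c
#include <stddef.h>

/* Possible compile lines (file name is a placeholder):
 *   pgcc -mp -fast -Minfo=vect,mp axpy.c    PGI, vectorization enabled
 *   pgcc -mp -O2   -Minfo=vect,mp axpy.c    PGI, for comparison
 *   icc  -openmp -O3 axpy.c                 Intel
 */

/* y[i] += a * x[i] for i in [0, n). Unit-stride access and no aliasing
   between x and y make this a typical vectorization candidate. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        y[i] += a * x[i];
}
```

Comparing the -Minfo output of the -O2 and -fast builds on the hot loops of the real code should show whether vectorization is part of the gap.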
All times are GMT - 7 Hours