PGI User Forum

about PGI OpenMP

 
Teslalady



Joined: 16 Mar 2012
Posts: 74

Posted: Wed Jul 11, 2012 9:13 am    Post subject: about PGI OpenMP

My question is about OpenMP. I have a code whose performance with PGI OpenMP is worse than with Intel's. Because the code has tens of thousands of lines, it is impossible to rewrite it, and performance matters a great deal to us.

I suspect two possible reasons:
1) The code uses many dynamic data structures that contain pointers. I think PGI may handle pointers less efficiently than Intel, but I'm not sure.

2) Maybe I have not bound the PGI threads to cores correctly. I know there are two environment variables, MP_BIND and MP_BLIST, that control PGI thread binding. Each node of our machine has two CPUs in two different sockets, and each CPU has 6 cores. Within one socket I can set MP_BLIST=5,4,3,2,1,0 for a CPU, but across the two sockets there are 12 cores in total. When I set MP_BLIST for both sockets, performance declined.

Can you give me some advice?

Thanks for your patience with my poor English.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Jul 11, 2012 4:02 pm    Post subject:

Hi Teslalady,

Your customer also sent this question to PGI Customer Service, and several mails went back and forth. In reading the exchanges, they decided that the use of pointers wasn't the problem since the serial speed was comparable. Also, they saw good performance when they used MP_BIND/MP_BLIST to bind to a single socket.

Their current follow-up question is how to bind to multiple sockets. The simple answer is that they just need to extend their MP_BLIST to include the additional cores, i.e. MP_BLIST=11,10,9,8,7,6,5,4,3,2,1,0.
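As a rough illustration of extending the list, here is a minimal C sketch that builds the descending MP_BLIST value for a given core count and prints the two settings as shell lines. The helper names build_blist_desc and print_exports are made up for this example, and it assumes the cores are simply numbered 0 to n-1, which may not hold on every machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "n-1,...,1,0" into buf.
 * ASSUMPTION: core ids are simply 0 .. ncores-1.
 * Returns 0 on success, -1 if buf is too small. */
int build_blist_desc(int ncores, char *buf, size_t len)
{
    assert(ncores > 0 && buf != NULL && len > 0);
    buf[0] = '\0';
    for (int core = ncores - 1; core >= 0; core--) {
        size_t used = strlen(buf);
        int n = snprintf(buf + used, len - used, "%d%s",
                         core, core > 0 ? "," : "");
        if (n < 0 || (size_t)n >= len - used)
            return -1;              /* output would be truncated */
    }
    return 0;
}

/* Hypothetical helper: print lines to paste into the shell
 * before launching the OpenMP program. */
void print_exports(const char *blist)
{
    printf("export MP_BIND=yes\n");
    printf("export MP_BLIST=%s\n", blist);
}
```

Calling build_blist_desc(12, buf, sizeof buf) and then print_exports(buf) would print the 12-core setting given above.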

For your edification, the optimal binding is very system specific. Different architectures will have different bindings, and different hardware vendors will order cores differently. Hence, users may need to do some research and experimentation to determine the best binding.
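One ordering that may be worth trying during such experiments is a list that alternates between sockets, so that a run with fewer threads than cores is spread over both sockets instead of filling one first. The C sketch below generates such a list. It is only an illustration: the function name build_blist_scatter is made up, and the numbering it assumes (socket s owns cores s*cores_per_socket through (s+1)*cores_per_socket-1) is exactly the kind of detail that differs between vendors, so check it against your own hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a core list that alternates between sockets.
 * ASSUMPTION: socket s owns core ids
 *   s*cores_per_socket .. (s+1)*cores_per_socket - 1.
 * Real machines may number cores differently.
 * Returns 0 on success, -1 if buf is too small. */
int build_blist_scatter(int sockets, int cores_per_socket,
                        char *buf, size_t len)
{
    assert(sockets > 0 && cores_per_socket > 0 && buf != NULL && len > 0);
    buf[0] = '\0';
    /* Take the first core of every socket, then the second, and so on. */
    for (int slot = 0; slot < cores_per_socket; slot++) {
        for (int s = 0; s < sockets; s++) {
            size_t used = strlen(buf);
            int core = s * cores_per_socket + slot;
            int n = snprintf(buf + used, len - used, "%s%d",
                             used ? "," : "", core);
            if (n < 0 || (size_t)n >= len - used)
                return -1;          /* output would be truncated */
        }
    }
    return 0;
}
```

The resulting string can be timed against the plain descending list to see which ordering suits the application.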

A useful utility is 'numactl', whose "--hardware" option gives details on which memory node is attached to which cores. "numactl" also allows you to bind to cores as well as memory nodes (MP_BIND will bind to the closest memory node).
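To connect the two, here is a small C sketch that turns one per-node line of "numactl --hardware" output into a comma-separated list suitable for MP_BLIST. It assumes the line has the form "node 0 cpus: 0 1 2 3 4 5"; the exact output may differ between numactl versions, and the function name cpus_line_to_blist is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a line such as
 *   "node 1 cpus: 6 7 8 9 10 11"
 * into "6,7,8,9,10,11".
 * ASSUMPTION: the line contains "cpus:" followed by space-separated ids.
 * Returns the number of cpus found, or -1 if the line has no "cpus:"
 * field or buf is too small. */
int cpus_line_to_blist(const char *line, char *buf, size_t len)
{
    assert(line != NULL && buf != NULL && len > 0);
    buf[0] = '\0';

    const char *p = strstr(line, "cpus:");
    if (p == NULL)
        return -1;
    p += strlen("cpus:");

    int count = 0;
    for (;;) {
        char *end;
        long cpu = strtol(p, &end, 10);   /* skips leading whitespace */
        if (end == p)                     /* no more numbers on the line */
            break;
        size_t used = strlen(buf);
        int n = snprintf(buf + used, len - used, "%s%ld",
                         used ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;                    /* output would be truncated */
        count++;
        p = end;
    }
    return count;
}
```

Concatenating the lists for node 0 and node 1 gives a binding that follows the machine's own core numbering rather than a guessed one.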

- Mat
All times are GMT - 7 Hours