PGI User Forum

MPICH - parallel programs on dual Xeon / Opteron clusters

mkrech



Joined: 15 Oct 2004
Posts: 11

Posted: Mon Apr 10, 2006 6:51 am    Post subject: MPICH - parallel programs on dual Xeon / Opteron clusters

Dear forum,

I have a strange problem with parallel programs on clusters with two CPUs (Xeon
or Opteron) per node. My program uses a computational domain of 600 x 600
grid points with one double-precision variable per lattice site. The program measures
the pure computation time and the pure communication time separately. When the
program is run on 2 CPUs, each CPU gets a 300 x 600 portion to handle, so the
pure computation time on each CPU should be cut in half. Here is the problem:
when the job is run on two CPUs on *different* cluster nodes, this is exactly what
happens. But when the job is run on *both* CPUs of *one* cluster node, the
computation time remains unchanged, although the domain is cut in half on each CPU!
This happens on both Xeon and Opteron with PGI 5.2 or higher (older versions were not
tested). A slight effect in this direction is also observable with GNU C, but it is far
less severe than with pgcc. The operating system is SuSE Linux 8.1 and higher. For PGI,
the MPICH version that came with the respective PGI-CDK was used; for GNU C,
mpich-1.2.5.2 was used. The elapsed time for each job was cross-checked with
timers independent of the program, and it always equals computation time +
communication time. No MPICH calls are issued in the computational part.

Any ideas?

Many thanks,
Michael
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Mon Apr 10, 2006 2:35 pm

Hi Michael,

It sounds like the processes are memory bound. How much computation is performed on each grid point? If you're doing relatively few calculations per point, then each process needs to fetch data from memory more often, which causes memory bus contention. If this is the case, you can try experimenting with prefetching, "-Mprefetch", and/or non-temporal stores, "-Mnontemporal", to see if you can alleviate some of the memory pressure.

Hope this helps,
Mat
mkrech



Joined: 15 Oct 2004
Posts: 11

Posted: Tue Apr 11, 2006 1:33 am

Dear Mat,

The program is a simple-minded iterative solver for the Poisson equation in two
dimensions, written for performance testing. Because of the nature of the algorithm,
the CPUs have to exchange boundary data after each iteration, so
very little computation is done per lattice point between MPICH calls.

Neither -Mprefetch nor -Mnontemporal alleviated the memory pressure.
This program is rather old, and the only explanation I have for not noticing this
problem earlier is that my previous tests used the GNU compiler, which
shows only a slight indication of the problem. I also thought that in a NUMA
architecture each processor has its own share of memory attached to it, and that
one CPU's access to its share does not disturb the other CPU's access to its
share. Why does this happen with PGI but essentially not with GNU, even on
dual Opteron nodes (where the HyperTransport architecture accelerates
'crosswise' access of each CPU to the other's memory)? The OpenMP version of my
program works nicely on dual Opterons and shows the typical bus contention
on dual Xeons when the cache size is exceeded.

Still puzzled,
Michael
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Tue Apr 11, 2006 7:09 pm

Hi Michael,

I do think it's a memory issue, but I don't really know why it doesn't occur with GCC. However, you might be on to something. NUMA is not "on" unless you link in the NUMA libraries or use the "numactl" utility. Try linking with "-mp", which will link in the NUMA libraries. I don't know if this will help, but since your OpenMP version works as expected, it's worth a try.

- Mat
All times are GMT - 7 Hours