PGI User Forum


qacct and IB (mvapich)

 
Sylvain K



Joined: 13 Oct 2009
Posts: 21

Posted: Mon Jun 04, 2012 1:53 pm    Post subject: qacct and IB (mvapich)

I discovered that the cpu usage (cpu = stime + utime) returned by qacct (accounting under Rocks/SGE) is off for PGI/MPI jobs linked and run with IB (InfiniBand, mvapich).

I built and ran the same code and test case on our Rocks/SGE cluster with mpich and mvapich, respectively. I then ran qacct -j <JOBID> on the two jobs and got utimes of 0.622 vs 59.9 -- which does not make sense (both versions ran OK).
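
As an independent check, each rank can report its own user/system time with getrusage() so the per-process numbers can be compared against what qacct shows for the job. A rough sketch (illustrative only, not the code I actually ran; the file name and layout are made up):

Code:

/* rusage_check.c -- rough sketch: each MPI rank prints its own user and
 * system CPU time at the end of the run, for comparison with the
 * utime/stime that qacct reports for the whole job.
 * Build (illustrative): mpicc rusage_check.c -o rusage_check
 */
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

static double timeval_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(int argc, char **argv)
{
    int rank;
    struct rusage ru;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... the real work (e.g. the application kernel) goes here ... */

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("rank %d: utime %.3f s  stime %.3f s\n",
               rank, timeval_seconds(ru.ru_utime), timeval_seconds(ru.ru_stime));

    MPI_Finalize();
    return 0;
}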

Does anybody have a clue what is going on?

S.
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Tue Jul 03, 2012 10:25 am

Hi S.

Sorry, but I have no idea. This might be a question better addressed by your local IT support. Maybe the times are correct and the mvapich build is not configured correctly? Are the times repeatable? Do other programs show similar behavior?
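
One way to make that check unambiguous is a trivial test program whose expected CPU time is known in advance. A rough sketch (illustrative only, nothing I have run on your system; the file name and the 30-second figure are arbitrary) where every rank spins for a fixed wall time, so the job's utime should come out close to the rank count times that value:

Code:

/* burn.c -- rough sketch of a known-cost test: each rank busy-waits for
 * BURN_SECONDS of wall time, so with N ranks qacct's utime should be
 * close to N * BURN_SECONDS if accounting is working.
 */
#include <mpi.h>
#include <stdio.h>

#define BURN_SECONDS 30.0

int main(int argc, char **argv)
{
    int rank, nranks;
    volatile double sink = 0.0;   /* keeps the loop from being optimized away */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double t0 = MPI_Wtime();
    while (MPI_Wtime() - t0 < BURN_SECONDS)
        sink += 1.0;              /* pure user-mode CPU burn */

    if (rank == 0)
        printf("%d ranks x %.0f s burned; expected utime ~ %.0f s\n",
               nranks, BURN_SECONDS, nranks * BURN_SECONDS);

    MPI_Finalize();
    return 0;
}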

- Mat
Sylvain K



Joined: 13 Oct 2009
Posts: 21

Posted: Thu Jul 05, 2012 7:28 am

The times are dead wrong, and the problem is observed with more than one code: I've also built and run the High Performance Computing Linpack Benchmark (HPL) version 2.0.

I've tested several versions: compiled with the gnu, intel, and pgi compilers, without and with IB support (mpich or mvapich, respectively). In all cases, I used the vendor-provided libs (Intel Cluster Studio, PGI Cluster Dev Kit).

What I get for the two test cases is:

Code:

job type        compiler  nCPUs  wallclock    utime      stime         cpu     (all times in seconds)

1x16384-2x2     gnu          4   182.467    705.975     23.394     729.369 
1x16384-2x2     gnu+ib       4   183.783      0.004      0.002       0.006 
1x16384-2x2     gnu-v143     4   183.083    706.586     25.238     731.825 
1x16384-2x2     intel        4   217.317    861.706      6.939     868.645 
1x16384-2x2     intel+ib     4   216.783    859.793      7.002     866.795 
1x16384-2x2     pgi          4   391.117    200.263     62.671     262.934 
1x16384-2x2     pgi+ib       4   339.917      0.009      0.006       0.016 

1x16384-16x16   gnu        256   255.483   1049.666   5070.615    6120.280 
1x16384-16x16   gnu+ib     256   233.233      0.006      0.018       0.024 
1x16384-16x16   gnu-v143   256    11.450    175.701     88.706     264.407 
1x16384-16x16   intel      256   170.717  22349.205  21290.297   43639.502 
1x16384-16x16   intel+ib   256   555.383  99525.704  40159.900  139685.604 
1x16384-16x16   pgi        256   176.467     11.895     42.559      54.454 
1x16384-16x16   pgi+ib     256    14.783      0.016      0.045       0.060 


The pgi+ib stime, utime, and cpu can't be right, and they appear just as far off as in the gnu+ib cases... The utime should not change much whether IB is used or not; with such wacko qacct results, it is almost impossible to measure the efficiency gained by using IB.
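
In the meantime, the per-rank rusage can be summed inside the job itself, which gives a job-wide cpu figure that does not depend on the batch system's accounting. A rough sketch (again illustrative, not the HPL source; the file name is made up) that reduces user+system time over MPI_COMM_WORLD for comparison with qacct's cpu column:

Code:

/* cpu_total.c -- rough sketch: sum user+system CPU time over all ranks
 * with MPI_Reduce, giving a job-wide "cpu" figure that does not depend
 * on the batch system's accounting.
 */
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    int rank;
    struct rusage ru;
    double my_cpu = 0.0, total_cpu = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... the real work goes here ... */

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        my_cpu = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec * 1e-6
               + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec * 1e-6;

    MPI_Reduce(&my_cpu, &total_cpu, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total cpu across ranks: %.3f s (compare with qacct's cpu)\n",
               total_cpu);

    MPI_Finalize();
    return 0;
}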

Any idea what may cause this?