Sharp
Joined: 28 Aug 2008 Posts: 17
|
Posted: Thu Aug 28, 2008 10:38 am Post subject: Why large scale DGEMM parallelization appears strange? |
|
|
Hi, I am working on a program that uses DGEMM for matrix multiplication. The compiler I am using is pgi 707/pgf77. In this program the call to DGEMM has already been parallelized with OpenMP:
| Code: | C$OMP Parallel
C$OMP Single
C$ NP=omp_get_num_threads()
C$ MinCoW=16
C$OMP End Single
C$OMP End Parallel
ColPW = Max((N+NP-1)/NP,MinCoW)
NWork = (N+ColPW-1)/ColPW ! N is the number of columns of C(M,N).
If(XStr2.eq.'T'.or.XStr2.eq.'C') then
IncB = 1
else
IncB = LDB
endIf
IncB = IncB*ColPW
IncC = ColPW*LDC
C$OMP Parallel Do Default(Shared) Schedule(Static,1) Private(IP,XN)
Do 100 IP = 0, (NWork-1)
XN = Min(N-IP*ColPW,ColPW)
Call DGEMM(XStr1,XStr2,XM,XN,XK,Alpha,A,XLDA,B(1+IP*IncB),
$ XLDB,Beta,C(1+IP*IncC),XLDC)
100 Continue |
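For readers following the offset logic, here is a minimal Python sketch (not part of the original post) of how the starting positions passed as `B(1+IP*IncB)` and `C(1+IP*IncC)` come out under column-major storage. The function name `block_offsets` and its parameter names are hypothetical and chosen only for illustration.

```python
def block_offsets(ip, cols_per_block, ldb, ldc, trans_b):
    """Return 1-based linear offsets into B and C for column block ip (0-based).

    Mirrors the IncB/IncC arithmetic of the posted wrapper; names are
    illustrative and do not come from the original program.
    """
    # If op(B) is B transposed ('T' or 'C'), a column of op(B) is a row of B,
    # so neighbouring columns of op(B) are 1 element apart in memory.
    # Otherwise neighbouring columns of B are LDB elements apart.
    stride_b = 1 if trans_b in ("T", "C") else ldb
    inc_b = stride_b * cols_per_block   # IncB in the Fortran code
    inc_c = cols_per_block * ldc        # IncC in the Fortran code
    return 1 + ip * inc_b, 1 + ip * inc_c


# Example: second block (ip = 1) of 429 columns, leading dimensions 3432.
print(block_offsets(1, 429, 3432, 3432, "N"))
print(block_offsets(1, 429, 3432, 3432, "T"))
```

In the non-transposed case both offsets advance by whole columns. In the transposed case only the offset into C does, while the offset into B advances by rows.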
The command I use to compile this code and link the BLAS library is:
pgf77 -i8 '-mcmodel=medium' -mp -O2 -tp p7-64 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:2097152 -o xgemm.exe xgemm.o $gdvroot/bsd/libf77blas-em64t.a $gdvroot/bsd/libatlas-em64t.a -lpthread -lm -lc
Now the problem: when I run the matrix multiplication in parallel (the matrices are 3432x3432), the speedup is perfect up to 7 processors, but with 8 processors it becomes really poor (less than 3x). However, when I change the matrix size, e.g. to 924x924, the speedup on 8 processors is normal again. I tried allocating more memory for the 3432x3432 multiplication on 8 processors, but the speedup is still the same even with 10 GB (the limit of our hardware). Can anyone here help me? Thank you very much! |
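As a quick sanity check on the column partitioning, here is a small Python sketch (not part of the original post) that repeats the integer arithmetic for `ColPW`, `NWork` and `XN` from the code above. The function name `partition_columns` and its parameter names are hypothetical. It assumes `NP` equals the number of processors requested and that `MinCoW` is 16, as in the posted code.

```python
def partition_columns(n, num_threads, min_cols=16):
    """Return (cols_per_block, num_blocks, widths) for n columns of C.

    widths[ip] is the XN that would be passed to DGEMM for block ip.
    Names are illustrative only; the arithmetic follows the posted wrapper.
    """
    # ColPW = Max((N+NP-1)/NP, MinCoW), using integer division
    cols_per_block = max((n + num_threads - 1) // num_threads, min_cols)
    # NWork = (N+ColPW-1)/ColPW
    num_blocks = (n + cols_per_block - 1) // cols_per_block
    # XN = Min(N-IP*ColPW, ColPW) for IP = 0 .. NWork-1
    widths = [min(n - ip * cols_per_block, cols_per_block)
              for ip in range(num_blocks)]
    return cols_per_block, num_blocks, widths


for n in (3432, 924):
    for threads in (7, 8):
        colpw, nwork, widths = partition_columns(n, threads)
        print(n, threads, colpw, nwork, min(widths), max(widths))
```

Under these assumptions, N = 3432 on 8 threads gives eight blocks of exactly 429 columns each, and on 7 threads it gives six blocks of 491 columns plus one of 486. So the work split itself is even in the 8-processor case. The sketch says nothing about cache, memory bandwidth, or runtime behaviour.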
hongyon
Joined: 19 Jul 2004 Posts: 551
|
Posted: Thu Aug 28, 2008 10:44 am Post subject: |
|
|
Hi,
Have you tried our latest release? Please try it and let us know if the problem is still there. There might be a performance bug in our OpenMP runtime that has been fixed in the latest release.
Hongyon |
Sharp
Joined: 28 Aug 2008 Posts: 17
|
Posted: Fri Sep 05, 2008 4:06 am Post subject: |
|
|
Hi, thank you for your advice. Since our group doesn't have a license for the latest 7.2.x version, I tried the library from 7.1.6. It works all right now. Thank you.
Sharp |