PGI User Forum


Different Performance by 13.xx ver.

 
PGI User Forum -> Accelerator Programming
emath



Joined: 02 Jan 2010
Posts: 7

Posted: Mon Jun 10, 2013 3:53 am    Post subject: Different Performance by 13.xx ver.

Hello, I have developed an iterative solver for a large block linear system in Fortran using OpenACC. When built with the latest versions of the pgf90 compiler, it runs significantly slower than when built with older versions such as 12.7.

The main part of the code that runs on the GPU is:

!$acc loop independent vector(32)
do k=1,n
   k1=k*j
   k2=(k+1)*j
   !$acc loop independent vector(32)
   do i=1,ik
      t(k1+i)=t(k1+i)+x(k2+i)*a3(3,i)
   enddo
   ! ......
enddo

The compilation options are -mp -acc -ta=nvidia,cc20 -Minfo -O4 -tp=nehalem-64. The machine is an HP SL390 equipped with Tesla M2070 GPUs, running the Oracle Linux 6.2 operating system.
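When comparing compiler versions, a small self-contained reproducer of the hot loop can help. The sketch below wraps the single statement shown above in a subroutine. The module and subroutine names, the double-precision kind, the array extents, and the `kernels` region with a `gang` outer loop are assumptions for illustration, not the original code. Marking the outer loop `independent` is only safe if `j >= ik`, so that different `k` iterations write disjoint slices of `t`.

```fortran
! Hypothetical reproducer of the posted loop nest; names and kinds are assumptions.
! The rest of the original loop body ("......" in the post) is not reproduced.
module block_update_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine block_update(n, j, ik, a3, x, t)
    integer, intent(in) :: n, j, ik
    real(dp), intent(in) :: a3(3, ik)
    ! Extents cover the largest indices touched: (n+1)*j+ik for x, n*j+ik for t
    real(dp), intent(in) :: x((n + 1) * j + ik)
    real(dp), intent(inout) :: t(n * j + ik)
    integer :: i, k, k1, k2

    ! Without OpenACC enabled these directives are plain comments and the
    ! loops run serially on the host.
    !$acc kernels copyin(a3, x) copy(t)
    ! Assumes j >= ik, so each k updates a disjoint slice of t
    !$acc loop independent gang private(k1, k2)
    do k = 1, n
      k1 = k * j
      k2 = (k + 1) * j
      !$acc loop independent vector(32)
      do i = 1, ik
        t(k1 + i) = t(k1 + i) + x(k2 + i) * a3(3, i)
      end do
    end do
    !$acc end kernels
  end subroutine block_update
end module block_update_mod
```

Built once without OpenACC, the subroutine gives a host reference result; built with the flags above under 12.7 and 13.x, it gives a minimal case for comparing the two versions.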
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Jun 10, 2013 10:17 am

Hi emath,

In 13.x, our engineers switched to using pinned memory by default in order to better support asynchronous data movement. While this improved many cases, we later found others where codes slowed down. The problem is that when pinned memory is deallocated, the device driver needs to synchronize the device and host to ensure that all pending memory transfers are complete, and this synchronization can cause a slowdown. Our engineers are revamping this behavior and hope to have an improved method soon. In the meantime, you can try setting the environment variable "PGI_ACC_SYNCHRONOUS=1" to partially revert to the old behavior.

To tell whether this is indeed the problem in your program, compare the device profile information from 12.7 and 13.6. Since the freeing of pinned memory doesn't show up in the profile, if the two profiles are about the same even though the overall run times differ, then this is the problem.
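The profile comparison is only conclusive next to a measurement of total elapsed time. A minimal sketch of a host wall-clock timer, assuming it is wrapped around the solver's outer iteration loop (the module and function names are hypothetical):

```fortran
! Hypothetical wall-clock helper built on the standard system_clock intrinsic.
module wall_timer_mod
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  function wall_seconds() result(s)
    real(real64) :: s
    integer(int64) :: ticks, rate
    call system_clock(ticks, rate)
    ! rate is zero if no clock is available; return 0 rather than divide by zero
    if (rate == 0_int64) then
      s = 0.0_real64
    else
      s = real(ticks, real64) / real(rate, real64)
    end if
  end function wall_seconds
end module wall_timer_mod
```

If the device profiles from the 12.7 and 13.x builds are about the same but the elapsed time measured this way is larger for 13.x, the extra time is being spent outside the profiled work, which matches the pinned-memory deallocation described above. Rerunning the 13.x build with PGI_ACC_SYNCHRONOUS=1 is then the quick check.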

- Mat
emath



Joined: 02 Jan 2010
Posts: 7

Posted: Tue Jun 11, 2013 12:31 pm

Hi Mat,
You are right: the deallocation of pinned memory causes the problem in our implementation. With this environment variable set to 1, performance with the 12.xx and 13.xx PGI versions is about the same.
Thank you.
Manolis
All times are GMT - 7 Hours