PGI User Forum


No array assignment replaced by call to pgf90_mcopy4 in 10.2
bbierbaum



Joined: 19 Jan 2010
Posts: 3

Posted: Thu Mar 11, 2010 5:33 am

Short addendum: We've just installed 10.3 and I've used it to compile our original version and the version with the copy outside the compute region. The performance is no different than with 10.2.

Boris
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Thu Mar 11, 2010 10:42 am

Hi Boris,

Can you please send the full source to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me? I'll need to see the full context to determine what else could be going on.

Thanks,
Mat
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Fri Mar 12, 2010 1:01 am

Done.
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Wed Mar 17, 2010 4:28 pm

Hi xray and Boris,

It appears to me that the reduction code for 'residual' is taking twice as long, and that this is what is causing the slowdown. I have sent a report (TPR#16728) to our engineers for further investigation.

Note that the 10.2 code to copy "uold = afU" does appear to be much faster than the 10.1 code. I see a significant speed-up with 10.1 when I change the source to match what 10.2 does:
Code:
!$acc data region local(uold) copyin(afF) copy(afU)
            do while (iIterCount < iIterMax .and. residual > fTolerance)
                residual = 0.0d0

                ! Copy new solution into old
                !uold = afU
!$acc do parallel
                ! Compute stencil, residual, & update
                do j = 0, iRows
!$acc do vector(256)
                    do i = 0, iCols
                        uold(i,j) = afU(i,j)
                    enddo
                enddo
...


Thanks,
Mat
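
For anyone who wants to try the explicit-loop copy in isolation, here is a minimal, self-contained sketch. It is not the original application: the array sizes and fill pattern are made up, the program name is hypothetical, and the `!$acc region` with `copyin`/`copyout` clauses is an assumption added so the result can be checked on the host (the thread's code keeps `uold` device-local inside a data region instead). Compilers that do not implement the PGI Accelerator directives treat the `!$acc` lines as ordinary comments, so the program also runs on the host alone.

```fortran
program copy_loop_sketch
    implicit none
    ! Hypothetical sizes; the thread does not state the real ones.
    integer, parameter :: iRows = 63, iCols = 63
    double precision :: afU(0:iCols, 0:iRows), uold(0:iCols, 0:iRows)
    integer :: i, j

    ! Fill afU with a known pattern so the copy can be verified.
    do j = 0, iRows
        do i = 0, iCols
            afU(i, j) = dble(i) + 100.0d0 * dble(j)
        end do
    end do
    uold = 0.0d0

    ! Explicit nested loop in place of the array assignment "uold = afU".
    ! copyout(uold) is an assumption made only so the host can check it.
!$acc region copyin(afU) copyout(uold)
!$acc do parallel
    do j = 0, iRows
!$acc do vector(256)
        do i = 0, iCols
            uold(i, j) = afU(i, j)
        end do
    end do
!$acc end region

    ! The copy is exact, so this difference should be zero.
    print *, 'max abs difference:', maxval(abs(uold - afU))
end program copy_loop_sketch
```

The loop form gives the compiler an ordinary loop nest to schedule with the `parallel` and `vector(256)` clauses, which is the shape used in the post above.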
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Tue Aug 03, 2010 11:02 am

Hi xray and Boris,

Sorry for the late update on this one. In 10.4 we added back the relaxed divide when using the flag "-ta=nvidia,fastmath". Using this flag should regain the lost performance. By default, we decided to keep the slower but more accurate division.

- Mat
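
Since the flag trades accuracy for speed, it may be worth checking how much the division results change for your own data before turning it on. Below is a rough sketch of such a check, not something from the original code: the program name, array size, and input values are hypothetical, and the thread does not say which precisions the relaxed divide affects, so a given build may show no difference at all.

```fortran
program divide_check_sketch
    implicit none
    integer, parameter :: n = 1024          ! hypothetical problem size
    double precision :: a(n), b(n), q_host(n), q_acc(n)
    integer :: i

    ! Made-up inputs; substitute values representative of your data.
    do i = 1, n
        a(i) = 1.0d0 + dble(i)
        b(i) = 3.0d0 + 0.5d0 * dble(i)
    end do

    ! Reference quotients computed on the host.
    do i = 1, n
        q_host(i) = a(i) / b(i)
    end do

    ! Same division inside an accelerator region (PGI Accelerator model).
    ! Other compilers treat the !$acc lines as plain comments.
    q_acc = 0.0d0
!$acc region copyin(a, b) copyout(q_acc)
    do i = 1, n
        q_acc(i) = a(i) / b(i)
    end do
!$acc end region

    ! Largest relative deviation of the accelerator result from the host.
    print *, 'max relative difference:', &
             maxval(abs(q_acc - q_host) / abs(q_host))
end program divide_check_sketch
```

Building this once with `-ta=nvidia` and once with `-ta=nvidia,fastmath` and comparing the printed value gives a first impression of whether the relaxed divide is acceptable for a given tolerance.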
All times are GMT - 7 Hours
Page 2 of 2

 