PGI User Forum

Accelerator compiler bug with sequential rewriting matrices.

 
Ankhazam



Joined: 24 Aug 2010
Posts: 7

Posted: Mon Dec 20, 2010 1:19 am    Post subject: Accelerator compiler bug with sequential rewriting matrices.

Hi,
the code below, run on a GTX480 with CC30, unpredictably corrupts the values copied from copiedInArray (uploaded from host memory) into a local temporary array that exists only on the GPU.
Both arrays are real*4 and have the same dimensions, x=90, y=90, z=1500 (the z dimension is probably what matters here).

Code:

!$acc region
      do k=2,z
        do j=1,y
          do i=1,x
            localGPUArray(i,j,k) = copiedInArray(i,j,k)
          enddo
        end do
      end do
!$acc end region


It appears that the compiler divides the work among the GPU's compute units in an odd manner (90x90x1499).

A quick fix, which makes both arrays hold the same values at the same indices, was to make any one of these loops sequential. However, neither the compiler nor the profiler gave any hint that the computation might misbehave without the !$acc do seq.

Code:

!$acc region
      do k=2,z
        do j=1,y
!$acc do seq
          do i=1,x
            localGPUArray(i,j,k) = copiedInArray(i,j,k)
          enddo
        end do
      end do
!$acc end region


If you know a better way to fill a local GPU array with host-uploaded data, please let me know. I hope you will be able to reproduce this problem and address it with a fix :)

Regards,
Nicolas Dobski
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Dec 21, 2010 9:23 am    Post subject:

Hi Nicolas,

Quote:

It appears that the compiler divides the work among the GPU's compute units in an odd manner (90x90x1499).
This makes sense given that your k loop starts at 2. The compiler will only allocate the minimum amount of space, hence in this case 1499. You can override this behavior using the copy and local clauses.
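
A minimal sketch of what such an override might look like, assuming the region directive accepts copyin and copy clauses with explicit array bounds. The program name copy3d_clauses and the choice of copy rather than copyout for B are illustrative assumptions, not something taken from this thread. Because the k=1 plane of B is never written inside the region, the sketch uses copy so that the host values of that plane are uploaded first and come back unchanged.

```fortran
program copy3d_clauses
  implicit none
  real, allocatable, dimension(:,:,:) :: A, B
  integer :: i, j, k
  integer :: x, y, z

  x = 90
  y = 90
  z = 1500
  allocate(A(x,y,z), B(x,y,z))

  ! Fill A on the host; zero B so the untouched k=1 plane is defined.
  do k = 1, z
    do j = 1, y
      do i = 1, x
        A(i,j,k) = real(i*j)/real(k)
      enddo
    enddo
  enddo
  B = 0.0

  ! Explicit bounds request the full 90x90x1500 extent on the device
  ! instead of the 2:1500 sub-range the compiler would pick by default.
!$acc region copyin(A(1:x,1:y,1:z)) copy(B(1:x,1:y,1:z))
  do k = 2, z
    do j = 1, y
      do i = 1, x
        B(i,j,k) = A(i,j,k)
      enddo
    enddo
  enddo
!$acc end region

  print *, B(1,1,1), B(1,1,2), B(1,1,z)
  deallocate(A, B)
end program copy3d_clauses
```

With -Minfo=accel, the "Generating copyin" and "Generating copy" messages should show whether the compiler honored the requested bounds. A compiler without accelerator support treats the directives as comments and runs the loops on the host.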


Can you post a reproducing example? Here's my attempt to recreate the issue, but my simple example works fine.

Code:
% cat copy3d.f90


program copy3d

real, allocatable, dimension(:,:,:) :: A,B
integer :: i,j,k
integer :: x,y,z

x=90
y=90
z=1500

allocate(A(x,y,z), B(x,y,z))

do i=1,x
  do j = 1,y
    do k=1, z
       A(i,j,k)=real(i*j)/real(k)
    enddo
  enddo
enddo


!$acc region
do k=2, z
  do j = 1,y
    do i=1,x
       B(i,j,k) = A(i,j,k)
    enddo
  enddo
enddo
!$acc end region

print *, A(1,1,2), A(1,1,1500)
print *, B(1,1,2), B(1,1,1500)

end program copy3d

% pgf90 copy3d.f90 -ta=nvidia -Minfo=accel -V10.9 ; a.out
copy3d:
     24, Generating copyin(a(1:90,1:90,2:1500))
         Generating copyout(b(1:90,1:90,2:1500))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     25, Loop is parallelizable
     26, Loop is parallelizable
     27, Loop is parallelizable
         Accelerator kernel generated
         25, !$acc do parallel, vector(4)
         26, !$acc do parallel, vector(4)
         27, !$acc do vector(16)
             CC 1.0 : 8 registers; 24 shared, 52 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 8 registers; 24 shared, 52 constant, 0 local memory bytes; 100 occupancy
   0.5000000       6.6666666E-04
   0.5000000       6.6666666E-04
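
The two printed elements only spot-check the result. A fuller check compares every element the region writes, which is more likely to expose the kind of scattered mismatches described in the first post. The sketch below makes assumptions: the program name check3d and the variables nbad and maxdiff are hypothetical, and the comparison runs on the host after the region has copied B back.

```fortran
program check3d
  implicit none
  real, allocatable, dimension(:,:,:) :: A, B
  integer :: i, j, k
  integer :: x, y, z
  integer :: nbad      ! number of elements that differ (hypothetical name)
  real :: maxdiff      ! largest absolute difference (hypothetical name)

  x = 90
  y = 90
  z = 1500
  allocate(A(x,y,z), B(x,y,z))

  do k = 1, z
    do j = 1, y
      do i = 1, x
        A(i,j,k) = real(i*j)/real(k)
      enddo
    enddo
  enddo

!$acc region
  do k = 2, z
    do j = 1, y
      do i = 1, x
        B(i,j,k) = A(i,j,k)
      enddo
    enddo
  enddo
!$acc end region

  ! Only planes 2..z are written by the region, so only those are compared.
  nbad = count(B(:,:,2:z) /= A(:,:,2:z))
  maxdiff = maxval(abs(B(:,:,2:z) - A(:,:,2:z)))

  print *, 'mismatched elements: ', nbad
  print *, 'largest difference:  ', maxdiff

  deallocate(A, B)
end program check3d
```

A straight copy should give zero mismatches, so any nonzero count points at the data movement or the loop schedule rather than at floating-point rounding.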