PGI User Forum

Automatic kernel producing out of bounds reads

 
PaulDellar



Joined: 13 Apr 2013
Posts: 5

Posted: Sun May 12, 2013 2:06 pm    Post subject: Automatic kernel producing out of bounds reads

The following generates a stream of out-of-bounds global reads according to cuda-memcheck when compiled with pgf95 versions 12.5 and 13.3. It works with m=128, with mod in place of iand, or with both changes.

Looking at the .gpu files, the combination of m=256 and iand instead of mod produces a very different kernel with an extra argument, compared with the other three combinations.

implicit none
integer, parameter :: m=256 ! try 128 instead
integer, parameter :: n=32
real*8, device, dimension(0:m-1,n,n) :: f
real*8, dimension(0:m-1,n,n) :: fhost
integer shift,i,ip
shift=m
f = 1d0
!$cuf kernel do(1) <<< (*), (*) >>>
do i=0,m-1
   ip = iand(i+shift,m-1)
   ! ip = mod(i+shift,m) ! try this instead
   f(ip,n,n) = 0.5d0*f(ip,n,n)
enddo
fhost = f
write (6,*) "returned value ",sum(fhost)
end
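
As a host-only sanity check on the index arithmetic, here is a minimal sketch that confirms iand(i+shift,m-1) and mod(i+shift,m) agree for every i when m is a power of two, and that with shift=m the index ip is simply i, so each iteration touches a different element. The program name and the counters mismatches and not_identity are hypothetical names introduced for illustration; no device code is involved.

```fortran
program check_iand_mod
  implicit none
  integer, parameter :: m = 256   ! power of two, as in the post above
  integer :: i, ip, shift
  integer :: mismatches           ! hypothetical counter: iand and mod disagree
  integer :: not_identity         ! hypothetical counter: ip differs from i
  shift = m
  mismatches = 0
  not_identity = 0
  do i = 0, m-1
     ! For non-negative k and m a power of two, iand(k,m-1) keeps the
     ! low-order bits of k, which equals mod(k,m).
     ip = iand(i+shift, m-1)
     if (ip /= mod(i+shift, m)) mismatches = mismatches + 1
     if (ip /= i) not_identity = not_identity + 1
  end do
  write (6,*) "mismatches   ", mismatches
  write (6,*) "not identity ", not_identity
end program check_iand_mod
```

Both counters should come out as zero, so on the host the two forms of the loop are interchangeable and every ip stays within 0..m-1.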
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Mon May 13, 2013 9:59 am    Post subject:

Hi Paul,

Thanks for the report. I added a problem report (TPR#19339) and sent it on to engineering. It's interesting that the error only occurs at 256. Change this to 255 or 257, and it's fine. Also, the code runs to completion if I use OpenACC instead, but it doesn't look like it gives correct answers. Again, it's fine when m is not 256. Interesting case.

- Mat
PaulDellar



Joined: 13 Apr 2013
Posts: 5

Posted: Tue May 14, 2013 3:50 pm    Post subject:

Dear Mat,

Thanks for confirming, and for filing a problem report. I'm not very familiar with OpenACC, but according to -Minfo the 13.3 compiler generates a scalar kernel because it can't establish that the loop iterations are independent.

Looking at the .gpu files, __launch_bounds__ is set to 1 in the kernel generated using OpenACC, but to 128 in the parallel kernel generated by a !$cuf directive. That may be why my example works with OpenACC, but presumably doesn't run very fast.

Paul
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue May 14, 2013 4:59 pm    Post subject:

Quote:
Thanks for confirming, and for filing a problem report.
You're welcome.

Quote:
OpenACC, but according to -Minfo the 13.3 compiler generates a scalar kernel because it can't establish that the loop iterations are independent.
Yes. However, you can add the "independent" clause to tell the compiler that the loop iterations are independent.

For example:
Code:
% cat out.f90
implicit none
integer, parameter :: m=256 ! try 128 instead
integer, parameter :: n=32
#ifdef _CUDA
real*8, device, dimension(0:m-1,n,n) :: f
#else
real*8, dimension(0:m-1,n,n) :: f
#endif
real*8, dimension(0:m-1,n,n) :: fhost
integer shift,i,ip,ierr
shift=m
f = 1d0
#ifdef _OPENACC
!$acc kernels loop independent
#else
!$cuf kernel do(1) <<< (*), (*) >>>
#endif
do i=0,m-1
   ip = iand(i+shift,m-1)
   ! ip = mod(i+shift,m) ! try this instead
   f(ip,n,n) = 0.5d0*f(ip,n,n)
enddo
fhost = f
write (6,*) "returned value ",sum(fhost)
end
% pgf90 out.f90 -Mpreprocess -Minfo -Mcuda -acc ; a.out
MAIN:
18, Loop is parallelizable
Accelerator kernel generated
18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
24, sum reduction inlined
returned value 262144.0000000000
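
For comparison with the value printed above, here is a minimal host-only sketch of the same loop with no device attribute and no directives. The array starts with m*n*n = 262144 elements equal to 1, and the loop halves m = 256 of them, so a correct run should sum to 262144 - 128 = 262016. The program name host_reference is hypothetical and introduced only for illustration.

```fortran
program host_reference
  implicit none
  integer, parameter :: m = 256
  integer, parameter :: n = 32
  real*8, dimension(0:m-1,n,n) :: f   ! plain host array, no device attribute
  integer :: shift, i, ip
  shift = m
  f = 1d0
  ! Same loop body as in the thread, run serially on the host
  do i = 0, m-1
     ip = iand(i+shift, m-1)
     f(ip,n,n) = 0.5d0*f(ip,n,n)
  end do
  write (6,*) "host reference ", sum(f)
  ! Closed form: every element is 1 except m of them, which are 0.5
  write (6,*) "expected       ", dble(m*n*n) - 0.5d0*dble(m)
end program host_reference
```

A sum of exactly 262144 is what the array gives if no element was halved, which would be consistent with the earlier remark that the OpenACC build does not appear to produce correct answers at m=256.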


- Mat
jtull



Joined: 30 Jun 2004
Posts: 445

Posted: Fri Jun 07, 2013 3:14 pm    Post subject: TPR 19339 is fixed in 13.6

TPR 19339 - CUF: user example code gets runtime error when using "ishift"

has been fixed in the now available 13.6 release.

regards,
dave
All times are GMT - 7 Hours