PGI User Forum

PGI 14.1 Fortran/acc bug report

 
BrentP



Joined: 08 Mar 2013
Posts: 6

Posted: Thu Mar 13, 2014 10:37 am    Post subject: PGI 14.1 Fortran/acc bug report

Hi,

I work in the Virginia Tech AOE department, where we are applying OpenACC acceleration to some Fortran CFD codes. We have encountered what appears to be a bug in the PGI 14.1 Fortran/acc compiler: incorrect CUDA code is generated for scalar kernels regions. I have constructed a simple example that reproduces the error:

https://drive.google.com/file/d/0B1fu3KCwysj1TGJlNS0wZ2JMMEU/edit?usp=sharing

As the example shows, certain operations appear to be incorrectly moved or optimized away. The effect is to turn a scalar kernel like this excerpt:

Code:

  !$acc kernels copy(soln(:,:,:),res(:,:,:))
  i=2
  j=2
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       !NOTE: R1 is re-computed for each i,j value

       local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val
       R1 = local4*5.9_dp

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3
 
  i=2
  j=y_nodes-1
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val
       R1 = local4*5.9_dp

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3
 
  i=x_nodes-1
  j=2
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val
       R1 = local4*5.9_dp

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3

       ...[etc]...


Into this:

Code:

  !$acc kernels copy(soln(:,:,:),res(:,:,:))
  i=2
  j=2
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       !NOTE: R1 is assigned only once and re-used, even as i,j change

       local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val   
       R1 = local4*5.9_dp   !computed only once for all i,j

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3
 
  i=2
  j=y_nodes-1
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3
 
  i=x_nodes-1
  j=2
       local1=soln(i,j,1) + soln(i+1,j+1,1) + soln(i-1,j-1,1) + &
                soln(i-1,j+1,1) + soln(i+1,j-1,1)
       local2=soln(i,j,2) + soln(i+1,j+1,2) + soln(i-1,j-1,2) + &
                soln(i-1,j+1,2) + soln(i+1,j-1,2)
       local3=soln(i,j,3) + soln(i+1,j+1,3) + soln(i-1,j-1,3) + &
                soln(i-1,j+1,3) + soln(i+1,j-1,3)

       res(i,j,1) = R1
       R2 = soln(i,j,2) + local2 + local3
       res(i,j,2) = R2
       R3 = soln(i,j,3) + local2 + local3
       res(i,j,3) = R3

       ...[etc]...


We have developed workarounds, but wanted to bring this possible bug to your attention.
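
One way to confirm the symptom, independent of any workaround, is to recompute R1 on the host and compare it with what the accelerator region wrote to res(:,:,1). The following is only a minimal sketch under stated assumptions: the module and function names (r1_reference, r1_at, max_r1_error) are hypothetical, dp is assumed to be double precision, and add_val is assumed to hold the same value for every node. The R1 formula is copied from the excerpt above.

```fortran
module r1_reference
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed definition of dp
contains

  ! Host-side value of R1 at one node, using the formula from the excerpt.
  pure function r1_at(soln, i, j, add_val) result(r1)
    real(dp), intent(in) :: soln(:,:,:)
    integer,  intent(in) :: i, j
    real(dp), intent(in) :: add_val
    real(dp) :: r1
    r1 = (soln(i,j,2)**2 + soln(i,j,3)**2 + add_val) * 5.9_dp
  end function r1_at

  ! Largest |res(i,j,1) - R1(i,j)| over the nodes listed in pts(1:2, :),
  ! where pts(1,k) is i and pts(2,k) is j for the k-th node.
  pure function max_r1_error(soln, res, pts, add_val) result(err)
    real(dp), intent(in) :: soln(:,:,:), res(:,:,:)
    integer,  intent(in) :: pts(:,:)
    real(dp), intent(in) :: add_val
    real(dp) :: err
    integer :: k
    err = 0.0_dp
    do k = 1, size(pts, 2)
      err = max(err, abs(res(pts(1,k), pts(2,k), 1) &
                         - r1_at(soln, pts(1,k), pts(2,k), add_val)))
    end do
  end function max_r1_error

end module r1_reference
```

If R1 is computed once and reused, the error stays at zero for the first node and becomes nonzero at any later node whose soln values differ.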

Thank you,
Brent
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Thu Mar 13, 2014 1:31 pm    Post subject:

Hi Brent,

Thanks for the example. I'll pass it on to engineering (TPR#19997).

I agree that an overly aggressive optimization is being performed when using the "kernels" construct. That said, this is a case where I'd normally use the "parallel" construct instead, since "kernels" is meant more for tightly nested loops while "parallel" suits less structured code such as a sequential region. Granted, "kernels" shouldn't give you wrong answers, but changing it to "parallel" will work as expected.
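
As a rough illustration of a sequential region under "parallel", here is a reduced sketch that keeps only the R1 part of the excerpt for two nodes. The module and subroutine names are hypothetical, dp is assumed to be double precision, and the num_gangs(1) clause is an addition (it is not in the diff further down) that makes single-gang execution explicit. Without an OpenACC-enabled compile the directives are ordinary comments, so the same source also runs on the host.

```fortran
module scalar_region_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed definition of dp
contains

  ! Updates res(:,:,1) at two of the nodes used in the excerpt.
  subroutine update_two_nodes(soln, res, add_val)
    real(dp), intent(in)    :: soln(:,:,:)
    real(dp), intent(inout) :: res(:,:,:)
    real(dp), intent(in)    :: add_val
    integer  :: i, j, x_nodes
    real(dp) :: local4, R1

    x_nodes = size(soln, 1)

    ! Sequential code inside a parallel construct; num_gangs(1) is an
    ! assumption added for this sketch.
    !$acc parallel num_gangs(1) copyin(soln) copy(res)
    i = 2
    j = 2
    local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val
    R1 = local4*5.9_dp
    res(i,j,1) = R1

    i = x_nodes - 1
    j = 2
    local4 = soln(i,j,2)**2 + soln(i,j,3)**2 + add_val
    R1 = local4*5.9_dp          ! recomputed for the new i,j
    res(i,j,1) = R1
    !$acc end parallel
  end subroutine update_two_nodes

end module scalar_region_sketch
```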

In addition to using "parallel" for sequential regions, I'd also recommend you use a data region to span all of your compute regions. As written, you have a lot of extra data movement.

Here's a diff with my changes:
Code:
% diff -i scalar_test_ERROR.f90 scalar_test.f90
60c60,61
<   !$acc kernels copy(soln(:,:,:),res(:,:,:))
---
>   !$acc data copy(soln(:,:,:),res(:,:,:))
>   !$acc kernels
85c86
<   !$acc kernels copy(soln(:,:,:),res(:,:,:))
---
>   !$acc parallel
157c158
<   !$acc end kernels
---
>   !$acc end parallel
159c160
<   !$acc kernels copy(soln(:,:,:),res(:,:,:))
---
>   !$acc kernels
181c182
<
---
>   !$acc end data
% pgf90 -acc -Minfo=accel scalar_test.f90; a.out
kernel_test:
      0, Generating copy(soln(:,:,:))
         Generating copy(res(:,:,:))
     61, Generating Tesla code
     62, Loop is parallelizable
     63, Loop is parallelizable
         Accelerator kernel generated
         62, !$acc loop gang ! blockidx%y
         63, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     86, Accelerator kernel generated
         Generating Tesla code
    160, Generating Tesla code
    161, Loop is parallelizable
    162, Loop is parallelizable
         Accelerator kernel generated
        161, !$acc loop gang ! blockidx%y
        162, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
Corner (1,1) =                1.903
Corner (1,y_nodes) =                1.903
Corner (x_nodes,1) =               45.439
Corner (x_nodes,y_nodes) =               45.439

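For readers without the attachment, the overall shape that the diff produces (one data region enclosing a "kernels" loop nest, a "parallel" scalar region, and a second "kernels" loop nest) can be sketched roughly as follows. The loop bodies are placeholders, since the original loops are not shown in this thread; the module and subroutine names, the dp definition, and num_gangs(1) are likewise assumptions made for the sketch.

```fortran
module data_region_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed definition of dp
contains

  subroutine run_regions(soln, res, add_val)
    real(dp), intent(inout) :: soln(:,:,:), res(:,:,:)
    real(dp), intent(in)    :: add_val
    integer :: i, j, x_nodes, y_nodes

    x_nodes = size(soln, 1)
    y_nodes = size(soln, 2)

    ! One data region: soln and res move to the device once and back once.
    !$acc data copy(soln, res)

    ! Region 1: tightly nested loops (placeholder body).
    !$acc kernels
    do j = 1, y_nodes
      do i = 1, x_nodes
        res(i,j,1) = 0.0_dp
      end do
    end do
    !$acc end kernels

    ! Region 2: scalar code, reduced here to a single node.
    !$acc parallel num_gangs(1)
    i = 2
    j = 2
    res(i,j,1) = (soln(i,j,2)**2 + soln(i,j,3)**2 + add_val) * 5.9_dp
    !$acc end parallel

    ! Region 3: tightly nested loops again (placeholder body).
    !$acc kernels
    do j = 1, y_nodes
      do i = 1, x_nodes
        res(i,j,2) = 2.0_dp * res(i,j,1)
      end do
    end do
    !$acc end kernels

    !$acc end data
  end subroutine run_regions

end module data_region_sketch
```

Because the arrays are already present on the device inside the data region, the three compute regions need no data clauses of their own.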

Hope this helps,
Mat
BrentP



Joined: 08 Mar 2013
Posts: 6

Posted: Thu Mar 13, 2014 2:14 pm    Post subject:

Thanks for the info. I did not even think to try an "acc parallel" region for code that was supposed to execute as a serial kernel.

(In the real code, we do utilize a data region to avoid extra data movement--I was sloppy in this example.)

-Brent
jtull



Joined: 30 Jun 2004
Posts: 438

Posted: Mon Jun 16, 2014 5:51 pm    Post subject: TPR 19997 has been fixed in the 14.6 release

TPR 19997 - "UF: Using "kernels" region over sequential code give wrong answers"

has been corrected in the current 14.6 release.

Thanks for the original report.

regards,
dave
All times are GMT - 7 Hours