PGI User Forum

Reduction not recognized in Fortran
_sayan_



Joined: 07 Apr 2012
Posts: 29

Posted: Thu May 31, 2012 4:02 pm    Post subject: Reduction not recognized in Fortran

Hello,

In my case, t_ptr is a pointer to a pointer to a 3-D array (t_ptr => ptr => u1), and sclr is a scalar value. The following code snippet gives a "Segmentation Fault".
Code:

       #ifdef __PGI
       !$acc data region copy(t_ptr) copyin(sx, sy, sz, sclr)
       !$acc region do parallel
       #endif
       do i=1,1
          t_ptr(sx, sy, sz) = t_ptr(sx, sy, sz) + sclr
       enddo
       #ifdef __PGI
       !$acc end region
       !$acc end data region
       #endif


The informational messages are as follows:

Code:

     13, PGI Unified Binary version for -tp=nehalem-64 -ta=nvidia
     32, Loop unrolled 3 times (completely unrolled)
     36, Generating copyin(sclr)
         Generating copyin(sz)
         Generating copyin(sy)
         Generating copyin(sx)
         Generating copy(ptr(:,:,:))
     38, Generating copy(t_ptr(sx,sy,sz))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     40, Complex loop carried dependence of 't_ptr' prevents parallelization
         Loop carried dependence due to exposed use of 't_ptr(sx,sy,sz)' prevents parallelization
         Accelerator kernel generated
         40, !$acc do seq
         CC 1.0 : 2 registers; 44 shared, 0 constant, 0 local memory bytes; 33% occupancy
         CC 2.0 : 4 registers; 0 shared, 60 constant, 0 local memory bytes; 16% occupancy


How can I solve this error?

Thanks
Sayan
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Thu May 31, 2012 5:30 pm

Hi Sayan,

There are a couple of issues here.

First, F90 pointers are not yet supported within Accelerator regions. We need to resolve aliasing issues before this can be added, but our hope is that it's a solvable problem.

Second, you don't want to copy your scalars in the data region. This promotes them to global memory, which hurts performance. Instead, remove the copyin clause and let the compiler either privatize them or pass them in as arguments to the generated kernel.

Third, the loop is not parallel since every iteration of the loop updates the same element of t_ptr. Granted, the trip count is 1 so there's no parallelism to begin with, but the compiler's dependency analysis doesn't take the trip count into account when determining independence.
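To make the three points concrete, here is a minimal sketch of how the region might be restructured. It is an illustration, not a tested PGI build: the program name, the array size n, and the values assigned to sclr, sx, sy, and sz are placeholders. The array u1 is used directly instead of through a pointer, the scalars do not appear in any data clause, and each iteration of the loop writes a different element so that no two iterations touch the same location. The directives use only the forms already shown in this thread; a compiler without accelerator support enabled treats them as comments.

```fortran
program update_without_pointer
  implicit none
  integer, parameter :: n = 8          ! placeholder array extent
  real :: u1(n, n, n)
  real :: sclr
  integer :: sx, sy, k

  u1 = 1.0
  sclr = 2.5                           ! placeholder scalar value
  sx = 2
  sy = 3

  ! Only the array is named in a data clause. The scalars sclr, sx and sy
  ! are left out so the compiler can privatize them or pass them as
  ! kernel arguments.
  !$acc data region copy(u1)
  !$acc region
  do k = 1, n
     ! Each iteration updates a distinct element u1(sx, sy, k).
     u1(sx, sy, k) = u1(sx, sy, k) + sclr
  enddo
  !$acc end region
  !$acc end data region

  print *, u1(sx, sy, 1), u1(sx, sy, n)
end program update_without_pointer
```

Whether the compiler actually parallelizes this loop still depends on its dependence analysis, so the -Minfo=accel messages are the place to check.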

- Mat
_sayan_




Posted: Thu May 31, 2012 5:52 pm

Thank you for your reply. I guess the only way left would be to access the arrays directly, and let the compiler recognize the reduction.

Please bear with me: I am new to Fortran, and after your reply I have a question about array processing. If there is an operation like:
Code:

A = A + B
or
A = A + (B*C)

where A, B, and C are arrays with the same shape, and such an operation occurs within a compute region, would this operation be moved to the device?

Thank you
Sayan
mkcolg




Posted: Fri Jun 01, 2012 8:04 am

Hi Sayan,

Yes, array syntax is supported. So the following would accelerate:

Code:
!$acc region
A = A + B
!$acc end region


Array syntax gets expanded by the compiler into an implied DO loop and then accelerated after the expansion.
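As a rough illustration of that expansion, the sketch below puts the array-syntax form next to a hand-written loop nest that computes the same result element by element. The array extents n and m, the second array A2, and the program name are made up for the comparison. The loop order the compiler actually generates may differ from the one written here.

```fortran
program array_syntax_expansion
  implicit none
  integer, parameter :: n = 4, m = 3   ! placeholder extents
  real :: A(n, m), A2(n, m), B(n, m)
  integer :: i, j

  call random_number(B)
  A = 1.0
  A2 = 1.0

  ! Array-syntax form, as in the example above.
  !$acc region
  A = A + B
  !$acc end region

  ! Explicit loop nest over every element, roughly what the
  ! array-syntax statement is expanded into.
  !$acc region
  do j = 1, m
     do i = 1, n
        A2(i, j) = A2(i, j) + B(i, j)
     enddo
  enddo
  !$acc end region

  ! Both forms perform the same additions on the same inputs.
  if (all(A == A2)) then
     print *, 'array syntax and explicit loops agree'
  else
     print *, 'results differ'
  endif
end program array_syntax_expansion
```

The same reasoning applies to A = A + (B*C): each element of the result depends only on the matching elements of A, B, and C.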

- Mat
_sayan_




Posted: Fri Jun 01, 2012 8:50 am

Hello Mat,

Thank you once again. Referring to my original question, I have removed the pointers and now use the array directly, like this:

Code:

  !$acc region
  ....some other code
  ....some other code
  do k=k0,k1
     do j=j0,j1
        do i=i0,i1
           u(i,j,k)=u(i,j,k)+k
        enddo
     enddo
  enddo
  !$acc end region

I use the following compiler optimization options:
Code:

-Mpreprocess -fastsse -Mvect=noaltcode -Mipa=fast -mp=numa -ta=nvidia,host -Minfo=accel,loop,opt -Mneginfo

Is there a way to get the compiler to recognize the array reduction inside the acc region and parallelize it?

Thank you
Sayan