PGI User Forum

Is it possible to use 4 nested loops with OpenACC?

 
PGI User Forum Forum Index -> Accelerator Programming
rikisyo



Joined: 20 Jun 2013
Posts: 7

Posted: Fri Sep 13, 2013 7:53 am    Post subject: Is it possible to use 4 nested loops with OpenACC?

I am trying to put 4 nested loops onto the GPU with OpenACC. Here is a simplified example:

Code:

Subroutine indexed_copy_4d( &
   arr_dst, arr_src, &
   i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
   ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
   ki_src, kj_src, kk_src, km_src, kc_src )

Implicit None

Real, Intent(out), Dimension(1:) :: arr_dst
Real, Intent(in), Dimension(1:) :: arr_src

Integer, Intent(in) :: &
   i0,i1,is, j0,j1,js, k0,k1,ks, m0,m1,ms, &
   ki_dst, kj_dst, kk_dst, km_dst, kc_dst, &
   ki_src, kj_src, kk_src, km_src, kc_src

Integer :: i,j,k,m

!$acc kernels present(arr_dst,arr_src)
!$acc loop independent
do i=i0,i1,is
!$acc loop independent
do j=j0,j1,js
!$acc loop independent
do k=k0,k1,ks

   !$acc loop seq              ! $$$$
   do m=m0,m1,ms          ! $$$$

      arr_dst(ki_dst*i+kj_dst*j+kk_dst*k+kc_dst) = arr_src(ki_src*i+kj_src*j+kk_src*k+kc_src)

   enddo             ! $$$$

enddo
enddo
enddo
!$acc end kernels

End Subroutine indexed_copy_4d


Eventually m needs to be included in the calculated index, but that is irrelevant here. The problem is that the compiler always fails with an internal error:

Code:
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unknown variable reference (nestedloop.f90: 23)
PGF90-S-0000-Internal compiler error. gen_aili: unrec. ili opcode:     345 (nestedloop.f90: 29)
pgf90-Fatal-/home/lluo6/pgi/linux86-64/13.8/bin/pgf902 TERMINATED by signal 11
Arguments to /home/lluo6/pgi/linux86-64/13.8/bin/pgf902
/home/lluo6/pgi/linux86-64/13.8/bin/pgf902 /tmp/pgf90RsHcbFp-ah1T.ilm -fn nestedloop.f90 -opt 2 -terse 1 -inform warn -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp istanbul -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 70 0x40000000 -x 124 1 -accel nvidia -accel host -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 189 8 -x 176 0x140000 -x 177 0x0202007f -x 176 0x100 -x 186 0x10000 -x 176 0x100 -x 186 0x20000 -x 176 0x100 -x 176 0x100 -x 189 4 -y 70 0x40000000 -cmdline '+pgf90 nestedloop.f90 -acc -c' -asm /tmp/pgf90ZsHczs4J7LZr.s


I tried using the parallel construct, changing loop orders, and so on; I always get an internal error like the one above.

However, if I just remove all the lines marked with "! $$$$", i.e. remove the inner loop, the compilation finishes without any problem.

It would be straightforward to implement equivalent code in CUDA, so I really don't know why a sequential loop inside a kernel thread would cause any trouble like this.

Comments are welcome.
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Fri Sep 13, 2013 9:47 am

Hi rikisyo,

This is a compiler bug that appears to have started with release 13.3, when we increased the loop analysis depth. The error is caused by the skip count (stride) on the "m" loop, so the workaround is to remove ",ms".

I added TPR#19579 and sent it to engineering. Since we're in the late stages of 13.9 release testing, I doubt any fix will make it into 13.9. Possible, but more likely this would go into 13.10.

- Mat
rikisyo



Joined: 20 Jun 2013
Posts: 7

Posted: Fri Sep 13, 2013 10:06 am

Problem solved.

Thank you!


jtull



Joined: 30 Jun 2004
Posts: 436

Posted: Fri Nov 01, 2013 2:11 pm    Post subject: TPR 19579 - OpenACC: internal seq loop with variable step ca

This has been fixed in the 13.10 release.

thanks,
dave
rikisyo



Joined: 20 Jun 2013
Posts: 7

Posted: Thu Nov 07, 2013 7:07 am    Post subject: Re: TPR 19579 - OpenACC: internal seq loop with variable ste

Thanks for the update!

Powered by phpBB © phpBB Group