PGI User Forum

acc routine and Fortran
TheMatt



Joined: 06 Jul 2009
Posts: 340
Location: Greenbelt, MD

Posted: Mon Mar 10, 2014 5:19 am    Post subject: acc routine and Fortran

All,

I'm hoping someone here can help me with this. Long ago, I used the PGI Accelerator directives, but limitations with those led me to CUDA Fortran. Now I'm trying to venture back into the brave new world of OpenACC. Well, OpenACC 2.0, because even my simplest accelerator kernel has subroutine calls within it. Thus, I need !$acc routine. My main question, though, is: how exactly do you use it?

I've tried searching the web for 'acc routine' and I see quite a few examples in C, but I've only ever seen one for Fortran, at this page. (And since that one has a subroutine statement with a brace at the end:
Code:
subroutine foo(v, i, n) {
and isn't even valid Fortran (anyone see where "j" is declared?) I'm not too confident of it.) Still, it's an example.

So, my code looks something like:

Code:
module soradmod
...
contains
subroutine sorad(...)
...
   call deledd(...)
   call deledd(...)
...
end subroutine sorad

subroutine deledd(...)
...
end subroutine deledd

end module soradmod

Now, the real code is much more complex, and in truth there are calls to subroutines *external* to soradmod, but for now, let's deal with deledd.

So, after adding some !$acc kernels, a few !$acc loop private to deal with some -Minfo messages, I get:
Code:
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages  (src/sorad.acc.F90: 327)
sorad:
    327, Accelerator region ignored
    341, Loop not vectorized/parallelized: too deeply nested
    362, Loop not vectorized: data dependency
    387, Loop unrolled 4 times (completely unrolled)
    396, Memory zero idiom, loop replaced by call to __c_mzero4
    397, Memory zero idiom, loop replaced by call to __c_mzero4
    398, Memory zero idiom, loop replaced by call to __c_mzero4
    399, Memory zero idiom, loop replaced by call to __c_mzero4
    400, Memory zero idiom, loop replaced by call to __c_mzero4
    402, Memory zero idiom, loop replaced by call to __c_mzero4
    403, Memory zero idiom, loop replaced by call to __c_mzero4
    405, Memory zero idiom, loop replaced by call to __c_mzero4
    406, Memory zero idiom, loop replaced by call to __c_mzero4
    407, Memory zero idiom, loop replaced by call to __c_mzero4
    413, Loop not fused: different loop trip count
         Loop not vectorized: may not be beneficial
    423, Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
         Loop unrolled 8 times (completely unrolled)
    505, Loop not fused: different controlling conditions
    518, Generated 4 alternate versions of the loop
         Generated vector sse code for the loop
         Generated 8 prefetch instructions for the loop
    519, Loop unrolled 4 times (completely unrolled)
    531, Loop not vectorized/parallelized: too deeply nested
    538, Accelerator restriction: function/procedure calls are not supported
         Loop not vectorized/parallelized: contains call
    558, Accelerator restriction: unsupported call to 'deledd'
...

And, of course, it sees the deledd call. So, I then try, a la the link above:
Code:
subroutine deledd(...)
!$acc routine
...
end subroutine deledd

and:
Code:
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd

Hmm. I also try:
Code:
!$acc routine vector
!$acc routine worker
!$acc routine gang

but each one gives me the same error. I've tried putting the !$acc statements above the subroutine declaration, no go. I've tried:
Code:
!$acc routine(deledd)
in various places, no go.

Any help? I'm hoping that if I can figure this out, I can then try to figure out how to use routines that are in different files. (Heck, I can't even get -Mextract/-Minline to work, so !$acc routine across different files is daunting!)

Thanks,
Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6741
Location: The Portland Group Inc.

Posted: Mon Mar 10, 2014 4:53 pm

Hi Matt,

Thanks for pointing out the typo with the "{" on ParallelForAll. I'll let Jeff know. Though, "j" doesn't need to be declared due to implicit typing.

As for routine, first make sure you have PGI 14.1 or later. OpenACC "routine" directive support for subroutines was added then. Function support was added in 14.2. From what I can tell, it appears that you're using the directive correctly but may just be using 13.10.

Here's a very simple example.

Code:
% cat testr.f90
module testme
    integer, parameter :: N = 16
contains
subroutine testit
    real*4 :: a0(N), b0(N), b1(N)
    integer :: acc(6), exp(6)
    integer :: i
    do i = 1, N
       a0(i) = real(i) * 10.0
       b0(i) = -1.0
       b1(i) = -2.0
    enddo

    do i=1,N
       call doit(b1,a0,i)
    enddo

    !$acc parallel
    !$acc loop
    do i = 1, N
       call doit( b0, a0, i )
    enddo
    !$acc end parallel
    do i = 1, N
       print *, b0(i), b1(i)
    enddo

end

subroutine doit( b, a, i)
!$acc routine vector
    real*4 :: b(*), a(*)
    integer :: i
    b(i) = a(i)*a(i)
end
end module testme

program main()
    use openacc
    use testme
    call testit()
end
sbe02:/local/home/colgrove% pgf90 testr.f90 -V14.1 -acc -Minfo=accel; a.out
testit:
     18, Accelerator kernel generated
         20, !$acc loop gang ! blockidx%x
     18, Generating copy(a0(:))
         Generating copy(b0(:))
         Generating NVIDIA code
doit:
     30, Generating acc routine vector
         Generating NVIDIA code
    100.0000        100.0000
    400.0000        400.0000
    900.0000        900.0000
    1600.000        1600.000
    2500.000        2500.000
    3600.000        3600.000
    4900.000        4900.000
    6400.000        6400.000
    8100.000        8100.000
    10000.00        10000.00
    12100.00        12100.00
    14400.00        14400.00
    16900.00        16900.00
    19600.00        19600.00
    22500.00        22500.00
    25600.00        25600.00
TheMatt



Joined: 06 Jul 2009
Posts: 340
Location: Greenbelt, MD

Posted: Tue Mar 11, 2014 4:43 am

Quote:
Thanks for pointing out the typo with the "{" on ParallelForAll. I'll let Jeff know. Though, "j" doesn't need to be declared due to implicit typing.
Oh yeah, implicit typing...I always forget you can do that. Mainly because I was taught small puppies are sad when you don't use "implicit none" in Fortran or "default(none)" in OpenMP.

But, as we'll see soon, this matters!

Quote:
As for routine, first make sure you have PGI 14.1 or later. OpenACC "routine" directive support for subroutines was added then. Function support was added in 14.2. From what I can tell, it appears that you're using the directive correctly but may just be using 13.10.

Oh, I'm using PGI 14.1, but just to be sure, here's without !$acc routine:
Code:
$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages  (src/sorad.acc.F90: 327)
sorad:
    327, Accelerator region ignored
...
    538, Accelerator restriction: function/procedure calls are not supported
         Loop not vectorized/parallelized: contains call
    558, Accelerator restriction: unsupported call to 'deledd'
...

and now with:
Code:
$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd
make[1]: *** [sorad.acc.o] Error 2

Or if I boil it down to what your example uses:
Code:
$ pgfortran -Minfo=all -V14.1 -acc -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd

PGI 14.2 also leads to the same error. (Though I've had to pretty much stop using PGI 14.2 due to its apparent fragility when it comes to linking as seen here.)

That said, here's something that I noticed. Let's make a couple versions of your testr.f90 code with one added line (disregarding spaces):

Code:
$ diff -u testr.f90 testr_in.f90
--- testr.f90   2014-03-11 07:04:45.393678000 -0400
+++ testr_in.f90   2014-03-11 07:18:52.267942000 -0400
@@ -29,6 +29,9 @@
 
 subroutine doit( b, a, i)
 !$acc routine vector
+
+    implicit none
+
     real*4 :: b(*), a(*)
     integer :: i
     b(i) = a(i)*a(i)
$ pgfortran -V14.1 -acc -Minfo=accel testr_in.f90
PGF90-S-0070-Incorrect sequence of statements  (testr_in.f90: 33)
  0 inform,   0 warnings,   1 severes, 0 fatal for doit
$ diff -u testr.f90 testr_in2.f90
--- testr.f90   2014-03-11 07:04:45.393678000 -0400
+++ testr_in2.f90   2014-03-11 07:21:12.545263000 -0400
@@ -28,7 +28,11 @@
 end
 
 subroutine doit( b, a, i)
+
+    implicit none
+
 !$acc routine vector
+
     real*4 :: b(*), a(*)
     integer :: i
     b(i) = a(i)*a(i)
$ pgfortran -V14.1 -acc -Minfo=accel testr_in2.f90
testit:
     18, Accelerator kernel generated
         20, !$acc loop gang ! blockidx%x
     18, Generating copy(a0(:))
         Generating copy(b0(:))
         Generating NVIDIA code
doit:
     30, Generating acc routine vector
         Generating NVIDIA code
$ ./a.out
    100.0000        100.0000   
    400.0000        400.0000   
    900.0000        900.0000   
    1600.000        1600.000   
    2500.000        2500.000   
    3600.000        3600.000   
    4900.000        4900.000   
    6400.000        6400.000   
    8100.000        8100.000   
    10000.00        10000.00   
    12100.00        12100.00   
    14400.00        14400.00   
    16900.00        16900.00   
    19600.00        19600.00   
    22500.00        22500.00   
    25600.00        25600.00

So, it looks like !$acc routine must come after implicit none and not before.
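
The ordering that compiled above can be condensed into one self-contained file. This is only a sketch: the names demo_mod and square_at are hypothetical, and it uses "routine seq" rather than the "routine vector" of the test case, on the assumption that a routine with no loops of its own needs no vector parallelism. Without OpenACC enabled, the directives are plain comments and the program runs on the host.

```fortran
! Sketch only: demo_mod and square_at are hypothetical names.
module demo_mod
    implicit none
contains
    subroutine square_at(b, a, i)
        ! IMPLICIT statement first ...
        implicit none
        ! ... then the directive, still inside the specification part.
        ! (Assumption: seq is enough here; the thread's example used vector.)
        !$acc routine seq
        real, intent(inout) :: b(*)
        real, intent(in)    :: a(*)
        integer, intent(in) :: i
        b(i) = a(i) * a(i)
    end subroutine square_at
end module demo_mod

program demo
    use demo_mod
    implicit none
    integer, parameter :: n = 16
    real :: a(n), b(n)
    integer :: i

    ! Same input data as the test case in this thread
    do i = 1, n
        a(i) = real(i) * 10.0
    end do
    b = -1.0

    ! Each iteration calls the device routine on one element
    !$acc parallel loop copyin(a) copy(b)
    do i = 1, n
        call square_at(b, a, i)
    end do

    ! First and last results: 10**2 and 160**2
    print *, b(1), b(n)
end program demo
```

With PGI 14.1 and 14.2, moving the directive above "implicit none" inside square_at is what produced the S-0070 error shown above; whether other compilers or later releases behave the same way is not something this thread establishes.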

I've pored over the OpenACC 2.0a API standard and I don't see anything about order of "implicit none", but there are a lot of "implicit" in the document. Is there something I'm violating?

Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6741
Location: The Portland Group Inc.

Posted: Tue Mar 11, 2014 10:07 am

Quote:
I've pored over the OpenACC 2.0a API standard and I don't see anything about order of "implicit none", but there are a lot of "implicit" in the document. Is there something I'm violating?
I hadn't encountered this before (routine is new for me too) but will ask our compiler folks about it. The OpenACC standard just states that the directive needs to be in the specification part, and since "implicit none" is itself part of the specification part, the standard seems to allow the directive before it.

What I don't know is if the authors of the standard didn't account for "Use" and "implicit" and meant to say before the definition part, or if PGI is being too strict.

In any event, we do need to have better documentation for "routine" as more folks begin to use it.

- Mat
KarlWilkinson85254



Joined: 17 Jan 2013
Posts: 9

Posted: Wed Mar 12, 2014 10:28 am

Hiya,

I think I may have just found a related issue; whether it will help TheMatt, I don't know!

The following only works if the comment markers (!XXX) are removed, that is, if simple lives in a module that complicated uses:

Code:
!XXXmodule gpu_subs
!XXXcontains
subroutine simple
!$ACC ROUTINE
  stuff
end subroutine simple
!XXXend module gpu_subs

subroutine complicated
!XXXuse gpu_subs
stuff
!$ACC LOOP INDEPENDENT
do i = small, big
    call simple
end do
stuff
end subroutine complicated


The use, or non-use, of "implicit none" and "use module xyz" within simple has no effect on the success of the compilation.

I can understand why this might be expected, but thought it may help someone.
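
Filled in as a complete file, the working (module) variant might look like the sketch below. The routine bodies are hypothetical stand-ins for "stuff", a combined "parallel loop" replaces the bare LOOP INDEPENDENT, and "routine seq" is an assumption. Presumably the module version works because the use statement gives the caller an explicit interface to simple, so the compiler knows at the call site that a device version exists.

```fortran
! Sketch only: the bodies below are hypothetical stand-ins for "stuff".
module gpu_subs
    implicit none
contains
    subroutine simple(x)
        ! implicit none is inherited from the module, so the directive
        ! can sit directly after the subroutine statement.
        !$acc routine seq
        real, intent(inout) :: x
        x = 2.0 * x
    end subroutine simple
end module gpu_subs

subroutine complicated(v, n)
    ! The use statement makes simple's interface visible to the caller
    use gpu_subs
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: v(n)
    integer :: i

    ! Each iteration calls the device routine on one element
    !$acc parallel loop copy(v)
    do i = 1, n
        call simple(v(i))
    end do
end subroutine complicated

program driver
    implicit none
    real :: v(4)
    v = 1.0
    call complicated(v, 4)
    ! Every element has been doubled by simple
    print *, v
end program driver
```

Compiled without OpenACC, the directives are ordinary comments and the program runs on the host, which makes it easy to check the logic before targeting the accelerator.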

Cheers,

Karl
All times are GMT - 7 Hours