PGI User Forum


Accelerator restriction: unsupported call to ...
dcwarren



Joined: 18 Jun 2012
Posts: 29

Posted: Sun Jan 27, 2013 12:39 am    Post subject: Accelerator restriction: unsupported call to ...

In the code that follows, I have a set of nested loops. The outer loop is the one I'd like to parallelize on the GPU with OpenACC. The inner loop should be run sequentially by each individual thread.

When I try to compile this code using
Code:
pgfortran -o inf_while_test.exe -acc -ta=nvidia -Minfo=accel -Minline=levels:10 inf_while_test.f90

I get the following output:
Quote:
PGF90-W-0155-Accelerator region ignored; see -Minfo messages (inf_while_test.f90: 63)
inf_while_test:
63, Accelerator region ignored
65, Accelerator restriction: function/procedure calls are not supported
71, Accelerator restriction: unsupported call to 'kiss'


The line number being flagged, 71, isn't even the first occurrence of a call to kiss in the program. What is going on here, and what am I doing incorrectly?

OpenACC will allow do while(true) loops; I have tested this. There's apparently something about this more complicated structure that's throwing the compiler for a loop, so to speak.

Code follows:
Code:
!--------------------------------------------------------------------
module marsaglia
implicit none
private
public :: kiss, kisset
   INTEGER :: x=123456789, y=362436069, z=521288629, w=916191069
contains
   subroutine kiss(rand_out)
      integer :: i
      real :: rand_out

! The  KISS (Keep It Simple Stupid) random number generator.
! http://www.fortran.com/kiss.f90 . Slightly modified.

   do while(.true.)
      x = 69069 * x + 1327217885
      y = m (m (m (y, 13), - 17), 5)
      z = 18000 * iand (z, 65535) + ishft (z, - 16)
      w = 30903 * iand (w, 65535) + ishft (w, - 16)
      i = x + y + ishft (z, 16) + w
     
      rand_out = i*2.33e-10 + 0.5
      if((rand_out .gt. 0.) .and. (rand_out .lt. 1.)) return
   enddo
   
   contains
      function m(k, n)
         integer :: m, k, n
         m = ieor (k, ishft (k, n) )
      end function m
   end subroutine
   
   function kisset (ix, iy, iz, iw)
      integer :: kisset, ix, iy, iz, iw
      x = ix
      y = iy
      z = iz
      w = iw
      kisset = 1
   end function kisset
end module marsaglia
!--------------------------------------------------------------------

!=====================
program inf_while_test

use marsaglia

implicit none

integer :: i, imax, iseed
integer, dimension(8) :: time_array
real :: temp, rand, start_time, end_time, ran4

iseed = -2255
i = kisset(iseed, 2*iseed, 3*iseed, 4*iseed)
imax = 20000

call date_and_time(values=time_array)
start_time = time_array(5)*3600 + time_array(6)*60 +   &!&
             time_array(7)      + 0.001*time_array(8)

!$acc kernels loop private(i)
outer_loop: do i = 1, imax
  inner_loop: do while(.true.)
    call kiss(rand)
    if(rand .gt. 0.99) then
      exit inner_loop
    endif
   
    call kiss(rand)
    if(rand .lt. 0.9) then
      temp = rand
    else
      call kiss(rand)
      temp = rand * 3.14159d0
    endif
  enddo inner_loop
enddo outer_loop
!$acc end kernels

call date_and_time(values=time_array)
end_time = time_array(5)*3600 + time_array(6)*60 +   &!&
           time_array(7)      + 0.001*time_array(8)

print *, "time = ",end_time - start_time

end program inf_while_test
!=========================
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jan 28, 2013 12:48 pm

Hi dcwarren,

For many reasons, but mostly due to the lack of a linker for device code, OpenACC does not allow for the calling of routines from within a compute region. While this is changing with the addition of the proposed OpenACC 2.0 standard's "routine" directive, (See: http://www.openacc.org/sites/default/files/Proposed%20Additions%20for%20OpenACC%202.pdf), until this has been implemented, all routines must be inlined, either implicitly by the compiler or explicitly by the user.

In this case, if you change your code so that "m" is not a contained function, "kiss" can be automatically inlined by the compiler when you add the "-Minline" flag. (Routines that have contained functions can't be automatically inlined.)

However, your code won't accelerate due to the loop dependency on the w, x, y, and z variables. You can work around this by making these variables public in your module and then using the "private" clause so that each thread has its own copy of the variables. The problem with this is that each thread will have the same initial seed and therefore generate the same random values. Hence, the better solution is to make w, x, y, and z arrays, with one element for each iteration of the "i" loop, and then randomly generate an array of seeds.
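
As a rough sketch of the array-of-seeds idea (not code from this thread): the program below keeps one x/y/z/w state per iteration in arrays and seeds them on the host with the random_number intrinsic. The array names sx, sy, sz, sw, the seeding scale factor, and the single draw per iteration are assumptions made for illustration. The rejection loop from kiss is left out, and like the original the arithmetic relies on 32-bit integers wrapping on overflow.
Code:
```fortran
! Sketch only: per-iteration KISS state held in arrays, so no iteration
! of the parallel loop shares generator state with another.
program kiss_state_arrays
   implicit none
   integer, parameter :: imax = 20000
   integer :: sx(imax), sy(imax), sz(imax), sw(imax)  ! hypothetical names
   real    :: u(imax), r(imax)
   integer :: i, k

   ! Assumed seeding scheme: draw seeds on the host, shifted by 1 so that
   ! no state word starts at zero.
   call random_number(u)
   sx = 1 + int(u * 1.0e9)
   call random_number(u)
   sy = 1 + int(u * 1.0e9)
   call random_number(u)
   sz = 1 + int(u * 1.0e9)
   call random_number(u)
   sw = 1 + int(u * 1.0e9)

   !$acc kernels
   !$acc loop private(k)
   do i = 1, imax
      ! One KISS step using only element i of each state array.
      sx(i) = 69069 * sx(i) + 1327217885
      sy(i) = ieor(sy(i), ishft(sy(i), 13))
      sy(i) = ieor(sy(i), ishft(sy(i), -17))
      sy(i) = ieor(sy(i), ishft(sy(i), 5))
      sz(i) = 18000 * iand(sz(i), 65535) + ishft(sz(i), -16)
      sw(i) = 30903 * iand(sw(i), 65535) + ishft(sw(i), -16)
      k = sx(i) + sy(i) + ishft(sz(i), 16) + sw(i)
      r(i) = k*2.33e-10 + 0.5
   end do
   !$acc end kernels

   print *, "first and last values: ", r(1), r(imax)
end program kiss_state_arrays
```

Because every iteration reads and writes only its own array elements, there is no loop-carried dependence on the generator state. Without an OpenACC compiler the directives are treated as comments and the loop simply runs on the host.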

Hope this helps,
Mat
dcwarren



Joined: 18 Jun 2012
Posts: 29

Posted: Mon Jan 28, 2013 3:03 pm

Thanks for the response. I knew about the necessity of inlining, but not about contained functions. Where could I have found that information?

As to generating the random seed, I changed kisset to a subroutine and called kisset at the top of each iteration of outer_loop as follows:
Code:
call kisset(iseed + i, 2*iseed + i, 3*iseed + i, 4*iseed + i)

That should give each iteration of outer_loop its own unique and repeatable seed for generating random numbers, without the need for an array of seeds.

When I tried to compile the code with the above changes, I got a bunch of info messages about live-out variables. The compiler did seem to generate code that would run, albeit more slowly than the CPU version.

Listing w, x, y, and z as private variables in the opening !$acc statement got me four messages about variables not being explicitly declared. All four of those are non-private variables in the module marsaglia, which is being used by the main program. Removing the references to public and private from the module made things work (and made the accelerated code 10x faster than the CPU code!), so I think that's an issue with my Fortran knowledge rather than an accelerator problem.

Now when I tried to compile I got 12 references to loop-carried scalar dependences, one for each of w/x/y/z for each of the three calls to kiss(). I don't like avoidable warning messages, so I tried telling the accelerator to run the do loop in kiss() sequentially by adding the line !$acc loop seq right above the do while(.true.) line in kiss(). This broke the code.

Below are the current state of the code and the error messages generated.
Code:
!--------------------------------------------------------------------
module marsaglia
implicit none
   INTEGER :: x=123456789, y=362436069, z=521288629, w=916191069
contains
   subroutine kiss(rand_out)
      integer :: i
      real :: rand_out

! The  KISS (Keep It Simple Stupid) random number generator.
! http://www.fortran.com/kiss.f90 . Slightly modified.

   !$acc loop seq
   do while(.true.)
      x = 69069 * x + 1327217885
      y = m (m (m (y, 13), - 17), 5)
      z = 18000 * iand (z, 65535) + ishft (z, - 16)
      w = 30903 * iand (w, 65535) + ishft (w, - 16)
      i = x + y + ishft (z, 16) + w
     
      rand_out = i*2.33e-10 + 0.5
      if((rand_out .gt. 0.) .and. (rand_out .lt. 1.)) return
   enddo
   
   end subroutine
   
   function m(k, n)
      integer :: m, k, n
      m = ieor (k, ishft (k, n) )
   end function m
   
   subroutine kisset (ix, iy, iz, iw)
      integer :: ix, iy, iz, iw
      x = ix
      y = iy
      z = iz
      w = iw
   end subroutine kisset
end module marsaglia
!--------------------------------------------------------------------

!=====================
program inf_while_test

use marsaglia

implicit none

integer :: i, imax, iseed
integer, dimension(8) :: time_array
real :: temp, rand, start_time, end_time

iseed = -2255
imax = 20000

call date_and_time(values=time_array)
start_time = time_array(5)*3600 + time_array(6)*60 +   &!&
             time_array(7)      + 0.001*time_array(8)

!$acc kernels loop private(i,w,x,y,z,rand,temp)
outer_loop: do i = 1, imax
  call kisset(iseed+i, 2*iseed+i, 3*iseed+i, 4*iseed+i)
  inner_loop: do while(.true.)
    call kiss(rand)
    if(rand .gt. 0.99) then
      exit inner_loop
    endif
   
    call kiss(rand)
    if(rand .lt. 0.9) then
      temp = rand
    else
      call kiss(rand)
      temp = rand * 3.14159d0
    endif
  enddo inner_loop
enddo outer_loop
!$acc end kernels

call date_and_time(values=time_array)
end_time = time_array(5)*3600 + time_array(6)*60 +   &!&
           time_array(7)      + 0.001*time_array(8)

print *, "time = ",end_time - start_time

end program inf_while_test
!=========================

Code:
PGF90-S-0155-DO loop expected after ??? (INF_WHILE_TEST.f90: 15)
PGF90-S-0104-Illegal control structure - unterminated ACC LOOP directive (INF_WHILE_TEST.f90: 13)
  0 inform,   0 warnings,   2 severes, 0 fatal for kiss


I have a few questions about the program at this point.
  • First, why is the compiler complaining about unterminated ACC LOOP directives? According to the quick reference manual the loop clause does not have an associated "end" statement. (In fact, on page 12 of the PGI OpenACC Getting Started Guide there's an !$acc kernels loop statement without an ending.)
  • Second, what changes would I need to make to either the code or (more likely) the OpenACC statements to let the compiler know that I know the do loop in kiss() should be run sequentially?
  • Third, what would I need to change to eliminate those warning statements about loop carried scalar dependences? In the actual production code there would be so many of these the actual compiler information I want to see would be drowned out.
  • Alternately, should I just give up with these warning messages and accept that they are good and unavoidable?

Thanks for any information you can provide.
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jan 28, 2013 4:58 pm

Quote:
First, why is the compiler complaining about unterminated ACC LOOP directives?
Because you have a "loop" directive that's not contained within a kernels region. It gets inlined into one, but it needs to stand alone since the routine may only be inlined in some cases. Plus, a "while" loop can't be accelerated since the number of iterations isn't known at the time the loop begins.

Quote:
According to the quick reference manual the loop clause does not have an associated "end" statement. (In fact, on page 12 of the PGI OpenACC Getting Started Guide there's an !$acc kernels loop statement without an ending.)
Correct, but it does need to be within a "kernels" or "parallel" region.
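
To illustrate the placement rule, here is a minimal sketch (not taken from the code in this thread) in which a "loop seq" directive sits lexically inside a kernels region and applies to a counted do loop rather than a "do while". The program name, array, and loop bounds are made up for the example.
Code:
```fortran
! Sketch only: a loop directive placed inside a kernels region.
program loop_seq_placement
   implicit none
   integer, parameter :: n = 1000, nsteps = 8   ! illustrative sizes
   real    :: a(n)
   integer :: i, j

   !$acc kernels
   !$acc loop independent
   do i = 1, n
      a(i) = 0.0
      ! The inner counted loop is explicitly marked sequential; each
      ! outer iteration runs it start to finish on its own.
      !$acc loop seq
      do j = 1, nsteps
         a(i) = a(i) + real(j)
      end do
   end do
   !$acc end kernels

   print *, a(1), a(n)
end program loop_seq_placement
```

The same directive placed in a subroutine outside any compute region has no enclosing region to attach to, which matches the situation described above.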

Quote:
Second, what changes would I need to make to either the code or (more likely) the OpenACC statements to let the compiler know that I know the do loop in kiss() should be run sequentially?
Since it's not parallel, the compiler has no choice but to run it sequentially. No loop schedule is needed and you can just remove the directive.

Quote:
Third, what would I need to change to eliminate those warning statements about loop carried scalar dependences? In the actual production code there would be so many of these the actual compiler information I want to see would be drowned out.
I've complained about the excessive informational messages as well but the analysis is done before the kernels are generated, so the compiler engineers needed to keep it this way. Though, from the output, it seems to be doing the correct thing:

Code:
% pgf90 -acc -Minline -Minfo=acc kiss.f90
inf_while_test:
     59, Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     60, Loop is parallelizable
         Accelerator kernel generated
         60, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
          63, Loop carried scalar dependence for 'x' at line 63
              Loop carried scalar dependence for 'y' at line 63
              Loop carried scalar dependence for 'z' at line 63
              Loop carried scalar dependence for 'w' at line 63
              Scalar last value needed after loop for 'rand' at line 64
              Inner sequential loop scheduled on accelerator
          68, Loop carried scalar dependence for 'x' at line 68
              Loop carried scalar dependence for 'y' at line 68
              Loop carried scalar dependence for 'z' at line 68
              Loop carried scalar dependence for 'w' at line 68
              Scalar last value needed after loop for 'rand' at line 69
              Inner sequential loop scheduled on accelerator
          72, Loop carried scalar dependence for 'x' at line 72
              Loop carried scalar dependence for 'y' at line 72
              Loop carried scalar dependence for 'z' at line 72
              Loop carried scalar dependence for 'w' at line 72
              Inner sequential loop scheduled on accelerator



Quote:
Alternately, should I just give up with these warning messages and accept that they are good and unavoidable?
I have.

- Mat
dcwarren



Joined: 18 Jun 2012
Posts: 29

Posted: Tue Jan 29, 2013 8:32 am

Making those changes (and giving up the fight against warning messages) makes that little test code run spectacularly. Thanks for the insights.

However, when I apply those lessons to the actual production code, I still get errors about unsupported calls to certain subroutines. Given what you mentioned before, I think the issue is inability to inline said subroutines. In the PGI Fortran Compiler Manual, there are a few reasons given why subprograms wouldn't be inlined:
Quote:
A Fortran subprogram is not inlined if any of the following applies:
  • It is referenced in a statement function.
  • A common block mismatch exists; in other words, the caller must contain all common blocks specified in the callee, and elements of the common blocks must agree in name, order, and type (except that the caller's common block can have additional members appended to the end of the common block).
  • An argument mismatch exists; in other words, the number and type (size) of actual and formal parameters must be equal.
  • A name clash exists, such as a call to subroutine xyz in the extracted subprogram and a variable named xyz in the caller

I notice that there's nothing mentioned in here about "contains" statements, and I don't believe any of these four restrictions applies to my code. Is there a larger list of these restrictions that has yet to be published, and would you share it with me if so?

Thanks in advance.
All times are GMT - 7 Hours
Page 1 of 2