dcwarren
Joined: 18 Jun 2012 Posts: 29
|
Posted: Sun Jan 27, 2013 12:39 am Post subject: Accelerator restriction: unsupported call to ... |
|
|
In the code that follows, I have a set of nested loops. The outer loop is the one I'd like to parallelize on the GPU with OpenACC. The inner loop should be run sequentially by each individual thread.
When I try to compile this code using
| Code: | | pgfortran -o inf_while_test.exe -acc -ta=nvidia -Minfo=accel -Minline=levels:10 inf_while_test.f90 |
I get the following output:
| Quote: | PGF90-W-0155-Accelerator region ignored; see -Minfo messages (inf_while_test.f90: 63)
inf_while_test:
63, Accelerator region ignored
65, Accelerator restriction: function/procedure calls are not supported
71, Accelerator restriction: unsupported call to 'kiss' |
The line number being flagged, 71, isn't even the first occurrence of a call to kiss in the program. What is going on here, and what am I doing incorrectly?
OpenACC will allow do while(true) loops; I have tested this. There's apparently something about this more complicated structure that's throwing the compiler for a loop, so to speak.
Code follows:
| Code: | !--------------------------------------------------------------------
module marsaglia
  implicit none
  private
  public :: kiss, kisset
  INTEGER :: x=123456789, y=362436069, z=521288629, w=916191069
contains
  subroutine kiss(rand_out)
    integer :: i
    real :: rand_out
    ! The KISS (Keep It Simple Stupid) random number generator.
    ! http://www.fortran.com/kiss.f90 . Slightly modified.
    do while(.true.)
      x = 69069 * x + 1327217885
      y = m (m (m (y, 13), - 17), 5)
      z = 18000 * iand (z, 65535) + ishft (z, - 16)
      w = 30903 * iand (w, 65535) + ishft (w, - 16)
      i = x + y + ishft (z, 16) + w
      rand_out = i*2.33e-10 + 0.5
      if((rand_out .gt. 0.) .and. (rand_out .lt. 1.)) return
    enddo
  contains
    function m(k, n)
      integer :: m, k, n
      m = ieor (k, ishft (k, n) )
    end function m
  end subroutine
  function kisset (ix, iy, iz, iw)
    integer :: kisset, ix, iy, iz, iw
    x = ix
    y = iy
    z = iz
    w = iw
    kisset = 1
  end function kisset
end module marsaglia
!--------------------------------------------------------------------
!=====================
program inf_while_test
  use marsaglia
  implicit none
  integer :: i, imax, iseed
  integer, dimension(8) :: time_array
  real :: temp, rand, start_time, end_time, ran4
  iseed = -2255
  i = kisset(iseed, 2*iseed, 3*iseed, 4*iseed)
  imax = 20000
  call date_and_time(values=time_array)
  start_time = time_array(5)*3600 + time_array(6)*60 + &!&
               time_array(7) + 0.001*time_array(8)
  !$acc kernels loop private(i)
  outer_loop: do i = 1, imax
    inner_loop: do while(.true.)
      call kiss(rand)
      if(rand .gt. 0.99) then
        exit inner_loop
      endif
      call kiss(rand)
      if(rand .lt. 0.9) then
        temp = rand
      else
        call kiss(rand)
        temp = rand * 3.14159d0
      endif
    enddo inner_loop
  enddo outer_loop
  !$acc end kernels
  call date_and_time(values=time_array)
  end_time = time_array(5)*3600 + time_array(6)*60 + &!&
             time_array(7) + 0.001*time_array(8)
  print *, "time = ",end_time - start_time
end program inf_while_test
!========================= |
|
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Jan 28, 2013 12:48 pm Post subject: |
|
|
Hi dcwarren,
For many reasons, but mostly because there is no linker for device code, OpenACC does not allow routines to be called from within a compute region. This is changing with the "routine" directive in the proposed OpenACC 2.0 standard (see: http://www.openacc.org/sites/default/files/Proposed%20Additions%20for%20OpenACC%202.pdf), but until that has been implemented, all routines must be inlined, either implicitly by the compiler or explicitly by the user.
In this case, if you change your code so that "m" is not a contained function, "kiss" can be automatically inlined by the compiler when you add the "-Minline" flag. (Routines that have contained functions can't be automatically inlined.)
However, your code won't accelerate because of the loop dependency on the w, x, y, and z variables. You can work around this by making these variables public in your module and then using the "private" clause so that each thread has its own copy of the variables. The problem with this is that each thread will have the same initial seed and will therefore generate the same random values. Hence, the better solution is to make w, x, y, and z arrays, with one element for each iteration of the "i" loop, and then randomly generate an array of seeds.
Hope this helps,
Mat |
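The array-of-seeds suggestion above could look roughly like the following. This is only a sketch under assumptions, not code from the thread: the names kiss_state, kiss_step, seed_array_sketch, sx/sy/sz/sw, u and last are hypothetical, the host-side seeding with random_number is just one possible way to "randomly generate an array of seeds", and the "routine seq" line assumes a compiler that implements the OpenACC 2.0 directive (otherwise the call would still have to be inlined, e.g. with -Minline). Without OpenACC enabled, the directives are plain comments and the program runs serially.

```fortran
module kiss_state
  implicit none
contains
  ! One KISS draw that works on state owned by the caller instead of on
  ! module variables, so iterations do not share x, y, z, w.
  ! Like the original kiss, this relies on 32-bit integer arithmetic wrapping
  ! on overflow, which compilers commonly do but the standard does not promise.
  subroutine kiss_step(x, y, z, w, rand_out)
    !$acc routine seq
    integer, intent(inout) :: x, y, z, w
    real, intent(out) :: rand_out
    integer :: i
    do
      x = 69069 * x + 1327217885
      ! Same three xor-shift steps as m(m(m(y, 13), -17), 5), written inline.
      y = ieor(y, ishft(y, 13))
      y = ieor(y, ishft(y, -17))
      y = ieor(y, ishft(y, 5))
      z = 18000 * iand(z, 65535) + ishft(z, -16)
      w = 30903 * iand(w, 65535) + ishft(w, -16)
      i = x + y + ishft(z, 16) + w
      rand_out = i*2.33e-10 + 0.5
      if (rand_out > 0. .and. rand_out < 1.) return
    end do
  end subroutine kiss_step
end module kiss_state

program seed_array_sketch
  use kiss_state
  implicit none
  integer, parameter :: imax = 20000
  integer :: sx(imax), sy(imax), sz(imax), sw(imax)
  real :: u(imax), last(imax), r
  integer :: i

  ! Host-side seeding (an assumption): one nonzero seed set per outer iteration.
  call random_number(u)
  sx = int(u * 1.0e9) + 1
  call random_number(u)
  sy = int(u * 1.0e9) + 1
  call random_number(u)
  sz = int(u * 1.0e9) + 1
  call random_number(u)
  sw = int(u * 1.0e9) + 1

  ! Iteration i touches only element i of each state array, so the outer
  ! loop carries no dependence; the inner loop stays sequential per thread.
  !$acc kernels loop independent private(r) copy(sx, sy, sz, sw) copyout(last)
  do i = 1, imax
    do
      call kiss_step(sx(i), sy(i), sz(i), sw(i), r)
      if (r > 0.99) exit
    end do
    last(i) = r
  end do

  ! Every entry of last exceeds 0.99, because that is the only exit condition.
  print *, "smallest final draw = ", minval(last)
end program seed_array_sketch
```

Passing the state as arguments also removes the need for a "private" clause on module variables, since nothing in the loop refers to shared scalars any more.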
|
dcwarren
Joined: 18 Jun 2012 Posts: 29
|
Posted: Mon Jan 28, 2013 3:03 pm Post subject: |
|
|
Thanks for the response. I knew about the necessity of inlining, but not about contained functions. Where could I have found that information?
As to generating the random seed, I changed kisset to a subroutine and called kisset at the top of each iteration of outer_loop as follows:
| Code: | | call kisset(iseed + i, 2*iseed + i, 3*iseed + i, 4*iseed + i) |
That should give each iteration of outer_loop its own unique and repeatable seed for generating random numbers, without the need for an array of seeds.
When I tried to compile the code with the above changes, I got a bunch of info messages about live-out variables. The compiler did seem to generate code that would run, albeit more slowly than the CPU version.
Listing w, x, y, and z as private variables in the opening !$acc statement got me four messages about variables not being explicitly declared. All four of those are non-private variables in the module marsaglia, which is being used by the main program. Removing the references to public and private from the module made things work (and made the accelerated code 10x faster than the CPU code!), so I think that's an issue with my Fortran knowledge rather than an accelerator problem.
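The "not explicitly declared" messages are consistent with ordinary Fortran accessibility rules: a bare "private" statement hides every module entity that is not listed as public, so the main program cannot see x, y, z, or w, and "implicit none" then flags them as undeclared. A minimal sketch of the alternative to deleting the access statements, assuming hypothetical names marsaglia_access and access_demo:

```fortran
module marsaglia_access
  implicit none
  private                        ! default: nothing is visible outside the module
  integer :: x = 123456789, y = 362436069, z = 521288629, w = 916191069
  public :: kisset               ! procedure the caller needs
  public :: x, y, z, w           ! state the caller wants to name, e.g. in private(...)
contains
  subroutine kisset(ix, iy, iz, iw)
    integer, intent(in) :: ix, iy, iz, iw
    x = ix
    y = iy
    z = iz
    w = iw
  end subroutine kisset
end module marsaglia_access

program access_demo
  use marsaglia_access
  implicit none
  call kisset(1, 2, 3, 4)
  ! x, y, z, w resolve here only because they are listed as public above;
  ! without that line, implicit none reports them as undeclared.
  print *, x, y, z, w
end program access_demo
```

Keeping "private" as the default and exporting only what the caller needs preserves the module's encapsulation while still letting the variables appear in an OpenACC clause in the main program.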
Now when I tried to compile I got 12 references to loop-carried scalar dependences, one for each of w/x/y/z for each of the three calls to kiss(). I don't like avoidable warning messages, so I tried telling the accelerator to run the do loop in kiss() sequentially by adding the line !$acc loop seq right above the do while(.true.) line in kiss(). This broke the code.
Below are the current state of the code and the error messages generated.
| Code: | !--------------------------------------------------------------------
module marsaglia
  implicit none
  INTEGER :: x=123456789, y=362436069, z=521288629, w=916191069
contains
  subroutine kiss(rand_out)
    integer :: i
    real :: rand_out
    ! The KISS (Keep It Simple Stupid) random number generator.
    ! http://www.fortran.com/kiss.f90 . Slightly modified.
    !$acc loop seq
    do while(.true.)
      x = 69069 * x + 1327217885
      y = m (m (m (y, 13), - 17), 5)
      z = 18000 * iand (z, 65535) + ishft (z, - 16)
      w = 30903 * iand (w, 65535) + ishft (w, - 16)
      i = x + y + ishft (z, 16) + w
      rand_out = i*2.33e-10 + 0.5
      if((rand_out .gt. 0.) .and. (rand_out .lt. 1.)) return
    enddo
  end subroutine
  function m(k, n)
    integer :: m, k, n
    m = ieor (k, ishft (k, n) )
  end function m
  subroutine kisset (ix, iy, iz, iw)
    integer :: ix, iy, iz, iw
    x = ix
    y = iy
    z = iz
    w = iw
  end subroutine kisset
end module marsaglia
!--------------------------------------------------------------------
!=====================
program inf_while_test
  use marsaglia
  implicit none
  integer :: i, imax, iseed
  integer, dimension(8) :: time_array
  real :: temp, rand, start_time, end_time
  iseed = -2255
  imax = 20000
  call date_and_time(values=time_array)
  start_time = time_array(5)*3600 + time_array(6)*60 + &!&
               time_array(7) + 0.001*time_array(8)
  !$acc kernels loop private(i,w,x,y,z,rand,temp)
  outer_loop: do i = 1, imax
    call kisset(iseed+i, 2*iseed+i, 3*iseed+i, 4*iseed+i)
    inner_loop: do while(.true.)
      call kiss(rand)
      if(rand .gt. 0.99) then
        exit inner_loop
      endif
      call kiss(rand)
      if(rand .lt. 0.9) then
        temp = rand
      else
        call kiss(rand)
        temp = rand * 3.14159d0
      endif
    enddo inner_loop
  enddo outer_loop
  !$acc end kernels
  call date_and_time(values=time_array)
  end_time = time_array(5)*3600 + time_array(6)*60 + &!&
             time_array(7) + 0.001*time_array(8)
  print *, "time = ",end_time - start_time
end program inf_while_test
!========================= |
| Code: | PGF90-S-0155-DO loop expected after ??? (INF_WHILE_TEST.f90: 15)
PGF90-S-0104-Illegal control structure - unterminated ACC LOOP directive (INF_WHILE_TEST.f90: 13)
0 inform, 0 warnings, 2 severes, 0 fatal for kiss
|
I have a few questions about the program at this point.
- First, why is the compiler complaining about unterminated ACC LOOP directives? According to the quick reference manual the loop clause does not have an associated "end" statement. (In fact, on page 12 of the PGI OpenACC Getting Started Guide there's an !$acc kernels loop statement without an ending.)
- Second, what changes would I need to make to either the code or (more likely) the OpenACC statements to let the compiler know that I know the do loop in kiss() should be run sequentially?
- Third, what would I need to change to eliminate those warning statements about loop carried scalar dependences? In the actual production code there would be so many of these the actual compiler information I want to see would be drowned out.
- Alternately, should I just give up with these warning messages and accept that they are good and unavoidable?
Thanks for any information you can provide. |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Jan 28, 2013 4:58 pm Post subject: |
|
|
| Quote: | | First, why is the compiler complaining about unterminated ACC LOOP directives? | Because you have a "loop" directive that's not contained within a kernels region. The routine gets inlined into one, but it needs to stand alone since the routine may only be inlined in some cases. Plus, a "while" loop can't be accelerated since the number of iterations isn't known at the time the loop begins.
| Quote: | | According to the quick reference manual the loop clause does not have an associated "end" statement. (In fact, on page 12 of the PGI OpenACC Getting Started Guide there's an !$acc kernels loop statement without an ending.) | Correct, but it does need to be within a "kernels" or "parallel" region.
| Quote: | | Second, what changes would I need to make to either the code or (more likely) the OpenACC statements to let the compiler know that I know the do loop in kiss() should be run sequentially? | Since it's not parallel, the compiler has no choice but to run it sequentially. No loop schedule is needed, so you can just remove the directive.
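For contrast, here is a minimal sketch of a placement where "loop seq" is accepted: the directive sits lexically inside a kernels region and directly precedes a counted do loop. The program name loop_seq_placement and the variables a, total, n and m are hypothetical, and the arithmetic is only a stand-in for real work. Without OpenACC enabled the directives are treated as comments and the program runs serially.

```fortran
program loop_seq_placement
  implicit none
  integer, parameter :: n = 1000, m = 16
  real :: a(n), total
  integer :: i, j

  !$acc kernels copyout(a)
  !$acc loop independent private(total)
  do i = 1, n
    total = 0.0
    ! Valid here: inside the kernels region, immediately before a counted DO.
    !$acc loop seq
    do j = 1, m
      total = total + real(j)
    end do
    a(i) = total
  end do
  !$acc end kernels

  ! Each element holds 1 + 2 + ... + 16 = 136.
  print *, a(1), a(n)
end program loop_seq_placement
```

The same directive placed in front of a "do while" in a standalone routine, as in kiss(), has neither the enclosing region nor the counted loop, which matches the two errors reported above.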
| Quote: | | Third, what would I need to change to eliminate those warning statements about loop carried scalar dependences? In the actual production code there would be so many of these the actual compiler information I want to see would be drowned out. | I've complained about the excessive informational messages as well, but the analysis is done before the kernels are generated, so the compiler engineers needed to keep it this way. Still, judging from the output, the compiler seems to be doing the correct thing:
| Code: | % pgf90 -acc -Minline -Minfo=acc kiss.f90
inf_while_test:
     59, Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     60, Loop is parallelizable
         Accelerator kernel generated
         60, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     63, Loop carried scalar dependence for 'x' at line 63
         Loop carried scalar dependence for 'y' at line 63
         Loop carried scalar dependence for 'z' at line 63
         Loop carried scalar dependence for 'w' at line 63
         Scalar last value needed after loop for 'rand' at line 64
         Inner sequential loop scheduled on accelerator
     68, Loop carried scalar dependence for 'x' at line 68
         Loop carried scalar dependence for 'y' at line 68
         Loop carried scalar dependence for 'z' at line 68
         Loop carried scalar dependence for 'w' at line 68
         Scalar last value needed after loop for 'rand' at line 69
         Inner sequential loop scheduled on accelerator
     72, Loop carried scalar dependence for 'x' at line 72
         Loop carried scalar dependence for 'y' at line 72
         Loop carried scalar dependence for 'z' at line 72
         Loop carried scalar dependence for 'w' at line 72
         Inner sequential loop scheduled on accelerator
|
| Quote: | | Alternately, should I just give up with these warning messages and accept that they are good and unavoidable? | I have.
- Mat |
|
dcwarren
Joined: 18 Jun 2012 Posts: 29
|
Posted: Tue Jan 29, 2013 8:32 am Post subject: |
|
|
Making those changes (and giving up the fight against warning messages) makes that little test code run spectacularly. Thanks for the insights.
However, when I apply those lessons to the actual production code, I still get errors about unsupported calls to certain subroutines. Given what you mentioned before, I think the issue is inability to inline said subroutines. In the PGI Fortran Compiler Manual, there are a few reasons given why subprograms wouldn't be inlined:
| Quote: | A Fortran subprogram is not inlined if any of the following applies:
- It is referenced in a statement function.
- A common block mismatch exists; in other words, the caller must contain all common blocks specified in the callee, and elements of the common blocks must agree in name, order, and type (except that the caller's common block can have additional members appended to the end of the common block).
- An argument mismatch exists; in other words, the number and type (size) of actual and formal parameters must be equal.
- A name clash exists, such as a call to subroutine xyz in the extracted subprogram and a variable named xyz in the caller
|
I notice that there's nothing mentioned in here about "contains" statements, and I don't believe any of these four restrictions applies to my code. Is there a larger list of these restrictions that has yet to be published, and would you share it with me if so?
Thanks in advance. |
|