PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

Free OpenACC Webinar

CUDA fortran device array parameter
Goto page 1, 2  Next
 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
SarahA



Joined: 29 Aug 2006
Posts: 16

PostPosted: Tue Mar 23, 2010 1:29 pm    Post subject: CUDA fortran device array parameter Reply with quote

Continuing the porting problem I have, I have isolated a frustrating bug to strange behavior in passing device arrays to kernel programs.

In my application, the copy of device to host array completes, but with gibberish in the host array. I spend a *long* time trying to find a problem in the kernel code.

Am I missing something, or is this a bug in PGI 10.3.0 ?

In this simple example, I get a failure like this:

pgfortran haha3.CUF
./a.out
about to do x=a
copyout Symbol Memcpy FAILED:4

If I delete the array parameter to foo_kernel ( instead using the module array 'a' ) the test works:

./a.out
about to do x=a
25.00000 100.0000 6250.000 25000.00
FORTRAN STOP

Code:

#define NCOMPLEX 500

module bar
  use cudafor
  complex, device, dimension(NCOMPLEX) :: a
contains

attributes(global) subroutine foo_kernel(thing)
  complex, device, dimension(NCOMPLEX) :: thing

  it = threadidx%x + (blockidx%x-1) * blockdim%x
  if ( it .le. NCOMPLEX ) thing(it) = 100.0 * thing(it)
  return

end subroutine foo_kernel

end module bar

! ----------------------------------------------------------------

subroutine foo(x)

  use cudafor   
  use bar
  type(dim3) :: dimGrid, dimBLock
  complex, dimension(NCOMPLEX) :: x
  integer nt,ng     

! .. 100 threads

  nt = min( 100, NCOMPLEX )
  ng = max( 1, (NCOMPLEX+nt-1)/nt )

  a= x
  call foo_kernel<<<ng,nt>>> (a)
  i = cudathreadsynchronize()
  write(0,*) "about to do x=a"
  x= a

end subroutine foo

! ----------------------------------------------------------------

program test
  integer, parameter :: NR=NCOMPLEX*2   
  save ha
  real ha(NR)

  do j = 1, NCOMPLEX
     ha(2*j-1) = 0.25*j
     ha(2*j  ) = j
  enddo

  call foo(ha)
  print *,ha(1),ha(2), ha(NCOMPLEX-1),ha(NCOMPLEX)
  stop
end program
Back to top
View user's profile
BeachHut



Joined: 14 Mar 2010
Posts: 11

PostPosted: Mon Mar 29, 2010 6:39 am    Post subject: Reply with quote

move complex, device, dimension(NCOMPLEX) :: a out of the module and into the host subroutine, like this:

Code:
#define NCOMPLEX 500

module bar
  use cudafor

contains

attributes(global) subroutine foo_kernel(thing)
  complex, device, dimension(NCOMPLEX) :: thing

  it = threadidx%x + (blockidx%x-1) * blockdim%x
  if ( it .le. NCOMPLEX ) thing(it) = 100.0 * thing(it)
  return

end subroutine foo_kernel

end module bar

! ----------------------------------------------------------------

subroutine foo(x)

  use cudafor   
  use bar
  type(dim3) :: dimGrid, dimBLock
  complex, dimension(NCOMPLEX) :: x
  complex, device, dimension(NCOMPLEX) :: a
  integer nt,ng     

! .. 100 threads

  nt = min( 100, NCOMPLEX )
  ng = max( 1, (NCOMPLEX+nt-1)/nt )

  a= x
  call foo_kernel<<<ng,nt>>> (a)
  i = cudathreadsynchronize()
  write(0,*) "about to do x=a"
  x= a

end subroutine foo

! ----------------------------------------------------------------

program test
  integer, parameter :: NR=NCOMPLEX*2   
  save ha
  real ha(NR)

  do j = 1, NCOMPLEX
     ha(2*j-1) = 0.25*j
     ha(2*j  ) = j
  enddo

  call foo(ha)
  print *,ha(1),ha(2), ha(NCOMPLEX-1),ha(NCOMPLEX)
  stop
end program


Alternatively you could probably stay with the same structure, but make a allocatable and allocate it in foo. I can't remember whether thing should have the 'device' attribute. Try removing it if you still have problems.
Back to top
View user's profile
SarahA



Joined: 29 Aug 2006
Posts: 16

PostPosted: Mon Mar 29, 2010 1:46 pm    Post subject: Reply with quote

BeachHut wrote:
move complex, device, dimension(NCOMPLEX) :: a out of the module and into the host subroutine, like this:

... OK, that may work but it defeats my purpose, which is to use module data as a work around for the lack of device common.

BeachHut wrote:
Alternatively you could probably stay with the same structure, but make a allocatable and allocate it in foo. I can't remember whether thing should have the 'device' attribute. Try removing it if you still have problems.

That's better. That works ( it should have the device attribute ). It is still a bug... as far as I can see there's nothing wrong with static module data, it just causes the run-time error.

Does a moderator here submit bug reports or should I do it through other channels?

Thanks!
Sarah
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

PostPosted: Mon Mar 29, 2010 5:04 pm    Post subject: Reply with quote

Hi Sarah,

Quote:
OK, that may work but it defeats my purpose, which is to use module data as a work around for the lack of device common.

Another work around is to have the device routine access "a" directly and not pass it in.

Code:
....
attributes(global) subroutine foo_kernel()

  it = threadidx%x + (blockidx%x-1) * blockdim%x
  if ( it .le. NCOMPLEX ) a(it) = 100.0 * a(it)
  return

end subroutine foo_kernel
...
call foo_kernel<<<ng,nt>>> ()
...


Quote:
Does a moderator here submit bug reports or should I do it through other channels?
I'm happy to submit them but if you prefer, you can send a report to PGI Customer Support (trs@pgroup.com). The only difference is that I post fix notification here where while Customer Support will notify you directly.

I went ahead and added this as TPR#16767. Several of our internal tests do similar operations, so it's unclear exactly why it fails here. Hopefully our engineers can get it fixed here shortly.

Thanks,
Mat
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

PostPosted: Thu Dec 02, 2010 6:35 pm    Post subject: Reply with quote

Hi Sarah,

Sorry this took so long but I just verified that TPR#16767 will be fixed the 11.0 release.
Code:

% pgf90 -V11.0 -fast test3.cuf -Mpreprocess
% a.out
 about to do x=a
    25.00000        100.0000        6250.000        25000.00   


Thanks,
Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group