PGI User Forum
problem with constant memory

 
okat



Joined: 04 Sep 2011
Posts: 3

Posted: Sun Sep 04, 2011 2:26 am    Post subject: problem with constant memory

Hello all,

I'm just getting started with CUDA Fortran and running some simple examples, and I have a problem with constant memory.

This is the code of the main program:

Code:

program main

use kernel
use cudafor
implicit real*8 (a-h,o-z)

type(dim3) N_blocks, N_threads
integer, parameter :: N=128

real*8, dimension(N, N, N) :: res
real*8, dimension(N, N, N), device :: resD

real*8, device :: x

theta=4
x=1.d0

N_blocks=dim3(N,N,1)
N_threads=dim3(N,1,1)

! launch an N x N grid of blocks with N threads per block
call kernel<<<N_blocks, N_threads>>>(x,resD, N)

res=resD

sum=0.d0
do i=1,N
 do j=1,N
  do k=1,N
   sum=sum+res(i,j,k)
  end do
 end do
end do

print*,'sum = ', sum


end program main



And this is my kernel:

Code:

module kernel

integer, constant :: i1=2, i2=2 
real*8, constant :: pi=acos(-1.d0)
real*8, dimension(2), constant :: theta

contains

attributes(global) subroutine kernel(x, resD, N)
use cudafor
implicit none
real*8, device :: x
real*8, dimension (N, N, N), device :: resD
integer, value :: N
integer i,j,k

k=threadIdx%x
i=blockIdx%x
j=blockIdx%y

! resD(i,j,k)=i1*x*theta(1)   ! this works
  resD(i,j,k)=i2*x*theta(1)   ! but this does not work!

end subroutine kernel
end module kernel



The problem is that this kernel works when I use the constant i1 but fails when I use the constant i2
(error: "0: copyout Memcpy (host=0x6ade60, dev=0xf200000000, size=16777216) FAILED: 4(unspecified launch failure)").
There is no difference between i1 and i2 at all except one: I declared i1 first and then i2. If I declare i2 first and then i1, the code works for i2 and fails for i1.
I guess there is some problem with access to constant memory on the device, but I don't understand what exactly the problem is. How should I use constant memory in order to avoid such errors?
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Sep 06, 2011 2:32 pm

Hi Okat,

Quote:
How should I use constant memory in order to avoid such errors?
The workaround is to set the values of constant variables from host code, and not use data initialization.

Hope this helps,
Mat
okat



Joined: 04 Sep 2011
Posts: 3

Posted: Wed Sep 07, 2011 12:20 am

mkcolg wrote:

The workaround is to set the values of constant variables from host code, and not use data initialization.


Unfortunately, it doesn't help. I have now added the definitions of the constant variables i1 and i2 to the main program:
Code:

i1=2
i2=2

and the kernel module still has just the declaration of these variables:
Code:

integer, constant :: i1, i2 

but the error is the same:
0: copyout Memcpy (host=0x6af1c0, dev=0xf200000000, size=16777216) FAILED: 4(unspecified launch failure)
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed Sep 07, 2011 8:18 am

Hi Okat,

My apologies. On further investigation, the problem appears to be due to the interaction of the constant variable and the scalar device argument, "x", and not constant data initialization. When I change "x" to be a host variable passed to the kernel by value, or move it to be a module device variable, then the code runs correctly. Note that passing "x" by value also improves the performance.

I have reported this problem as TPR#18162. The good news is that the problem appears to have been already found and fixed internally, though the fix has not yet been added to a release. Unfortunately, it might be too late to get it into the upcoming 11.9 release, but I'll see what we can do.
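
As a minimal sketch of the first workaround described above (this is not code from the original posts), the program could look roughly like the following. The names kernel_mod and scale_kernel are hypothetical, chosen here only to keep the module and the subroutine distinct. The explicit cudaDeviceSynchronize check is also an addition and assumes a toolkit recent enough to provide it. The constants are set from host code, "x" is an ordinary host scalar passed by value, and the arrays are made allocatable purely for convenience.

```fortran
! Sketch only: kernel_mod and scale_kernel are hypothetical names.
module kernel_mod
  use cudafor
  implicit none

  ! Constant-memory data, declared without initializers and set from host code.
  integer, constant :: i1, i2
  real(8), constant :: theta(2)

contains

  attributes(global) subroutine scale_kernel(x, resD, N)
    integer, value :: N                 ! array extent, passed by value
    real(8), value :: x                 ! host scalar passed by value (no device attribute)
    real(8), device :: resD(N, N, N)
    integer :: i, j, k

    ! Thread and block indices are 1-based in CUDA Fortran.
    k = threadIdx%x
    i = blockIdx%x
    j = blockIdx%y

    resD(i, j, k) = i2 * x * theta(1)
  end subroutine scale_kernel

end module kernel_mod

program main
  use kernel_mod
  use cudafor
  implicit none

  integer, parameter :: N = 128
  type(dim3) :: N_blocks, N_threads
  real(8), allocatable :: res(:,:,:)
  real(8), device, allocatable :: resD(:,:,:)
  real(8) :: x                          ! plain host variable
  integer :: istat

  allocate(res(N, N, N), resD(N, N, N))

  ! Host-side assignments copy these values into constant memory.
  i1 = 2
  i2 = 2
  theta = 4.d0
  x = 1.d0

  N_blocks  = dim3(N, N, 1)
  N_threads = dim3(N, 1, 1)

  call scale_kernel<<<N_blocks, N_threads>>>(x, resD, N)

  ! Kernel launches are asynchronous; synchronizing here reports a launch
  ! failure at the kernel rather than at the later device-to-host copy.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel failed: ', cudaGetErrorString(istat)

  res = resD
  print *, 'sum = ', sum(res)

  deallocate(res, resD)
end program main
```

If this runs correctly, every element equals i2*x*theta(1) = 8, so the printed sum should be 8 * 128**3 = 16777216. The second workaround would instead declare "x" as a device variable in the module and drop it from the kernel's argument list.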

Thanks,
Mat
okat



Joined: 04 Sep 2011
Posts: 3

Posted: Wed Sep 07, 2011 12:32 pm

mkcolg wrote:

My apologies. On further investigation, the problem appears to be due to the interaction of the constant variable and the scalar device argument, "x", and not constant data initialization. When I change "x" to be a host variable passed to the kernel by value, or move it to be a module device variable, then the code runs correctly. Note that passing "x" by value also improves the performance.

I have reported this problem as TPR#18162. The good news is that the problem appears to have been already found and fixed internally, though the fix has not yet been added to a release. Unfortunately, it might be too late to get it into the upcoming 11.9 release, but I'll see what we can do.



Mat, thank you very much, now it works.
All times are GMT - 7 Hours