msgross42
Joined: 19 Feb 2013 Posts: 6
Posted: Wed Mar 06, 2013 11:34 am Post subject: Derived Type Array Problem
I am trying to allocate an array of a derived type before passing it to my GPU kernel. I have type A defined as a host variable and type B as a device variable of dimension 2. If I use B(:)=A, my code compiles without problems, but when I run it, it crashes before executing the first line of code: a window pops up saying the code has stopped working. If I run the code with type B as just a "scalar" type, it works fine. I have also tried first making a type C that is also dimension 2, allocating it with C(:)=A, and then setting B=C. This gives me the same result. Is there a way to allocate an array of type B so that all elements are the same as type A? If not, is there a way to pass type B as a local variable into each thread, so each thread can execute using the type without interfering with the other threads?
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Mar 06, 2013 12:14 pm
Hi msgross42,
I'll need a reproducing example because "B(:)=A" should work. If it's too big to post, please send it to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.
Thanks,
Mat
msgross42
Joined: 19 Feb 2013 Posts: 6
Posted: Wed Mar 06, 2013 12:47 pm
Here is a test code I made that exhibits the same problem.
Module Gpumodule
  use cudafor
  implicit none
  real*8,constant :: dt
  real*8,constant :: tfinal
  real*8,constant :: a
  real*8,constant :: b
  type InitialConditions
    real*8 :: x1(10) = 0.0
    real*8 :: x2(10) = 0.0
    real*8 :: x3(10) = 0.0
  end type InitialConditions
  type Body
    real*8 :: State(3) = 0.0
    real*8 :: Statedot(3) = 0.0
    type(InitialConditions) IC
  end type Body
contains
  attributes(global) subroutine simulation(BD,statefinal)
    integer i,j,k,e,indx,npts
    real*8 sum,nominaltime,nominalstate(3),rkalfa(4),krkbody(3,4),statefinal(10,*),time
    type(Body)::BD(:)
    indx=(blockidx%x-1) * blockdim%x + threadidx%x
    ! Define Constants
    rkalfa(1) = 1.0; rkalfa(2) = 2.0; rkalfa(3) = 2.0; rkalfa(4) = 1.0
    ! Initial State Vector
    BD(indx)%State(1)=BD(indx)%IC%x1(indx)
    BD(indx)%State(2)=BD(indx)%IC%x2(indx)
    BD(indx)%State(3)=BD(indx)%IC%x3(indx)
    time=0
    ! Integrate Equations of Motion
    npts = nint(tfinal/dt)
    do i=1,npts
      ! Store Nominal State Values
      nominaltime = time
      nominalstate = BD(indx)%State
      ! Numerical Integration of Equations of Motion
      do j=1,4
        ! State Values to Evaluate Derivatives
        if (j .ne. 1) then
          time = nominaltime + dt/rkalfa(j)
          do k=1,3
            BD(indx)%State(k) = nominalstate(k) + krkbody(k,j-1)/rkalfa(j)
          end do
        end if
        ! Compute Derivatives
        call deriv(BD(indx))
        do k=1,3
          krkbody(k,j) = dt*BD(indx)%Statedot(k)
        end do
      end do
      ! Step Time
      time = nominaltime + dt
      ! Step States
      do j=1,3
        sum = 0.0
        do k=1,4
          sum = sum + rkalfa(k)*krkbody(j,k)
        end do
        BD(indx)%State(j) = nominalstate(j) + sum/6.0
      end do
    end do
    statefinal(indx,1)=BD(indx)%State(1)
    statefinal(indx,2)=BD(indx)%State(2)
    statefinal(indx,3)=BD(indx)%State(3)
  end subroutine simulation
  attributes(device) subroutine deriv(BD)
    type(Body) BD
    BD%Statedot(1)=BD%State(2)
    BD%Statedot(2)=BD%State(3)
    BD%Statedot(3)=BD%State(1)*a+BD%State(2)*b
    return
  end subroutine deriv
end Module GPUmodule
Program Main
  use cudafor
  use GPUmodule
  implicit none
  type(Body)::BDH
  type(Body),device::BDD(10)
  real*8 statefinalh(10,3)
  real*8,device :: statefinald(10,3)
  integer i,istat
  write(*,*) 'start'
  a=2
  b=5
  dt=0.01
  tfinal=5
  do i=1,10
    BDH%IC%x1(i)=i
    BDH%IC%x2(i)=i
    BDH%IC%x3(i)=i
  end do
  BDD(:)=BDH
  call simulation<<<1,10>>>(BDD,statefinald)
  istat=cudaDeviceSynchronize()
  statefinalh=statefinald
  do i=1,10
    write(*,*) statefinalh(i,:)
  end do
end Program Main
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Mar 06, 2013 1:30 pm
OK, I thought you meant "B" was just a two-element array and "A" was a scalar.
A UDT-to-UDT assignment requires a deep copy. The error occurs because BDD's address needs to be dereferenced, and since it's a device pointer, a segv occurs.
Though in looking at your code, I'm wondering if these UDTs are necessary. It seems you just need to pass in the initial conditions and then use local scalars to hold each thread's state. Using local scalars will help performance, since they can be held in the register file rather than being fetched from global memory.
- Mat
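For readers following along, here is a minimal sketch of what the suggestion above might look like in CUDA Fortran. It is not Mat's code and is untested here. It assumes the initial conditions can be passed as a plain device array, so no derived type crosses the host/device boundary. The module name rk4_local_sketch, the arguments x0 and n, and the local names state, nominal, krk, and s are all made up for illustration; the integrator follows the RK4 loop from the test code above.

```fortran
! Sketch only: module, argument, and variable names are illustrative.
module rk4_local_sketch
  use cudafor
  implicit none
  real(8), constant :: dt, tfinal, a, b
contains
  ! Derivatives computed from plain per-thread arrays, no derived type needed
  attributes(device) subroutine deriv(state, statedot)
    real(8) :: state(3), statedot(3)
    statedot(1) = state(2)
    statedot(2) = state(3)
    statedot(3) = state(1)*a + state(2)*b
  end subroutine deriv

  attributes(global) subroutine simulation(x0, statefinal, n)
    integer, value :: n
    real(8) :: x0(n,3)          ! initial conditions, one row per thread
    real(8) :: statefinal(n,3)  ! final states, one row per thread
    real(8) :: state(3), statedot(3), nominal(3), krk(3,4), rkalfa(4), s
    integer :: indx, i, j, k, npts
    indx = (blockidx%x-1)*blockdim%x + threadidx%x
    if (indx > n) return        ! guard against extra threads
    rkalfa(1) = 1.0d0; rkalfa(2) = 2.0d0; rkalfa(3) = 2.0d0; rkalfa(4) = 1.0d0
    do k = 1, 3
      state(k) = x0(indx,k)     ! read initial conditions from global memory once
    end do
    npts = nint(tfinal/dt)
    do i = 1, npts
      do k = 1, 3
        nominal(k) = state(k)
      end do
      do j = 1, 4               ! classical RK4 stages
        if (j /= 1) then
          do k = 1, 3
            state(k) = nominal(k) + krk(k,j-1)/rkalfa(j)
          end do
        end if
        call deriv(state, statedot)
        do k = 1, 3
          krk(k,j) = dt*statedot(k)
        end do
      end do
      do k = 1, 3
        s = 0.0d0
        do j = 1, 4
          s = s + rkalfa(j)*krk(k,j)
        end do
        state(k) = nominal(k) + s/6.0d0
      end do
    end do
    do k = 1, 3
      statefinal(indx,k) = state(k)   ! write the result back to global memory once
    end do
  end subroutine simulation
end module rk4_local_sketch

program main
  use cudafor
  use rk4_local_sketch
  implicit none
  integer, parameter :: n = 10
  real(8) :: x0h(n,3), statefinalh(n,3)
  real(8), device :: x0d(n,3), statefinald(n,3)
  integer :: i, istat
  a = 2.0d0; b = 5.0d0; dt = 0.01d0; tfinal = 5.0d0
  do i = 1, n
    x0h(i,:) = i                ! same initial values as the test code above
  end do
  x0d = x0h                     ! plain array copy, host to device
  call simulation<<<1,n>>>(x0d, statefinald, n)
  istat = cudaDeviceSynchronize()
  statefinalh = statefinald     ! plain array copy, device to host
  do i = 1, n
    write(*,*) statefinalh(i,:)
  end do
end program main
```

Each thread touches global memory only when loading its initial state and storing its final state. Whether the small local arrays actually end up in registers is up to the compiler.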
msgross42
Joined: 19 Feb 2013 Posts: 6
Posted: Wed Mar 06, 2013 1:38 pm
The reason it is set up like this is to mimic how my actual code is set up. I am trying to adapt an existing code for use on a GPU, and I want to limit the number of significant modifications. Is there a way to define another derived type within the GPU kernel that exists only in local memory, as opposed to continually referencing the type that is passed into global memory in the function call?
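The thread ends without a reply to this question. As a hedged illustration only: a variable declared inside a kernel without a device, shared, or constant attribute is private to each thread, and that can include a variable of derived type. The untested sketch below assumes a trimmed, hypothetical type ThreadBody (State and Statedot only) defined in the module, keeps a deriv interface close to the original, and uses a single explicit Euler step as a stand-in for the full RK4 loop. The names local_type_sketch, ThreadBody, step, x, and n are not from the original code.

```fortran
! Sketch only: ThreadBody, step, x, and n are hypothetical names.
module local_type_sketch
  use cudafor
  implicit none
  real(8), constant :: dt, a, b
  type ThreadBody
    real(8) :: State(3)
    real(8) :: Statedot(3)
  end type ThreadBody
contains
  ! Same shape as the original deriv, but operating on the trimmed type
  attributes(device) subroutine deriv(BD)
    type(ThreadBody) :: BD
    BD%Statedot(1) = BD%State(2)
    BD%Statedot(2) = BD%State(3)
    BD%Statedot(3) = BD%State(1)*a + BD%State(2)*b
  end subroutine deriv

  attributes(global) subroutine step(x, n)
    integer, value :: n
    real(8) :: x(n,3)         ! states in global memory, one row per thread
    type(ThreadBody) :: B     ! no device/shared attribute: private to each thread
    integer :: indx, k
    indx = (blockidx%x-1)*blockdim%x + threadidx%x
    if (indx > n) return      ! guard against extra threads
    do k = 1, 3
      B%State(k) = x(indx,k)  ! load this thread's state into its local copy
    end do
    call deriv(B)             ! works only on the thread-local variable
    do k = 1, 3
      ! One explicit Euler step, used here for brevity in place of RK4
      x(indx,k) = B%State(k) + dt*B%Statedot(k)
    end do
  end subroutine step
end module local_type_sketch

program main
  use cudafor
  use local_type_sketch
  implicit none
  integer, parameter :: n = 10
  real(8) :: xh(n,3)
  real(8), device :: xd(n,3)
  integer :: i, istat
  a = 2.0d0; b = 5.0d0; dt = 0.01d0
  do i = 1, n
    xh(i,:) = i
  end do
  xd = xh                     ! plain array copy, host to device
  call step<<<1,n>>>(xd, n)
  istat = cudaDeviceSynchronize()
  xh = xd                     ! plain array copy, device to host
  do i = 1, n
    write(*,*) xh(i,:)
  end do
end program main
```

Only plain arrays are copied between host and device here, so the derived-type assignment that triggered the crash is avoided, while the routines that take the type as an argument keep roughly their original form.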