PGI User Forum


Automatic arrays in device memory
Forum: Accelerator Programming
DAVID-SPH



Joined: 23 May 2011
Posts: 28

Posted: Wed Jun 06, 2012 4:30 am    Post subject: Automatic arrays in device memory

I have a question: where does CUDA Fortran store automatic (static) device arrays?
I seem to be running into a stack overflow in a program with a kernel that is launched many times. Inside that kernel I'm using an auxiliary array of 512 integers. The array is declared in host code like this:
integer, device :: aux(512)
I know this is the kernel that overruns the stack because I have an alternative (slower) implementation that uses a dynamic (allocatable) auxiliary array, and that one works just fine.

When I try to use a module variable for this array instead (placing the declaration above the "contains" keyword, outside the host subroutine), the program simply can't use it and produces a runtime error the first time it calls the subroutine.
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Wed Jun 06, 2012 3:40 pm

Hi David,

We'll need a bit more information here or an example. Note that there isn't a stack on the device so it can't be a stack overflow.

If I had to guess, I'd check whether you are accessing "aux" out of bounds. How is it being accessed in your kernels? Is it being passed as an argument to the kernel? What's your kernel's launch configuration?

- Mat
DAVID-SPH




Posted: Wed Jun 06, 2012 5:17 pm

Yes, I know there is no stack on the device; that is why the bug is strange.
I have an exclusive prefix-sum scan that consists of three kernels; aux is a vector that stores the last prefix value computed by each block.
The launch configuration is always fixed: 512 blocks of 128 threads for kernel 1, 1 block of 512 threads for kernel 2, and again 512 blocks of 128 threads for the third kernel.
The prefix-sum scan is a host subroutine that calls the 3 kernels. aux is declared in the host subroutine as
integer, device :: aux(512)
and then passed as an argument to the three kernels.
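The three-kernel scheme described above can be checked on the host before looking at the GPU code. Below is a minimal sequential sketch in standard Fortran (no CUDA), where each outer loop stands in for one kernel launch. The module name `scan_reference`, the subroutine `exclusive_scan_3phase`, and its `blocksize` argument are hypothetical, not taken from David's code. With 512 blocks in kernel 1, `aux` needs exactly one entry per block, which is why a 512-element array scanned by a single 512-thread block in kernel 2 fits.

```fortran
module scan_reference
  implicit none
contains
  ! Hypothetical CPU reference of a three-phase exclusive scan.
  ! Each "do b" loop plays the role of one kernel launch.
  subroutine exclusive_scan_3phase(vecin, vecout, blocksize)
    integer, intent(in)  :: vecin(:)
    integer, intent(out) :: vecout(:)   ! must have at least size(vecin) elements
    integer, intent(in)  :: blocksize   ! elements handled per block (assumption)
    integer, allocatable :: aux(:)      ! one block total per block
    integer :: n, nblocks, b, i, lo, hi, running, tmp

    n = size(vecin)
    nblocks = (n + blocksize - 1) / blocksize
    allocate(aux(nblocks))

    ! Phase 1 ("kernel 1"): exclusive scan inside each block,
    ! recording each block's total in aux(b).
    do b = 1, nblocks
      lo = (b - 1) * blocksize + 1
      hi = min(b * blocksize, n)
      running = 0
      do i = lo, hi
        vecout(i) = running
        running = running + vecin(i)
      end do
      aux(b) = running
    end do

    ! Phase 2 ("kernel 2", a single block): exclusive scan of the block totals.
    running = 0
    do b = 1, nblocks
      tmp = aux(b)
      aux(b) = running
      running = running + tmp
    end do

    ! Phase 3 ("kernel 3"): add each block's offset to all of its elements.
    do b = 1, nblocks
      lo = (b - 1) * blocksize + 1
      hi = min(b * blocksize, n)
      vecout(lo:hi) = vecout(lo:hi) + aux(b)
    end do

    deallocate(aux)
  end subroutine exclusive_scan_3phase
end module scan_reference
```

Running the same input through the GPU version and comparing against this reference is one way to look for the out-of-bounds indexing Mat suspects.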
I have made a "sandbox" program to test the subroutine; it runs the subroutine 10,000 times flawlessly, but when I call the scan from the SPH program it fails.
It says:
0 allocate 2048 bytes requested; status = 30(unknown error)
Those 2048 bytes appear to be the 512 integers * 4 bytes/integer.
In theory the memory should be freed automatically when the host scan subroutine exits, but it isn't.
My guess is that the sandbox works because it somehow "reuses" the same memory area, since all the calls are consecutive, but in the SPH program that is not the case, so it starts "eating" memory (it breaks down around the 1100th call to the scan).
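Mat's out-of-bounds suspicion is consistent with this symptom: CUDA reports many kernel faults asynchronously, so an error raised by an earlier launch can surface at a later, unrelated call such as this allocate. A minimal CUDA Fortran sketch of checking after each launch, assuming a hypothetical module `cuda_check` with a helper `check_launch`, and a toolkit that provides `cudaDeviceSynchronize`:

```fortran
module cuda_check
  use cudafor
  implicit none
contains
  ! Hypothetical debugging helper: call it right after a kernel launch.
  subroutine check_launch(label)
    character(len=*), intent(in) :: label
    integer :: istat

    ! Errors in the launch itself (e.g. a bad configuration) are reported here.
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
      print *, label, ' launch error: ', trim(cudaGetErrorString(istat))
      stop
    end if

    ! Faults during execution (e.g. out-of-bounds writes) only show up
    ! once the host waits for the kernel to finish.
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, label, ' execution error: ', trim(cudaGetErrorString(istat))
      stop
    end if
  end subroutine check_launch
end module cuda_check
```

Calling `check_launch('kernel 1')` and so on after each of the three launches would show which kernel, if any, faults first. The synchronize makes every launch blocking, so this is only meant for debugging.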
DAVID-SPH




Posted: Thu Jun 07, 2012 8:37 am

These are the host subroutine's variable declarations:
Code:
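subroutine exclusive_int_scan(vecin, vecout, size)
integer, value :: size                     ! number of elements to scan
integer, device, dimension(:) :: vecin(size), vecout(size)   ! input and output vectors in device memory
integer  :: i, threadchunk, blockchunk
integer, device :: aux(512)                ! automatic device array: one entry per block, no allocate statement
type(dim3)    :: dimGrid, dimBlock         ! launch configuration for the kernels
integer, dimension(:) :: debug_blockval(512)
integer :: errcode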


As you can see, aux() doesn't need an allocate statement.
If it solved all the problems, I would just make aux allocatable and call allocate(aux(512)); the thing is that, as I've already tried, I then get a deallocate problem after a few thousand calls to this routine.
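Since it is the repeated per-call allocation and release of `aux` that appears to fail after many calls, one common pattern (not suggested in this thread, and not verified against David's program) is to allocate the scratch array once and reuse it. David reports a runtime error with a fixed-size module variable, so this allocatable module variant is equally unverified here. A minimal CUDA Fortran sketch, assuming a hypothetical module `scan_workspace` with helpers `ensure_aux` and `release_aux`:

```fortran
module scan_workspace
  use cudafor
  implicit none
  ! Scratch array for the per-block values, kept alive between calls.
  integer, device, allocatable :: aux(:)
contains
  ! Allocate aux only if it is missing or too small; otherwise reuse it.
  subroutine ensure_aux(nblocks)
    integer, intent(in) :: nblocks
    integer :: istat
    if (allocated(aux)) then
      if (size(aux) >= nblocks) return
      deallocate(aux, stat=istat)
      if (istat /= 0) stop 'deallocate of aux failed'
    end if
    allocate(aux(nblocks), stat=istat)
    if (istat /= 0) stop 'allocate of aux failed'
  end subroutine ensure_aux

  ! Free the scratch array once, e.g. at program end.
  subroutine release_aux()
    integer :: istat
    if (allocated(aux)) then
      deallocate(aux, stat=istat)
      if (istat /= 0) stop 'deallocate of aux failed'
    end if
  end subroutine release_aux
end module scan_workspace
```

In this arrangement `exclusive_int_scan` would call `ensure_aux(512)` instead of declaring `aux` locally and pass `aux` to the three kernels as before, so the device allocation happens once per run rather than once per scan.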
mkcolg




Posted: Thu Jun 07, 2012 3:55 pm

Hi David,

We'll need to have an example to tell what's going on. Can we use the same code you sent to Brent or is this new?

- Mat
All times are GMT - 7 Hours
Page 1 of 2