PGI User Forum
Dynamically freeing memory
SPHriction-3D
Joined: 12 Jan 2014
Posts: 48

Posted: Wed Mar 12, 2014 7:36 pm    Post subject: Dynamically freeing memory

Hi,

Hopefully someone can help me with this. I am trying to allocate and deallocate arrays in my code. In some cases it works fine, but not always. Here is an excerpt:

Code:
Call timing_start(time_bucket_start)
! Grid bounds and number of cells per dimension (nCell_d)
Call Bounds(x_d,Bound_d,nTotal,nCell_d,hsml_d)
! Cell index of each particle
Call BuildCellIndex<<<grid,tBlock>>>(x_d,hsml_d,nTotal,CellIndex_d,Bound_d,nCell_d)
! Total number of cells in the grid
maxIndex = nCell_d(1)*nCell_d(2)*nCell_d(3)
! Per-step device arrays sized to the current grid
Allocate(CellList_d(maxIndex,NumCLRows),NodesInCell_d(maxIndex))
NodesInCell_d = 0
! Bin the particles into the cells
Call BuildCellList<<<grid,tBlock>>>(nTotal,CellIndex_d,CellList_d,maxIndex,NodesInCell_d)
Call GPUtimeStart(startEvent1)
! Neighbor search using the cell list
Call BucketSearch<<<grid,tBlock>>>&
   (x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)
Call GPUtime(startEvent1,stopEvent1,t_bucket)
! Free the per-step arrays
If (Allocated(NodesInCell_d)) Deallocate(NodesInCell_d)
If (Allocated(CellList_d)) Deallocate(CellList_d)
Call timing_end(time_bucket_start,time_bucket)


The general idea is to lay a cell grid out and bin particles into the cells. I want to be able to accommodate growing and shrinking domains by dynamically allocating and deallocating.
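The binning idea can be sketched on the host in plain Fortran. This is a minimal sketch, not Kirk's BuildCellIndex/BuildCellList kernels (their source isn't shown): the module and routine names, the (3, n) position layout, a uniform cell edge length cellSize, and the column-major cell numbering are all assumptions. The numbering is chosen so that the largest index equals nCell(1)*nCell(2)*nCell(3), matching how maxIndex is computed in the excerpt.

```fortran
module cell_binning_sketch
  implicit none
contains
  ! Hypothetical host-side version of the binning step: assigns each particle
  ! to a cell of a uniform grid and counts how many particles land in each cell.
  subroutine bin_particles(x, n, xmin, cellSize, nCell, cellIndex, nodesInCell)
    integer, intent(in)  :: n              ! number of particles
    real,    intent(in)  :: x(3, n)        ! particle positions (3 x n layout assumed)
    real,    intent(in)  :: xmin(3)        ! lower corner of the grid
    real,    intent(in)  :: cellSize       ! cell edge length (assumed uniform)
    integer, intent(in)  :: nCell(3)       ! number of cells per dimension
    integer, intent(out) :: cellIndex(n)   ! linear cell index of each particle
    integer, intent(out) :: nodesInCell(nCell(1)*nCell(2)*nCell(3))
    integer :: p, d, ijk(3)

    nodesInCell = 0
    do p = 1, n
      do d = 1, 3
        ! 1-based cell coordinate, clamped so every particle maps to a valid cell
        ijk(d) = min(max(int((x(d, p) - xmin(d)) / cellSize) + 1, 1), nCell(d))
      end do
      ! Column-major linearization: the largest value is nCell(1)*nCell(2)*nCell(3)
      cellIndex(p) = ijk(1) + (ijk(2) - 1) * nCell(1) + (ijk(3) - 1) * nCell(1) * nCell(2)
      nodesInCell(cellIndex(p)) = nodesInCell(cellIndex(p)) + 1
    end do
  end subroutine bin_particles
end module cell_binning_sketch
```

On the GPU each particle would typically be handled by one thread, and the increment of nodesInCell would then need to be atomic (for example with CUDA Fortran's atomicadd), since several particles can fall into the same cell. The clamp matters there too: a particle outside the computed bounds would otherwise yield an index outside 1..maxIndex, and an out-of-bounds write to a device array can show up as an "unspecified launch failure".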

I have been running the code successfully on a few simple test problems, but I just found a case that causes the program to exit with this error message:

Quote:
0: Deallocate: Unspecified launch failure


I am using PVF 13.10 (PGI Visual Fortran).

Thanks for any help,

Kirk
SPHriction-3D
Joined: 12 Jan 2014
Posts: 48

Posted: Thu Mar 13, 2014 5:49 am

I should mention that this block of code is not standalone: it sits inside the program's time-stepping loop.

The error shows up after about 2800 time steps.

Code:
Do

   Call timing_start(time_bucket_start)
   Call Bounds(x_d,Bound_d,nTotal,nCell_d,hsml_d)
   Call BuildCellIndex<<<grid,tBlock>>>(x_d,hsml_d,nTotal,CellIndex_d,Bound_d,nCell_d)
   maxIndex = nCell_d(1)*nCell_d(2)*nCell_d(3)
   Allocate(CellList_d(maxIndex,NumCLRows),NodesInCell_d(maxIndex))
   NodesInCell_d = 0
   Call BuildCellList<<<grid,tBlock>>>(nTotal,CellIndex_d,CellList_d,maxIndex,NodesInCell_d)
   Call GPUtimeStart(startEvent1)
   Call BucketSearch<<<grid,tBlock>>>&
      (x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)
   Call GPUtime(startEvent1,stopEvent1,t_bucket)
   If (Allocated(NodesInCell_d)) Deallocate(NodesInCell_d)
   If (Allocated(CellList_d)) Deallocate(CellList_d)
   Call timing_end(time_bucket_start,time_bucket)

   ... More code in loop

   If (time .ge. t_end) Exit

   time = time + dtMin
   itimestep = itimestep + 1

End Do


Thank you for any help,

Kirk
mkcolg
Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Thu Mar 13, 2014 8:27 am

Hi Kirk,

It sounds like things are getting out of sync, or there's an issue with the driver allocating and deallocating that many times.

Can you try adding a call to cudaDeviceSynchronize after BucketSearch? I'm not sure whether your GPUtime call synchronizes the host with the kernel before deallocating.


Code:
Call BucketSearch<<<grid,tBlock>>>&
 (x_d,hsml_d,CellList_d,nCell_d,CellIndex_d,NodesInCell_d,Neib_d,NeibCount_d,w_d,dwdx_d,nTotal,maxIndex)   
 call cudaDeviceSynchronize()
 Call GPUtime(startEvent1,stopEvent1,t_bucket)


If that doesn't work, I'll try writing a test case that replicates the issue.

- Mat
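One way to test the second possibility (the driver struggling with thousands of allocate/deallocate pairs) is to stop freeing the arrays every step and reallocate only when the grid outgrows them. The sketch below shows that grow-only pattern with a host array standing in for NodesInCell_d; the module name, ensure_capacity, the nAllocs counter and the 25% headroom are hypothetical. In CUDA Fortran the same logic would apply to an allocatable array declared with the device attribute. For the 2-D CellList_d, an over-allocated buffer would mean the kernels must be given the allocated leading dimension rather than maxIndex.

```fortran
module grow_only_sketch
  implicit none
  integer, allocatable :: nodesInCell(:)   ! host stand-in for NodesInCell_d
  integer :: nAllocs = 0                   ! counts actual (re)allocations, for illustration
contains
  ! Make sure nodesInCell holds at least maxIndex entries. Reallocate only when
  ! the grid has outgrown the current buffer; a shrinking grid reuses it.
  subroutine ensure_capacity(maxIndex)
    integer, intent(in) :: maxIndex
    if (allocated(nodesInCell)) then
      ! Separate test: Fortran does not guarantee short-circuit evaluation
      if (size(nodesInCell) >= maxIndex) return
      deallocate(nodesInCell)
    end if
    ! Over-allocate by 25% (an arbitrary, hypothetical growth factor)
    allocate(nodesInCell(maxIndex + maxIndex / 4))
    nAllocs = nAllocs + 1
  end subroutine ensure_capacity
end module grow_only_sketch
```

Each time step would then call ensure_capacity(maxIndex) and zero only nodesInCell(1:maxIndex), so the number of allocations tracks how often the domain grows rather than the number of time steps.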
SPHriction-3D
Joined: 12 Jan 2014
Posts: 48

Posted: Thu Mar 13, 2014 1:04 pm

I tried

Code:
Call cudaDeviceSynchronize()


but that resulted in a compile error.

I then tried

Code:
istat = cudaDeviceSynchronize()


That compiled and ran, but it again crashed at around time step 2800.

Kirk
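Because cudaDeviceSynchronize is a function in CUDA Fortran, its status has to be captured, as in the istat form. Kernel launches are asynchronous, so an error raised inside BuildCellList or BucketSearch is typically reported only by the next runtime call that waits on the device, which in this loop may well be the Deallocate. A small helper called right after each launch can show which kernel actually fails. A minimal sketch: the module and check_kernel names are hypothetical, while cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString and cudaSuccess come from the cudafor module.

```fortran
module cuda_check_sketch
  use cudafor
  implicit none
contains
  ! Hypothetical debugging helper, to be called right after a kernel launch.
  ! cudaGetLastError reports launch errors (e.g. a bad configuration);
  ! cudaDeviceSynchronize waits for the kernel and reports errors raised while it ran.
  subroutine check_kernel(label)
    character(len=*), intent(in) :: label
    integer :: istat
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
      print *, trim(label), ' launch: ', cudaGetErrorString(istat)
      stop 1
    end if
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, trim(label), ' execution: ', cudaGetErrorString(istat)
      stop 1
    end if
  end subroutine check_kernel
end module cuda_check_sketch
```

For example, call check_kernel('BuildCellList') and call check_kernel('BucketSearch') placed directly after those launches would attribute the failure to a specific kernel instead of to the Deallocate. Synchronizing after every launch serializes host and device, so this is meant for debugging runs only.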
mkcolg
Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Thu Mar 13, 2014 1:34 pm

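Oops, sorry, my bad about the syntax: in CUDA Fortran, cudaDeviceSynchronize is a function, so it has to be called as istat = cudaDeviceSynchronize().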

Would you mind sending a reproducing example to PGI Customer Service (trs@pgroup.com) and asking them to forward it to me? If it's just a matter of calling allocate and deallocate 2800 times, I can recreate that myself, but in case it's something else, I'd like to start with your known failing case.

Thanks,
Mat
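If repeated allocation alone were the trigger, a stress test along the lines Mat describes should reproduce it without the SPH code. A minimal sketch in CUDA Fortran, with hypothetical names (alloc_stress_sketch, fill, run_alloc_stress) and arbitrary array sizes that change from step to step to mimic a changing cell grid. Each cycle allocates a device array, runs a trivial kernel on it, synchronizes, and deallocates, reporting the first failing step.

```fortran
module alloc_stress_sketch
  use cudafor
  implicit none
contains
  ! Trivial kernel: writes val into every element so each allocation is actually used
  attributes(global) subroutine fill(a, n, val)
    integer, value :: n, val
    integer, device :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = val
  end subroutine fill

  ! Runs nSteps allocate / launch / synchronize / deallocate cycles and returns
  ! the number of cycles that completed; stops at the first failure.
  function run_alloc_stress(nSteps) result(done)
    integer, intent(in) :: nSteps
    integer :: done
    integer, device, allocatable :: a_d(:)
    integer :: step, n, istat

    done = 0
    do step = 1, nSteps
      ! Arbitrary size that changes every step (hypothetical, mimics a growing/shrinking grid)
      n = 100000 + mod(step * 7919, 50000)
      allocate(a_d(n), stat=istat)
      if (istat /= 0) then
        print *, 'allocate failed at step', step
        return
      end if
      call fill<<<(n + 255) / 256, 256>>>(a_d, n, step)
      istat = cudaDeviceSynchronize()
      if (istat /= cudaSuccess) then
        print *, 'kernel failed at step', step, ': ', cudaGetErrorString(istat)
        return
      end if
      deallocate(a_d, stat=istat)
      if (istat /= 0) then
        print *, 'deallocate failed at step', step
        return
      end if
      done = step
    end do
  end function run_alloc_stress
end module alloc_stress_sketch
```

Calling run_alloc_stress(5000) from a main program and printing the result would show whether plain allocate/deallocate cycles get past step 2800 on the same machine. If they do, that points back toward something in the original kernels rather than the allocator.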

All times are GMT - 7 Hours.
Page 1 of 2

 