PGI User Forum


12.4: problem with the OpenACC present directive

Tiziano




Posted: Mon May 14, 2012 9:02 am    Post subject: 12.4: problem with the OpenACC present directive

Hi,
this problem happens in several places in my code, but I will report the simplest case: a routine with two loops.

!$acc kernels present(t)
DO j = 1, je
DO i = 1, ie
tt_lheat(i,j,kup:klow,nnew) = tt_lheat(i,j,kup:klow,nnew) &
- t(i,j,kup:klow,nnew)
ENDDO
ENDDO
!$acc end kernels

And this is what the compiler generates:
99, Generating present(t(:,:,:,:))
Generating copy(tt_lheat(1:ie,1:je,kup:klow,nnew))
Generating local(t(1:ie,1:je,kup:klow,nnew))
Generating compute capability 2.0 binary
100, Loop is parallelizable
101, Loop is parallelizable
102, Loop is parallelizable
Accelerator kernel generated
100, !$acc loop gang, vector(4) ! blockidx%y threadidx%z
101, !$acc loop gang, vector(4) ! blockidx%x threadidx%y
102, !$acc loop vector(16) ! threadidx%x
CC 2.0 : 21 registers; 8 shared, 96 constant, 0 local memory bytes; 83% occupancy

Clearly local(t) is not necessary, and it is actually a problem: at run time I get an error. It seems that the compiler generates a free for what it considers the local array (this happens in another subroutine):
pgi_acc_dataoff(devptr=0x203ae01e4,hostptr=0x22de2b0,offset=7,stride=1,size=28,extent=36,eltsize=4,lineno=2375,name=t$sd,flags=0x700=create+present+copyin)
unmap dev:0x203ae0200 host:0x22de2b0 size:112 offset:28 data[dev:0x203ae0200 host:0x22de2b0 size:112] (line:2368 name:t$sd)
__pgi_cu_free( 0x203ae0200, lineno=2375, name=t$sd )
call to cuMemFree returned error 700: Launch failed
CUDA driver version: 4020


With version 12.3, the local(t) becomes a copy of the same subarray of t.

Best Regards
Tiziano
mkcolg



Location: The Portland Group Inc.

Posted: Mon May 14, 2012 10:18 am

Hi Tiziano,

Can you post a reproducing example?
Quote:
clearly local (t) is not necessary,

Possible, but my best guess is that the compiler is creating a contiguous temporary array to give better cache locality when it expands the innermost implied DO loop.
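If that guess is right and the temporary comes from the compiler expanding the array section t(i,j,kup:klow,nnew), one thing worth trying is to write the k loop out explicitly so there is no implied loop left to expand. This is only a sketch under assumptions, not verified against 12.4: the array sizes and index values are hypothetical, and the enclosing data region merely stands in for whatever outer region makes t present in the real code. Compiled without OpenACC, the !$acc lines are treated as comments and the loops run on the host.

```fortran
! Sketch: same update as the original kernel, with the k loop written out
! explicitly instead of using the array section t(i,j,kup:klow,nnew).
! All sizes and index values below are hypothetical, chosen so it runs.
program explicit_k_loop
  implicit none
  integer, parameter :: ie = 8, je = 6, ke = 5, nt = 2
  integer, parameter :: kup = 2, klow = 4, nnew = 2
  real :: t(ie, je, ke, nt), tt_lheat(ie, je, ke, nt)
  integer :: i, j, k

  t = 1.0
  tt_lheat = 3.0

  ! Stand-in for the outer data region that puts t on the device.
  !$acc data copyin(t)
  !$acc kernels present(t)
  do j = 1, je
    do i = 1, ie
      do k = kup, klow
        tt_lheat(i,j,k,nnew) = tt_lheat(i,j,k,nnew) - t(i,j,k,nnew)
      end do
    end do
  end do
  !$acc end kernels
  !$acc end data

  ! Inside kup:klow at time level nnew the result is 3 - 1 = 2;
  ! level k = 1 lies outside kup:klow and must stay 3.
  if (any(tt_lheat(:,:,kup:klow,nnew) /= 2.0)) error stop 'wrong value inside kup:klow'
  if (any(tt_lheat(:,:,1,nnew) /= 3.0)) error stop 'level 1 was modified'
  print '(2f6.2)', tt_lheat(1,1,kup,nnew), tt_lheat(1,1,1,nnew)
end program explicit_k_loop
```

Keeping k innermost mirrors the loop structure the compiler reported for the original statement; whether a different loop order schedules better on the device is a separate question.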

Quote:
__pgi_cu_free( 0x203ae0200, lineno=2375, name=t$sd )
This is freeing the section descriptor, not "t" itself. It may or may not be the point of failure.

Quote:
call to cuMemFree returned error 700: Launch failed
Typically this means that the kernel aborted abnormally, but the error message doesn't appear until the next device call, such as a copy or a free. Hence, it's more likely a problem with the kernel rather than with the free. However, I will need a complete example to better understand the actual cause.

Note that the most common cause for "error 700" is an out-of-bounds array access or other memory violation.
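Since an out-of-bounds access is the usual suspect, a cheap first check is to verify on the host, before the kernel runs, that every index range it touches lies inside the declared bounds of the array. Below is a minimal sketch; the function name section_in_bounds and the array sizes are hypothetical. Passing lbound(t) and ubound(t) from the caller keeps the array's real declared bounds, which an assumed-shape dummy argument would reset to 1.

```fortran
! Hypothetical host-side sanity check: are the index ranges the kernel
! touches (1:ie, 1:je, kup:klow, nnew) inside the declared bounds of t?
program check_bounds_demo
  implicit none
  real, allocatable :: t(:,:,:,:)
  integer :: ie, je, kup, klow, nnew
  logical :: ok_good, ok_bad

  ! Hypothetical shape: t(ie, je, 5 levels, 2 time levels).
  ie = 8; je = 6; kup = 2; klow = 4; nnew = 2
  allocate(t(ie, je, 5, 2))

  ok_good = section_in_bounds(lbound(t), ubound(t), [1, 1, kup, nnew], [ie, je, klow, nnew])
  ! Same ranges, but klow pushed one level past the last k index of t.
  ok_bad  = section_in_bounds(lbound(t), ubound(t), [1, 1, kup, nnew], [ie, je, klow + 2, nnew])

  print '(a, l2)', 'kup:klow = 2:4 in bounds:', ok_good
  print '(a, l2)', 'kup:klow = 2:6 in bounds:', ok_bad
  if (.not. ok_good .or. ok_bad) error stop 'bounds check gave an unexpected answer'

contains

  ! True when every range lo(d):hi(d) lies inside lb(d):ub(d).
  logical function section_in_bounds(lb, ub, lo, hi)
    integer, intent(in) :: lb(:), ub(:), lo(:), hi(:)
    section_in_bounds = all(lo >= lb) .and. all(hi <= ub)
  end function section_in_bounds

end program check_bounds_demo
```

The same call can be repeated for tt_lheat or any other array the kernel indexes.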

- Mat
All times are GMT - 7 Hours