PGI User Forum


Simple parallel region but... core dumped
PGI User Forum Forum Index -> Accelerator Programming

fspiga



Joined: 21 Feb 2012
Posts: 16

Posted: Mon Apr 23, 2012 10:35 am    Post subject: Simple parallel region but... core dumped

Dear support,

I added a simple ACC region around a single DO loop. It seemed like everything should just work.

The code is:
Code:
!$acc region copyin(aux, eigts1, eigts2, eigts3, mill, g) copyout(aux1)
 do ig = 1, ngm
    cfac = aux (ig, is) * &
           CONJG( eigts1 (mill (1,ig), na) * &
                  eigts2 (mill (2,ig), na) * &
                  eigts3 (mill (3,ig), na) )
    aux1 (ig) = cfac * g (jpol, ig)
 enddo
!$acc end region


"is" and "jpol" are indexes that come from outer loops. "aux1" is used just after the ACC region so I put it in the copyout clause. It does not require any initialization.

The -Minfo output reports:
Quote:
108, Generating copyout(aux1(:))
Generating copyin(g(jpol,1:ngm))
Generating copyin(mill(1:3,1:ngm))
Generating copyin(eigts3(:,:))
Generating copyin(eigts2(:,:))
Generating copyin(eigts1(:,:))
Generating copyin(aux(:,:))
Generating compute capability 2.0 binary
109, Loop is parallelizable
Accelerator kernel generated
109, !$acc do parallel, vector(32) ! blockidx%x threadidx%x
Non-stride-1 accesses for array 'g'
Non-stride-1 accesses for array 'mill'
CC 2.0 : 21 registers; 4 shared, 208 constant, 0 local memory bytes; 16% occupancy

(The occupancy is low, but well... for now I am more interested in getting OpenACC working at that specific point :-P)

And this is the point where I get the error, according to the backtrace from the core file:
Quote:
(gdb) bt
#0 0x0000003513487fc6 in __memcpy_sse2 () from /lib64/libc.so.6
#1 0x00007f313104b691 in ?? () from /usr/lib64/libcuda.so.1
#2 0x00007f31310557b3 in ?? () from /usr/lib64/libcuda.so.1
#3 0x00007f3131055d8c in ?? () from /usr/lib64/libcuda.so.1
#4 0x00007f313104d54e in ?? () from /usr/lib64/libcuda.so.1
#5 0x00007f313102d6b7 in ?? () from /usr/lib64/libcuda.so.1
#6 0x00007f31310300ad in ?? () from /usr/lib64/libcuda.so.1
#7 0x00007f3131020923 in ?? () from /usr/lib64/libcuda.so.1
#8 0x0000000000877ba3 in __pgi_cu_upload2 (devptr=13865189376, hostptr=0xcb4d98, devx=0, devy=0, hostx=0, hosty=0, size1=1, size2=82835, devstride2=1,
hoststride1=1, hoststride2=3, elementsize=8, lineno=108, name=0xba335c "g$p") at ../src-nv/nvupload2.c:82
#9 0x0000000000873d2d in __pgi_cu_uploadx_seq (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c "g$p") at ../src-nv/nvuploadx.c:236
#10 0x0000000000875661 in __pgi_cu_uploadxx_p (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c "g$p", eventinfo=0xbf1190) at ../src-nv/nvuploadx.c:649
#11 0x0000000000875924 in __pgi_cu_uploadx_a_p (devptr=13865189376, hostptr=0xcb4d98, dims=2, desc=0x7fff50988f20, elementsize=8, lineno=108,
name=0xba335c "g$p", flags=0, async=0) at ../src-nv/nvuploadx.c:705
#12 0x00000000005e0cb7 in addusstres.pgi.uni.gpu_ (sigmanlc=...) at ./addusstress.F90:108
#13 0x00000000005caae1 in stres_knl.pgi.uni.istanbul_ (sigmanlc=..., sigmakin=...) at ./stres_knl.F90:90
#14 0x00000000004ae69e in stress.pgi.uni.istanbul_ (sigma=...) at ./stress.F90:116
#15 0x000000000041c32c in pwscf.pgi.uni.istanbul_ () at ./pwscf.F90:119


I think I put the ACC region directive in the right place with the right clauses. I do not see any obstacle inside the loop, and CONJG should be supported (I am using PGI 12.2). Is it possible that the program crashes at that point because there is "not enough memory available"? If so, how can I detect that and possibly apply a recovery strategy in the code?

Many thanks in advance!
F.
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Apr 24, 2012 12:40 pm

Hi fspiga,

It looks like there is a problem copying g. I can't tell exactly what's wrong, but I assume it has to do with copying only a single row.
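Mat's guess is consistent with the memory layout. Fortran stores arrays in column-major order, so a single row such as g(jpol, 1:ngm) is not contiguous. The backtrace reports hoststride2=3 and size2=82835 for g, which suggests (an inference, since the declaration only shows g(:,:)) a leading dimension of 3: consecutive elements of the row lie 3 elements apart. The plain C sketch below (no OpenACC involved; `colmajor_offset` and `gather_row` are hypothetical helper names, and `double` stands in for REAL(DP)) computes the column-major offset of a 1-based element and gathers one row into a contiguous buffer, which is the kind of strided copy the runtime has to perform for that transfer.

```c
#include <stddef.h>

/* Offset of the 1-based Fortran element a(i, j) in a column-major array
   whose leading (first) dimension is ld. Hypothetical helper. */
static size_t colmajor_offset(size_t i, size_t j, size_t ld)
{
    return (i - 1) + (j - 1) * ld;
}

/* Copy the Fortran slice a(row, 1:ncols) of a column-major ld x ncols
   array into the contiguous buffer out[0..ncols-1].
   Consecutive elements of the row are ld elements apart in a. */
static void gather_row(const double *a, size_t ld, size_t ncols,
                       size_t row, double *out)
{
    for (size_t j = 1; j <= ncols; ++j)
        out[j - 1] = a[colmajor_offset(row, j, ld)];
}
```

With ld = 3, row 2 of the array is read from offsets 1, 4, 7, ...; gathering it on the host into a contiguous array before the region turns the transfer into a single stride-1 copy.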

Can you put a data region around the outer "jpol" loop and copy all of g? (I'm assuming there is one). Something like:
Code:

!$acc data region copyin(g)
do jpol = 1, N
!$acc region copyin(aux, eigts1, eigts2, eigts3, mill) copyout(aux1)
 do ig = 1, ngm
    cfac = aux (ig, is) * &
           CONJG( eigts1 (mill (1,ig), na) * &
                  eigts2 (mill (2,ig), na) * &
                  eigts3 (mill (3,ig), na) )
    aux1 (ig) = cfac * g (jpol, ig)
 enddo
!$acc end region
enddo
!$acc end data region


- Mat
fspiga



Joined: 21 Feb 2012
Posts: 16

Posted: Tue Apr 24, 2012 1:43 pm

Still core dumped.

But I realized that "g" is declared this way:
Code:
REAL(DP), ALLOCATABLE, TARGET :: g(:,:)


So, just as a test, I made this change:
Code:
do jpol = 1, ipol
   g_acc(:)= g(jpol, :)
!$acc region copyin(aux, eigts1, eigts2, eigts3, mill,g_acc) copyout(aux1)
   do ig = 1, ngm
      cfac = aux (ig, is) * &
             CONJG( eigts1 (mill (1,ig), na) * &
                    eigts2 (mill (2,ig), na) * &
                    eigts3 (mill (3,ig), na) )
      aux1 (ig) = cfac * g_acc(ig)
   enddo
!$acc end region
   ...
   ...
enddo



Now the message is:
Quote:
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 4020


which makes more sense. I am going to investigate it. Many thanks!
fspiga



Joined: 21 Feb 2012
Posts: 16

Posted: Tue Apr 24, 2012 2:31 pm

I just realized that my sysadmin updated the CUDA driver to the new release this morning.

To use OpenACC, do I need the CUDA 4.0 driver? Is CUDA 4.1 fine? Is it possible that CUDA 4.2 causes the problem I reported above?

Many thanks again!!!
fspiga



Joined: 21 Feb 2012
Posts: 16

Posted: Wed Apr 25, 2012 3:39 am

The problem persists even after reverting the driver.

In that piece of code there is another assumption that might be incompatible with OpenACC.

This operation:
Code:
aux1(ig) = cfac * g_acc(ig) 


involves "cfac" (COMPLEX), "g_acc(ig)" (REAL) and "aux1" (COMPLEX). Both the real and imaginary parts of "cfac" are scaled by the value "g_acc(ig)" (REAL) and stored in aux1(ig).
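For reference, the Fortran standard defines COMPLEX * REAL as if the REAL operand were converted to COMPLEX with a zero imaginary part, so the result is exactly the component-wise scaling described above. The C99 sketch below reproduces that arithmetic on the host, with `double complex` standing in for COMPLEX(DP) and `double` for REAL(DP) (an assumption, since the kinds of cfac and aux1 are not shown); `scale_by_real` is a hypothetical name. It only shows the expected numerical result and says nothing about how the PGI 12.2 accelerator compiler handles the expression.

```c
#include <complex.h>

/* Host-side equivalent of aux1(ig) = cfac * g_acc(ig):
   the real factor g multiplies both the real and the imaginary part. */
static double complex scale_by_real(double complex cfac, double g)
{
    return cfac * g;
}
```

For example, scaling 1.5 + 2.0i by 4.0 gives 6.0 + 8.0i.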

This time I did a simple test: I removed that line, and it works.

Is this "mix of types" allowed in an OpenACC region? If not, what limitations are there in this regard?
All times are GMT - 7 Hours
Page 1 of 2