PGI User Forum


quite puzzled
PGI User Forum Forum Index -> Accelerator Programming
KevinWoo



Joined: 08 Aug 2012
Posts: 19

Posted: Mon Oct 15, 2012 5:27 am    Post subject: quite puzzled

Excuse me. My code contains the directive "!$acc data copyout(umin,umout)", and the informational messages also tell me "1204, Generating copyout(umout),Generating copyout(umin)".
However, the values are definitely not being transferred back to my CPU code: inside the kernel the results are correct, but outside it they are always zero.
Well, I just can't figure it out.
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Oct 15, 2012 11:51 am

Hi Kevin,

Quote:
However, the values are definitely not being transferred back to my CPU code: inside the kernel the results are correct, but outside it they are always zero.
Just so I understand: you have an array in a copyout clause, but the halo coming back, which should be all zeros, has garbage values? copyout doesn't initialize the data on the GPU, so you must set every element within your compute region; otherwise, garbage values will be returned for the uninitialized elements. Using "copy" instead will initialize the device array with the host values when the region is entered.
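A minimal sketch of the difference, assuming a 1-D array whose elements u(0) and u(n+1) form a halo that the kernel never writes (the module and routine names are hypothetical, not from Kevin's code). Because the data clause is copy, the device array starts out with the host values, so the untouched halo comes back unchanged; with copyout(u) in its place, those two elements would be copied back from uninitialized device memory. Compiled without OpenACC, the directives are treated as comments and the loop simply runs on the host.

```fortran
! Hypothetical sketch: only the interior of u is updated on the device.
module halo_update
  implicit none
contains
  subroutine scale_interior(u, n, factor)
    integer, intent(in) :: n
    real, intent(inout) :: u(0:n+1)   ! u(0) and u(n+1) are the halo
    real, intent(in) :: factor
    integer :: i
    ! copy (not copyout): the device copy of u starts with the host values,
    ! so the halo elements the kernel never writes come back unchanged.
    ! With copyout(u), u(0) and u(n+1) would return uninitialized values.
    !$acc data copy(u)
    !$acc kernels
    do i = 1, n
      u(i) = factor * u(i)
    end do
    !$acc end kernels
    !$acc end data
  end subroutine scale_interior
end module halo_update
```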

- Mat
KevinWoo



Joined: 08 Aug 2012
Posts: 19

Posted: Mon Oct 15, 2012 7:29 pm    Post subject: Thank you, but...

Dear Mat,

Thanks a lot for your answer. With your explanation of copy/copyout I've actually solved the problem I ran into in the getu12 subroutine discussed in another topic. However, it still doesn't work here.

It was my fault for not making myself clear. I do set the values within the compute region, and they are indeed correct after the computation. But when my subroutine returns, I get zeros every time, even after changing the directive from copyout to copy. So this seems unbelievable; is it beyond what you can help with?

But really, why is that? Well, I'll leave it for the moment.

Could you please tell me how to deal with a 5-level loop nest? Right now I've added the acc directives as shown below. Do you have any better suggestions? As written, the kernel is launched k*i times.
do k = ks, ke+2
do i = is, ie+2
!$acc data create(dis1,dis2),copyout(AAx1, AAy1, AAz1,AAx2, AAy2, AAz2)
!$acc kernels
AAx1 = c0
AAx2 = c0
AAy1 = c0
AAy2 = c0
AAz1 = c0
AAz2 = c0
do n = ks+1, ke+1
do m = js+1, je+2*yele-1
do l = is+1, ie+1
dis1 = sqrt( disx2(i,l) + disy2(js,m) + disz2(k,n) )
dis2 = sqrt( disx2(i,l) + disy2(je+2*yele,m) + disz2(k,n) )
AAx1 = AAx1 + muf4pi * jx(l,m,n) * dv(m,n) / dis1
AAx2 = AAx2 + muf4pi * jx(l,m,n) * dv(m,n) / dis2
AAy1 = AAy1 + muf4pi * jy(l,m,n) * dv(m,n) / dis1
AAy2 = AAy2 + muf4pi * jy(l,m,n) * dv(m,n) / dis2
AAz1 = AAz1 + muf4pi * jz(l,m,n) * dv(m,n) / dis1
AAz2 = AAz2 + muf4pi * jz(l,m,n) * dv(m,n) / dis2
end do
end do
end do
!$acc end kernels
!$acc end data
Ax(i,js,k) = AAx1
Ax(i,je+2*yele,k) = AAx2
Ay(i,js,k) = AAy1
Ay(i,je+2*yele,k) = AAy2
Az(i,js,k) = AAz1
Az(i,je+2*yele,k) = AAz2
end do
end do
Thanks a lot anyway.
KevinWoo



Joined: 08 Aug 2012
Posts: 19

Posted: Mon Oct 15, 2012 7:41 pm    Post subject: mirror

Excuse me, I've actually tried the mirror directive instead, but it still fails for now. This time it's probably due to my limited knowledge of mirror and reflected.
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Tue Oct 16, 2012 9:26 am

Hi Kevin,

Quote:
But when my subroutine returns, I get zeros every time, even after changing the directive from copyout to copy. So this seems unbelievable; is it beyond what you can help with?
What would help is to have a small example that reproduces the problem. If you're unable to create a small example, you can send the code to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.

Assuming the trip counts of the "k" and "i" loops are large, I'd collapse these two into a 2D gang and then put the inner loops into a vector with a reduction clause. If "k" and "i" are small, move the data region outside of the "k" loop.
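To make the gang/vector split concrete, here is a minimal, self-contained sketch of the same pattern reduced to a single field and a single inner loop. The module, routine, and array names (potential, xt, zt, xs, q, a) are hypothetical, and the exact clause layout is an assumption of how the suggestion maps to code: a collapsed gang loop over the target points, and a vector loop with a reduction over the source points. Without an OpenACC compiler it runs serially on the host.

```fortran
! Hypothetical, simplified analogue of the loop nest: for every target
! point (i,k), sum q(l)/distance over all source points l.
module gang_vector_sketch
  implicit none
contains
  subroutine potential(ni, nk, nl, xt, zt, xs, q, a)
    integer, intent(in) :: ni, nk, nl
    real, intent(in) :: xt(ni), zt(nk), xs(nl), q(nl)
    real, intent(out) :: a(ni, nk)
    integer :: i, k, l
    real :: s, d
    !$acc data copyin(xt, zt, xs, q) copyout(a)
    !$acc kernels
    ! Large outer trip counts: one gang per (i,k) pair
    !$acc loop collapse(2) gang private(s)
    do k = 1, nk
      do i = 1, ni
        s = 0.0
        ! Inner sum: vector threads, private temporary d, reduction into s
        !$acc loop vector private(d) reduction(+:s)
        do l = 1, nl
          d = sqrt((xt(i) - xs(l))**2 + zt(k)**2)
          s = s + q(l) / d
        end do
        a(i, k) = s   ! every element of a is written, so copyout is safe
      end do
    end do
    !$acc end kernels
    !$acc end data
  end subroutine potential
end module gang_vector_sketch
```

With two sources at x = 0 and x = 3, a plane offset of 4, and unit charges, each target sees distances 4 and 5, so both entries of a come out as 0.25 + 0.2 = 0.45.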

Also, you don't want to put the "AA" variables in a copyout; instead, let the compiler create a reduction. By putting "dis1" and "dis2" in a create clause, you've promoted them to global scalar variables shared by all threads, which will cause a race condition and wrong answers. Finally, you do want to put your arrays in the data region.
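The reduction and privatization points can be seen in isolation in the following sketch (hypothetical names; d2 stands in for an array of squared distances). The sum total appears in no data clause: the reduction clause gives each thread its own partial sum and combines them at the end of the loop, and the result is then available on the host. The temporary dis is private, so each thread gets its own copy rather than sharing one global scalar, which is what the create clause produced.

```fortran
! Hypothetical sketch: a scalar sum computed on the device via reduction.
module scalar_reduction_sketch
  implicit none
contains
  function inv_dist_sum(n, d2) result(total)
    integer, intent(in) :: n
    real, intent(in) :: d2(n)   ! squared distances
    real :: total
    real :: dis                 ! per-iteration temporary
    integer :: l
    total = 0.0
    ! No create/copyout for total or dis:
    !  - reduction(+:total) combines per-thread partial sums into total
    !  - private(dis) gives every thread its own dis, avoiding a race
    !$acc kernels loop copyin(d2) private(dis) reduction(+:total)
    do l = 1, n
      dis = sqrt(d2(l))
      total = total + 1.0 / dis
    end do
  end function inv_dist_sum
end module scalar_reduction_sketch
```

For d2 = [1.0, 4.0, 16.0] the result is 1 + 0.5 + 0.25 = 1.75.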

Version 1 would look something like the following, though since I don't have the full code to test, you may need to make a few changes:

Code:
!$acc data copyin(disx2,disy2,disz2,jx,jy,jz,dv), copy(Ax,Ay,Az)
!$acc kernels
! one gang per (k,i) target point
!$acc loop collapse(2) gang
do k = ks, ke+2
  do i = is, ie+2
    AAx1 = c0
    AAx2 = c0
    AAy1 = c0
    AAy2 = c0
    AAz1 = c0
    AAz2 = c0
    ! vector threads split the n,m,l sums; dis1/dis2 are per-thread temporaries
    !$acc loop vector collapse(3) private(dis1,dis2) reduction(+:AAx1,AAx2,AAy1,AAy2,AAz1,AAz2)
    do n = ks+1, ke+1
      do m = js+1, je+2*yele-1
        do l = is+1, ie+1
          dis1 = sqrt( disx2(i,l) + disy2(js,m) + disz2(k,n) )
          dis2 = sqrt( disx2(i,l) + disy2(je+2*yele,m) + disz2(k,n) )
          AAx1 = AAx1 + muf4pi * jx(l,m,n) * dv(m,n) / dis1
          AAx2 = AAx2 + muf4pi * jx(l,m,n) * dv(m,n) / dis2
          AAy1 = AAy1 + muf4pi * jy(l,m,n) * dv(m,n) / dis1
          AAy2 = AAy2 + muf4pi * jy(l,m,n) * dv(m,n) / dis2
          AAz1 = AAz1 + muf4pi * jz(l,m,n) * dv(m,n) / dis1
          AAz2 = AAz2 + muf4pi * jz(l,m,n) * dv(m,n) / dis2
        end do
      end do
    end do
    ! only the js and je+2*yele planes are written, so Ax/Ay/Az need copy, not copyout
    Ax(i,js,k) = AAx1
    Ax(i,je+2*yele,k) = AAx2
    Ay(i,js,k) = AAy1
    Ay(i,je+2*yele,k) = AAy2
    Az(i,js,k) = AAz1
    Az(i,je+2*yele,k) = AAz2
  end do
end do
!$acc end kernels
!$acc end data


Version 2 where you only accelerate the inner loops:
Code:
! Ax, Ay, Az are only written on the host here, so they stay out of the data
! clause: a copy(Ax,Ay,Az) would overwrite these host writes at "end data"
!$acc data copyin(disx2,disy2,disz2,jx,jy,jz,dv)
do k = ks, ke+2
  do i = is, ie+2
    AAx1 = c0
    AAx2 = c0
    AAy1 = c0
    AAy2 = c0
    AAz1 = c0
    AAz2 = c0
    !$acc kernels loop collapse(3) private(dis1,dis2) reduction(+:AAx1,AAx2,AAy1,AAy2,AAz1,AAz2)
    do n = ks+1, ke+1
      do m = js+1, je+2*yele-1
        do l = is+1, ie+1
          dis1 = sqrt( disx2(i,l) + disy2(js,m) + disz2(k,n) )
          dis2 = sqrt( disx2(i,l) + disy2(je+2*yele,m) + disz2(k,n) )
          AAx1 = AAx1 + muf4pi * jx(l,m,n) * dv(m,n) / dis1
          AAx2 = AAx2 + muf4pi * jx(l,m,n) * dv(m,n) / dis2
          AAy1 = AAy1 + muf4pi * jy(l,m,n) * dv(m,n) / dis1
          AAy2 = AAy2 + muf4pi * jy(l,m,n) * dv(m,n) / dis2
          AAz1 = AAz1 + muf4pi * jz(l,m,n) * dv(m,n) / dis1
          AAz2 = AAz2 + muf4pi * jz(l,m,n) * dv(m,n) / dis2
        end do
      end do
    end do
    Ax(i,js,k) = AAx1
    Ax(i,je+2*yele,k) = AAx2
    Ay(i,js,k) = AAy1
    Ay(i,je+2*yele,k) = AAy2
    Az(i,js,k) = AAz1
    Az(i,je+2*yele,k) = AAz2
  end do
end do
!$acc end data



- Mat
All times are GMT - 7 Hours
Page 1 of 2