PGI User Forum
Problem while parallelizing loops

 
uni-gw



Joined: 04 Jan 2011
Posts: 1

Posted: Tue Jan 11, 2011 7:27 am    Post subject: Problem while parallelizing loops

Hello, I have the following problem and hope you have some advice for me. I am trying to port the following code to the GPU:

Code:

      ! Temp Array
      temp(1) = nin1
      temp(2) = nout1
      temp(3) = atb
      temp(4) = after
      temp(5) = atn
      temp2(1) = cr2
      temp2(2) = ci2
      temp2(3) = cr3
      temp2(4) = ci3
      temp2(5) = bb
      ! END Temp Array

!$acc data region local(r,r1,r2,r3,s,s1,s2,s3,ninout)
!$acc* copyin(temp(1:5),temp2(1:5),zin(1:2,1:nfft,:))
!$acc* copyout(zout)
!$acc region do independent
      do ib=1,before
         ! Ninout Array
         ninout(1)=temp(1)+(ib*temp(4))
         ninout(2)=ninout(1)+temp(3)
         ninout(3)=ninout(2)+temp(3)
         ninout(4)=temp(2)+(ib*temp(5))
         ninout(5)=ninout(4)+temp(4)
         ninout(6)=ninout(5)+temp(4)
         ! END Ninout Array
         do j=1,nfft
            r1=zin(1,j,ninout(1))
            s1=zin(2,j,ninout(1))
            r=zin(1,j,ninout(2))
            s=zin(2,j,ninout(2))
            r2=r*temp2(1) - s*temp2(2)
            s2=r*temp2(2) + s*temp2(1)
            r=zin(1,j,ninout(3))
            s=zin(2,j,ninout(3))
            r3=r*temp2(3) - s*temp2(4)
            s3=r*temp2(4) + s*temp2(3)
            r=r2 + r3
            s=s2 + s3
            zout(1,j,ninout(4)) = r + r1
            zout(2,j,ninout(4)) = s + s1
            r1=r1 - .5d0*r
            s1=s1 - .5d0*s
            r2=temp2(5)*(r2-r3)
            s2=temp2(5)*(s2-s3)
            zout(1,j,ninout(5)) = r1 - s2
            zout(2,j,ninout(5)) = s1 + r2
            zout(1,j,ninout(6)) = r1 + s2
            zout(2,j,ninout(6)) = s1 - r2
         enddo
      enddo
!acc end region
!$acc end data region


The compiler tells me:

710, Generating local(ninout(:))
Generating local(s3)
Generating local(s2)
Generating local(s1)
Generating local(s)
Generating local(r3)
Generating local(r2)
Generating local(r1)
Generating local(r)
Generating copyout(zout(:,:,:))
Generating copyin(zin(:,:nfft,:))
Generating copyin(temp2(:))
Generating copyin(temp(:))
713, Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
714, Loop is parallelizable
Accelerator kernel generated
714, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
Cached references to size [5] block of 'temp'
Cached references to size [6] block of 'ninout'
Cached references to size [5] block of 'temp2'
CC 1.3 : 22 registers; 180 shared, 16 constant, 0 local memory bytes; 50% occupancy
CC 2.0 : 33 registers; 92 shared, 100 constant, 0 local memory bytes; 50% occupancy
723, Loop is parallelizable

----------------------------------------------------

But when I run the program, the following error occurs:

call to EventSynchronize returned error 700: Launch failed
CUDA driver version: 3020

Accelerator Kernel Timing data
/home/gast/SOURCE/./gfft.f
fftstp
713: region entered 1 time
time(us): init=0
714: kernel launched 1 times
grid: [1] block: [256]
time(us): total=0 max=0 min=0 avg=0
/home/gast/SOURCE/./gfft.f
fftstp
710: region entered 1 time
time(us): init=78930
data=88
---------------------------------------------------------

I guess it has something to do with the definition of the array, because the compiler does not seem to use the values I entered for the dimensions. Where I wrote (1:2,1:nfft,:), it generates (:,:nfft,:), or am I wrong here? It's a very large program and I am trying to parallelize only some time-consuming parts. I don't know exactly how large the third dimension of the arrays zin and zout will be. Is there a way to leave the third dimension open?

Thanks very much
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Tue Jan 11, 2011 12:28 pm

Hi uni-gw

Quote:
I guess it has something to do with the definition of the array, because the compiler does not seem to use the values I entered for the dimensions. Where I wrote (1:2,1:nfft,:), it generates (:,:nfft,:)
I doubt that this is the problem. The ":" just means the full extent of that dimension, which I assume for zin's first dimension is "1:2", and ":nfft" is shorthand for "1:nfft" since 1 is the lower bound.
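
To illustrate the shorthand, here is a minimal free-form sketch. The array shape and the value of nfft are made up for the example and are not taken from the original program. Both spellings select the same section, so both print statements report the same shape.

```fortran
program section_demo
  implicit none
  integer, parameter :: nfft = 4   ! assumed sample value
  real :: zin(2, 8, 3)             ! assumed sample shape, lower bounds are 1

  zin = 1.0

  ! Explicit bounds, as written in the copyin clause
  print *, shape(zin(1:2, 1:nfft, :))

  ! Shorthand, as echoed back by the compiler
  print *, shape(zin(:, :nfft, :))
end program section_demo
```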
Quote:

call to EventSynchronize returned error 700: Launch failed
This typically means that your device kernel aborted abnormally for some reason. The first thing I would check is whether all values of ninout are within the bounds of zout's third dimension (and, for ninout(1) through ninout(3), zin's third dimension). An out-of-bounds error is the most common cause (at least for the ones I've looked at).
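
One way to run this check is on the host, before the accelerator region is entered. The following is only a sketch in free-form Fortran: the module and function names and the sample values in the demo program are assumptions made for illustration, while the index formulas are copied from the posted loop. In the real code, nzin and nzout would be size(zin,3) and size(zout,3).

```fortran
module ninout_check
  implicit none
contains
  ! Returns .true. when every third-dimension index used for iterations
  ! ib = 1..before lies within [1, nzin] (reads) or [1, nzout] (writes).
  logical function indices_in_bounds(nin1, nout1, atb, after, atn, &
                                     before, nzin, nzout) result(ok)
    integer, intent(in) :: nin1, nout1, atb, after, atn
    integer, intent(in) :: before, nzin, nzout
    integer :: ib, nin(3), nout(3)

    ok = .true.
    do ib = 1, before
      ! Same formulas as ninout(1:6) in the posted loop
      nin(1)  = nin1 + ib*after
      nin(2)  = nin(1) + atb
      nin(3)  = nin(2) + atb
      nout(1) = nout1 + ib*atn
      nout(2) = nout(1) + after
      nout(3) = nout(2) + after
      if (any(nin < 1) .or. any(nin > nzin) .or. &
          any(nout < 1) .or. any(nout > nzout)) then
        print *, 'out of bounds at ib =', ib, ' nin =', nin, ' nout =', nout
        ok = .false.
        return
      end if
    end do
  end function indices_in_bounds
end module ninout_check

program demo
  use ninout_check
  implicit none
  ! Made-up values, chosen so that the largest index reached is 12
  print *, indices_in_bounds(0, -2, 4, 1, 3, 4, 12, 12)
  ! Same indices, but with zout one element too short in dimension 3
  print *, indices_in_bounds(0, -2, 4, 1, 3, 4, 12, 11)
end program demo
```

If the check fails for the real values of nin1, nout1, atb, after, atn and before, the kernel is indexing outside zin or zout, which would be consistent with the launch failure.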


Quote:
grid: [1] block: [256]
A secondary issue is that the schedule being generated (a single block of 256 threads) will lead to poor performance. You may wish to consider moving the ninout initialization code inside the j loop, since this code inhibits the parallelization of the j loop. Unless 'before' is very large, the cost of the extra computation should be offset by the additional parallelization.

Also, if it's possible, you should move 'j' to the first dimension of your arrays. This will allow for contiguous data access across the threads and limit memory divergence. (http://www.pgroup.com/lit/articles/insider/v2n1a5.htm)

Something like:
Code:

!$acc data region local(r,r1,r2,r3,s,s1,s2,s3,ninout)
!$acc* copyin(temp(1:5),temp2(1:5),zin(1:nfft,1:2,:))
!$acc* copyout(zout)
!$acc region do independent
      do ib=1,before
! The inner loop takes a loop directive; compute regions cannot be nested
!$acc do independent
         do j=1,nfft

            ! Ninout Array (now computed in every (ib,j) iteration)
            ninout(1)=temp(1)+(ib*temp(4))
            ninout(2)=ninout(1)+temp(3)
            ninout(3)=ninout(2)+temp(3)
            ninout(4)=temp(2)+(ib*temp(5))
            ninout(5)=ninout(4)+temp(4)
            ninout(6)=ninout(5)+temp(4)
            ! END Ninout Array

            ! j is now the first (contiguous) index of zin
            r1=zin(j,1,ninout(1))
            s1=zin(j,2,ninout(1))
...
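
If the rest of the program has to keep the original (1:2,1:nfft,:) layout, one option is to copy into a j-first array on the host before the data region. This is only a sketch in free-form Fortran: zin_t and the sample sizes are hypothetical, and whether the extra copy pays off depends on how often the kernel runs.

```fortran
program transpose_layout
  implicit none
  integer, parameter :: nfft = 4, n3 = 3   ! assumed sample sizes
  double precision :: zin(2, nfft, n3)     ! layout used in the question
  double precision :: zin_t(nfft, 2, n3)   ! hypothetical j-first copy
  integer :: i, j, k

  call random_number(zin)

  ! Copy with the first two indices swapped; the innermost loop runs over
  ! j, which is the first (contiguous) index of zin_t
  do i = 1, n3
    do k = 1, 2
      do j = 1, nfft
        zin_t(j, k, i) = zin(k, j, i)
      end do
    end do
  end do

  ! Spot check: the same value now sits at the swapped position
  print *, zin(2, 3, 1) == zin_t(3, 2, 1)
end program transpose_layout
```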


Hope this helps,
Mat
All times are GMT - 7 Hours