PGI User Forum

do seq

 
WENYANG LIU



Joined: 26 Sep 2010
Posts: 11

Posted: Tue Oct 05, 2010 10:30 am    Post subject: do seq

Hi everyone,

I wrote a simple program to test "do seq":
Code:

program seq

implicit none
integer::i,final

!$acc region
!$acc do seq   
do i=1,10
        final=i
end do
!$acc end region


write(*,*)final
end  program


I expected the result to be "10", which is what I get if I use "do host" in the code above.
However, the result is a random number.

Can anyone explain this result? I think my understanding of "do seq" is wrong.

Thanks a lot.
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Wed Oct 06, 2010 2:37 am    Post subject: do seq

Hi WENYANG LIU,

Scalars are privatized, so the 'final' used in the kernel is different from the host's copy of 'final'. The host's copy is never assigned, which is why you're printing seemingly random values (you're really printing uninitialized memory).

The work-around is to make 'final' a single-element array:

Code:
xps730:/tmp/qa% cat seq.f90
program seq
   implicit none
   integer :: i, final(1)

!$acc region
!$acc do seq
do i=1,10
  final(1)=i
end do
!$acc end region

print *, final(1)
end program

% pgf90 -ta=nvidia -Minfo seq.f90
seq:
      5, Generating copyout(final(1))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
      7, Parallelization would require privatization of array 'final(1)'
         Accelerator kernel generated
          7, !$acc do seq
             CC 1.0 : 2 registers; 0 shared, 8 constant, 0 local memory bytes; 33% occupancy
             CC 1.3 : 2 registers; 0 shared, 8 constant, 0 local memory bytes; 25% occupancy
             CC 2.0 : 4 registers; 0 shared, 40 constant, 0 local memory bytes; 16% occupancy
% a.out
           10


While this 'works', you generally only want to use sequential code within a parallel region. Running purely sequential code on a GPU will be quite slow.
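
To illustrate that point, here is a minimal sketch (not part of the original exchange) of a sequential inner loop nested inside a parallel outer loop. The program name, the array 'a', and the sizes 'n' and 'm' are hypothetical, and it assumes the 'parallel' and 'seq' loop clauses of the PGI Accelerator model; it has not been checked against a particular compiler release.

```fortran
program seq_inner
   implicit none
   ! n and m are illustrative sizes, not taken from the thread
   integer, parameter :: n = 1000, m = 10
   integer :: i, j
   real :: a(n)

!$acc region
!$acc do parallel
   do i = 1, n                 ! outer loop: iterations are independent
      a(i) = 0.0
!$acc do seq
      do j = 1, m              ! inner loop: runs sequentially for each i
         a(i) = a(i) + real(j)
      end do
   end do
!$acc end region

   ! every element should hold 1 + 2 + ... + m
   print *, a(1), a(n)
end program
```

Because 'a' is an array, it should be copied back to the host the same way 'final(1)' is above, and building with -Minfo should report how each loop was scheduled. Without -ta=nvidia the directives are treated as ordinary comments, so the same source also runs on the host.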

- Mat