PGI User Forum

($acc parallel loop) VS ( $acc kernels loop ) ?

PGI User Forum Forum Index -> Accelerator Programming

JMa
Joined: 30 Nov 2012
Posts: 22

Posted: Thu Jan 10, 2013 5:00 pm

Hi Mat and All,
I found a surprisingly large speed difference (about 20x) between these two directives in the following simple loop test:

1. $acc kernels loop:
CODE:
call system_clock(count1, count_rate, count_max)
! kernels: the compiler analyzes the whole loop nest and picks the schedule
!$acc kernels loop
do i=1, n_size
   do j=1, n_size
      do k = 1, n_size
         c(i,j) = c(i,j) + a(i,k)*b(k,j)
      enddo
   enddo
enddo

print*, 'iteration#:',n_size*n_size

call system_clock(count2, count_rate, count_max)
write(*,*)'GPU costs',(count2-count1),'microseconds'


RESULTS:
iteration#: 4000000
GPU costs 1030000 microseconds


2. $acc parallel loop:

CODE:
call system_clock(count1, count_rate, count_max)
! parallel loop: the directive applies to the outer i loop only;
! no loop directives are given for the j or k loops
!$acc parallel loop
do i=1, n_size
   do j=1, n_size
      do k = 1, n_size
         c(i,j) = c(i,j) + a(i,k)*b(k,j)
      enddo
   enddo
enddo
!$acc end parallel loop
print*, 'iteration#:',n_size*n_size

call system_clock(count2, count_rate, count_max)
write(*,*)'GPU costs',(count2-count1),'microseconds'


RESULTS:
iteration#: 4000000
GPU costs 22168000 microseconds

Why are they so different? Any insight into the reasons behind this would be much appreciated.

Thanks,
Jingsen
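A side note on reading the timings above: system_clock returns tick counts, so (count2-count1) is in microseconds only if count_rate is 1,000,000 on that system (an assumption; the thread never prints count_rate). Also, because of the inner k loop the total work is n_size**3 multiply-adds, not the n_size*n_size printed as "iteration#". The following is a minimal sketch of a hypothetical helper module (timing_util, elapsed_seconds and gflops are illustrative names, not from the thread) that converts ticks to seconds using count_rate and reports a flop rate. It assumes the counters are default integers and ignores counter wrap-around at count_max.

```fortran
! Hypothetical helper module (not part of the original thread).
module timing_util
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Elapsed wall-clock seconds between two system_clock readings.
  ! count_rate is the number of clock ticks per second reported by
  ! system_clock. Counter wrap-around at count_max is not handled.
  function elapsed_seconds(count1, count2, count_rate) result(seconds)
    integer, intent(in) :: count1, count2, count_rate
    real(dp) :: seconds
    seconds = real(count2 - count1, dp) / real(count_rate, dp)
  end function elapsed_seconds

  ! Throughput of an n x n x n matrix multiply:
  ! n**3 multiply-adds, counted as 2*n**3 floating-point operations.
  function gflops(n, seconds) result(rate)
    integer, intent(in) :: n
    real(dp), intent(in) :: seconds
    real(dp) :: rate
    rate = 2.0_dp * real(n, dp)**3 / seconds / 1.0e9_dp
  end function gflops
end module timing_util
```

With "use timing_util" added to the test program, the final line of each timing block could become write(*,*) 'GPU costs', elapsed_seconds(count1, count2, count_rate), 'seconds', which stays correct whatever tick rate the runtime chooses.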
mkcolg
Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Fri Jan 11, 2013 9:40 am

Hi Jingsen,

The main difference between the "kernels" and "parallel" constructs is that with "kernels" the default is for the compiler to do all the scheduling and kernel generation automatically, while with "parallel" it's up to the user to decide how to create the kernels and schedule the loops.

This article goes into more depth: http://www.pgroup.com/lit/articles/insider/v4n2a1.htm

Take a look at the compiler feedback messages (-Minfo=accel) and pay particular attention to how the loops are being scheduled; that should show you where the performance difference comes from. Note that the schedule also affects how well the caches are used, which may be another factor in the performance difference.

- Mat
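To illustrate what "up to the user" can mean in practice, here is a minimal sketch (not from the thread, and not checked against a particular PGI release) of the same matrix multiply with an explicit parallel-loop schedule: gang over the columns j, vector over the rows i, and a sequential k loop accumulated in a private scalar. The subroutine name matmul_acc_parallel is hypothetical, and putting j outermost is an assumed choice so that adjacent vector lanes touch adjacent elements of the column-major arrays a(:,k) and c(:,j). Whether this schedule matches or beats the "kernels" version should be confirmed with -Minfo=accel and timing.

```fortran
! Hypothetical example; n, a, b, c mirror the thread's n_size, a, b, c.
subroutine matmul_acc_parallel(n, a, b, c)
  implicit none
  integer, intent(in) :: n
  real, intent(in)    :: a(n,n), b(n,n)
  real, intent(inout) :: c(n,n)
  integer :: i, j, k
  real :: tmp

  ! One gang per column j of c; data movement is spelled out explicitly.
  !$acc parallel loop gang copyin(a, b) copy(c)
  do j = 1, n
     ! Vector lanes take consecutive i, i.e. contiguous memory in
     ! Fortran's column-major layout for a(:,k) and c(:,j).
     !$acc loop vector private(tmp)
     do i = 1, n
        tmp = 0.0
        ! The k loop is a dot product for a single (i,j): run it sequentially.
        !$acc loop seq
        do k = 1, n
           tmp = tmp + a(i,k) * b(k,j)
        end do
        c(i,j) = c(i,j) + tmp
     end do
  end do
end subroutine matmul_acc_parallel
```

Compiled without OpenACC support, the !$acc lines are ordinary comments and the routine runs serially, which makes it easy to check results on the host before comparing GPU schedules.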
All times are GMT - 7 Hours