PGI User Forum

Accelerating Exponentiation

 
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Mon Aug 17, 2009 7:21 am    Post subject: Accelerating Exponentiation

While debugging my code, a thought occurred to me about the fragment I was looking at: should I accelerate code that involves exponentiation at all, whether with the Accelerator pragmas or with CUDA? Say I have a code fragment like:
Code:
do ik=2,6
 do k=0,np
  do i=1,m
   aa(i,k,ik) = aa(i,k,ik-1)**6
  enddo
 enddo
enddo
Would this be worth accelerating?

Or should I "unroll" the exponentiation so that it involves only multiplications:
Code:
do ik=2,6
 do k=0,np
  do i=1,m
   aa(i,k,ik) = aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)
  enddo
 enddo
enddo

It's possible the compiler would do this automatically, but perhaps not. And perhaps the Accelerator pragmas prefer to see multiplies instead of powers (since multiply is a "simple" floating point instruction)? And, perhaps, this is the kind of thing that a GPU just shouldn't do?!

As you can tell, I'm not a computer engineer, but a scientist by trade, so I'm still getting used to this "thinking" about my programming rather than just transcribing equations and brute forcing.
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Aug 17, 2009 10:02 am

Hi Matt,

I'm going to guess that the "aa*aa*aa*..." version is a bit faster in this case, but you'll most likely want to experiment with both.
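One way to set up that experiment is sketched below. It is only an illustration, not a measurement: the program name, the variable names, and the test value 1.0001 are hypothetical, and the third form (square once, then cube the square) is an extra variant that is not in the original post. It prints the three results so you can confirm they agree to rounding before timing any of them in the real loop.

```fortran
program pow6_compare
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x, x2, p_pow, p_mul5, p_mul3

  ! Hypothetical test value; any modest value works.
  x = 1.0001_dp

  ! Form 1: the power operator with an integer exponent.
  p_pow = x**6

  ! Form 2: the fully written-out product (five multiplies).
  p_mul5 = x*x*x*x*x*x

  ! Form 3: square first, then cube the square (three multiplies).
  x2 = x*x
  p_mul3 = x2*x2*x2

  print *, 'x**6           = ', p_pow
  print *, 'x*x*x*x*x*x    = ', p_mul5
  print *, '(x*x)**3 form  = ', p_mul3
  print *, 'rel. diff 2-1  = ', abs(p_mul5 - p_pow) / p_pow
  print *, 'rel. diff 3-1  = ', abs(p_mul3 - p_pow) / p_pow
end program pow6_compare
```

The forms may differ in the last bits because the multiplications are rounded in a different order, so compare with a tolerance rather than for exact equality.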

Another thing to experiment with is making "ik" the innermost loop. Because of the backward dependency (ik-1), the ik loop must run sequentially. As the outermost loop, it runs sequentially on the host and launches the CUDA kernel multiple times. As the innermost loop, the sequential section runs within the CUDA kernel on the GPU.

With such a small loop count it may not matter, though, so definitely try it both ways.
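A minimal sketch of the reordered loop nest is shown below, assuming small hypothetical sizes for m and np and made-up initial data in aa(:,:,1). The accelerator directives are left out on purpose, so this only shows the loop order; where the region directives go and how the compiler schedules the loops would still need to be checked against the compiler's feedback messages.

```fortran
program reorder_ik
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes, chosen only to keep the example small.
  integer, parameter :: m = 4, np = 3
  real(dp) :: aa(m, 0:np, 6)
  integer :: i, k, ik

  ! Hypothetical initial data close to 1, so that the repeated
  ! sixth powers stay within range.
  do k = 0, np
    do i = 1, m
      aa(i, k, 1) = 1.0_dp + 1.0e-4_dp * real(i + k, dp)
    enddo
  enddo

  ! The i and k iterations are independent of each other, so they
  ! are the candidates for parallel execution. Only ik carries the
  ! (ik-1) dependency, so it runs sequentially as the innermost loop.
  do k = 0, np
    do i = 1, m
      do ik = 2, 6
        aa(i, k, ik) = aa(i, k, ik-1)**6
      enddo
    enddo
  enddo

  print *, 'aa(1,0,6) = ', aa(1, 0, 6)
end program reorder_ik
```

Each (i,k) pair reads only values it wrote itself in an earlier ik step, which is why the reordering gives the same results as the original nest.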

Hope this helps,
Mat
All times are GMT - 7 Hours