PGI User Forum


log() intrinsic uses too many registers

 
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Thu Feb 24, 2011 3:20 pm    Post subject: log() intrinsic uses too many registers

I want to implement this formula in a kernel:
z = a ^ b
where z, a, and b are all double precision. If I use
z = a ** b

there is one 3 more register is required.

If I use
tmp = log(a)
z = exp(tmp * b)

the log() requires about 10 registers for itself. I'm not quite sure why log() uses so many registers. Is there a better way?

Thanks,
Tuan
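
The rewrite in the question relies on the identity a**b = exp(b*log(a)), which only holds for a > 0. Below is a minimal host-side sketch in standard Fortran (not a device kernel) that compares the two forms numerically. The names pow_sketch, pow_explog and compare_pow are hypothetical and do not come from the thread, and the sketch says nothing about register usage on the GPU.

```fortran
module pow_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: a**b rewritten as exp(b*log(a)).
  ! Assumption: a > 0, otherwise log(a) is not a valid real number.
  elemental function pow_explog(a, b) result(z)
    real(dp), intent(in) :: a, b
    real(dp) :: z
    z = exp(b * log(a))
  end function pow_explog
end module pow_sketch

program compare_pow
  use pow_sketch
  implicit none
  real(dp) :: a, b, z_pow, z_explog

  a = 2.0_dp
  b = 10.0_dp
  z_pow    = a**b               ! intrinsic power operator
  z_explog = pow_explog(a, b)   ! log/exp rewrite

  print *, 'a**b          =', z_pow
  print *, 'exp(b*log(a)) =', z_explog
  print *, 'relative diff =', abs(z_pow - z_explog) / abs(z_pow)
end program compare_pow
```

The two results usually agree to within a few units of rounding error, but they are not guaranteed to be bit-identical, so it is worth checking accuracy as well as register counts before swapping one form for the other.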
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Feb 25, 2011 10:48 am    Post subject: Re: log() intrinsic uses too many registers

Hi Tuan,

I'll need a bit more information since I'm not sure what you're basing your conclusion on. Can you please explain why you think pow uses 3 more registers than required and that log uses too many registers?

- Mat
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Sat Feb 26, 2011 6:19 am    Post subject: Re: log() intrinsic uses too many registers

Hi Mat,
That was based on the ptxas output I get when I compile the program with -Mcuda=ptxinfo. Is this supposed to be reliable information?

Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Feb 28, 2011 10:55 am    Post subject: Re: log() intrinsic uses too many registers

Hi Tuan,

I need an example of what you're seeing. Also please explain why you believe "there is one 3 more register is required". Finally, please explain what you mean by "the log() requires about 10 registers for itself".

- Mat

My little test program shows that the log/exp version uses fewer registers than pow: 11 fewer on sm_20 (22 vs. 33) and 2 fewer on sm_13 (14 vs. 16).

Code:
% cat testlog.cuf
module cuda_gen
use cudafor
real*8, device, allocatable:: a_dev(:)
contains

attributes(global) subroutine testme (N,a,b)
use cudafor
integer, value :: N
real*8, value :: a,b
integer ix
#ifdef USE_LOG
real(8) :: tmp
#endif
ix =   (blockidx%x-1)*blockdim%x + threadidx%x
if (ix.lt.N) then
#ifdef USE_LOG
  tmp = log(a)
  a_dev(ix)=exp(tmp*b)
#else
  a_dev(ix)=a**b
#endif
endif

end  subroutine testme

end module cuda_gen

% pgf90 -Mcuda=ptxinfo,keepgpu -Mpreprocess -c testlog.cuf
ptxas info    : Compiling entry function 'testme' for 'sm_13'
ptxas info    : Used 16 registers, 24+16 bytes smem, 96 bytes cmem[0], 60 bytes cmem[1]
ptxas info    : Compiling entry function 'testme' for 'sm_20'
ptxas info    : Used 33 registers, 56 bytes cmem[0], 96 bytes cmem[2], 20 bytes cmem[16]
% pgf90 -Mcuda=ptxinfo,keepgpu -Mpreprocess -c testlog.cuf -DUSE_LOG
ptxas info    : Compiling entry function 'testme' for 'sm_13'
ptxas info    : Used 14 registers, 24+16 bytes smem, 96 bytes cmem[0], 56 bytes cmem[1]
ptxas info    : Compiling entry function 'testme' for 'sm_20'
ptxas info    : Used 22 registers, 56 bytes cmem[0], 96 bytes cmem[2], 20 bytes cmem[16]