PGI User Forum


Problem using LFSR random number generator in CUDA FORTRAN
tom.rb.edwards



Joined: 02 Dec 2010
Posts: 35

Posted: Tue Feb 15, 2011 8:14 am    Post subject: Problem using LFSR random number generator in CUDA FORTRAN

I've been trying to port this random number generator to CUDA. It's a linear feedback shift register (LFSR) RNG, which satisfies the Park-Miller minimum standard. The source code can be found here: http://www.physics.udel.edu/~bnikolic/teaching/phys660/F90/rlfsr113.f90

After seeding, the random numbers are generated like so:

Code:

      b  = ishft(ieor(ishft(z1,6),z1),-13)
      z1 = ieor(ishft(iand(z1,-2),18),b)

      b  = ishft(ieor(ishft(z2,2),z2),-27)
      z2 = ieor(ishft(iand(z2,-8),2),b)

      b  = ishft(ieor(ishft(z3,13),z3),-21)
      z3 = ieor(ishft(iand(z3,-16),7),b)

      b  = ishft(ieor(ishft(z4,3),z4),-12)
      z4 = ieor(ishft(iand(z4,-128),13),b)

      rand_num=ishft( ieor(ieor(ieor(z1,z2),z3),z4) , -1)*4.656612873077d-10


When this works correctly, I get a uniform distribution of random reals between 0 and 1. The problem is that when I port this random number generator to CUDA, I don't get the same numbers, and I get negative numbers as well. I figured out that the cause is the right shift: in FORTRAN, ISHFT with a negative shift count is a logical shift, so zeros are shifted in from the left and a negative 32-bit integer becomes a large positive one (e.g., ISHFT(-4,-1) = 2147483646). But when I do the same on CUDA, the number stays negative (e.g., ISHFT(-4,-1) = -2), as if the sign bit were being copied in.

Is there a way to set it so that CUDA gets the same results as the CPU? Is it something to do with unsigned integers perhaps?
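
The two results quoted above match the two standard kinds of right shift. Below is a minimal host-side sketch in standard Fortran, assuming a compiler that provides the Fortran 2008 SHIFTA intrinsic (for example a recent gfortran) and 32-bit default integers. It only illustrates the arithmetic; it does not explain why the GPU result differs.

```fortran
program shift_kinds
  implicit none
  integer :: x, logical_shift, arithmetic_shift

  x = -4                              ! bit pattern FFFFFFFC

  ! ISHFT with a negative count shifts right and fills with zeros.
  logical_shift = ishft(x, -1)        ! bit pattern 7FFFFFFE

  ! SHIFTA shifts right and replicates the sign bit.
  arithmetic_shift = shifta(x, 1)     ! bit pattern FFFFFFFE

  print *, 'ishft(-4,-1) =', logical_shift      ! 2147483646
  print *, 'shifta(-4,1) =', arithmetic_shift   ! -2

  ! Self-check against the values reported in the post.
  if (logical_shift /= 2147483646) error stop 'unexpected logical shift'
  if (arithmetic_shift /= -2)      error stop 'unexpected arithmetic shift'
end program shift_kinds
```

A device result of -2 is therefore consistent with a sign-extending shift having been applied where ISHFT's zero-filling shift was expected.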
mkcolg



Joined: 30 Jun 2004
Posts: 6136
Location: The Portland Group Inc.

Posted: Wed Feb 16, 2011 12:31 pm

Hi Tom,
Quote:

But when I do the same on CUDA, the number stays negative (e.g, ISHFT(-4,-1) = -2).
Do you have an example I could try? When I do an ISHFT(-4,-1) I get the same answers on both the CPU and GPU.

- Mat

Code:
$ cat testishft.f90
   module testshift
    
        integer, allocatable, dimension(:) :: arr
        integer, allocatable, dimension(:), device :: arrD

   contains
   
   attributes(global) subroutine testshft ()
        integer ix
        ix = (blockidx%x-1)*blockdim%x + threadidx%x
        arrD(ix) = ISHFT(-4,-1)
   end subroutine testshft

   end module testshift

        program foo
        use testshift

        integer i
        allocate(arr(32), arrD(32))
        call testshft<<<1,32>>>()
        arr=arrD
        i = ISHFT(-4,-1)
        print *, i, arr(1)
   end
$ pgf90 testishft.f90 -Mcuda; a.out
   2147483646   2147483646
tom.rb.edwards



Joined: 02 Dec 2010
Posts: 35

Posted: Thu Feb 17, 2011 4:54 am

The problem occurs when I use a variable inside the ISHFT function. For example, if instead of ISHFT(-4,-1) I have ISHFT(d_num,-1), where d_num is a device integer set equal to -4, then I get the wrong answer. I've written a piece of code exactly like yours, except that it uses 'd_num' in the kernel and 'num' on the host:

Code:
module testshift

  integer, allocatable, dimension(:) :: arr
  integer, allocatable, dimension(:), device :: arrD
  integer, device :: d_num = -4
 
contains

  attributes(global) subroutine testshft ()
    integer ix
    ix = (blockidx%x-1)*blockdim%x + threadidx%x
    arrD(ix) = ISHFT(d_num,-1)
  end subroutine testshft

end module testshift

program foo
  use testshift

  integer :: i, num = -4
  allocate(arr(32), arrD(32))
  call testshft<<<1,32>>>()
  arr=arrD
  i = ISHFT(num,-1)
  print *, i, arr(1)
end program foo


This is what I get after making those changes:
Code:
./a.out
   2147483646           -2


Thanks for the help btw. Do you know how I could get around this problem?
tom.rb.edwards



Joined: 02 Dec 2010
Posts: 35

Posted: Thu Feb 17, 2011 8:06 am

Basically, the problem boils down to this behaviour on the GPU:


Code:
ISHFT(-4,-1) = 2147483646

integer :: var = -4
ISHFT(var,-1) = -2


The second one should be the same as the first one. So if there's a way of solving this disparity, that would be great.
Thanks
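
One possible way around the disparity, not confirmed anywhere in this thread, is to clear the upper bits explicitly after every right shift so that the result no longer depends on how the sign bit was treated. The sketch below is standard host Fortran: `lsr`, `step`, and the seed values are hypothetical names and numbers chosen for illustration (the seeding routine from the linked source is not reproduced), and SHIFTA is used only as a stand-in for a sign-extending shift. In a kernel, the helper would presumably need `attributes(device)` and would wrap whichever shift misbehaves.

```fortran
module lfsr_sketch
  implicit none
contains

  ! Logical right shift by n bits (1 <= n <= 31) that does not depend on
  ! how the underlying shift treats the sign bit.
  pure integer function lsr(x, n)
    integer, intent(in) :: x, n
    integer :: mask
    mask = not(ishft(-1, bit_size(x) - n))  ! ones in the low (32-n) bits only
    lsr  = iand(shifta(x, n), mask)         ! shifta: stand-in for a sign-extending shift
  end function lsr

  ! One generator step from the first post, with each right shift routed
  ! through lsr. Left shifts are unchanged.
  subroutine step(z1, z2, z3, z4, r)
    integer, intent(inout) :: z1, z2, z3, z4
    double precision, intent(out) :: r
    integer :: b
    b  = lsr(ieor(ishft(z1,6),z1), 13)
    z1 = ieor(ishft(iand(z1,-2),18), b)
    b  = lsr(ieor(ishft(z2,2),z2), 27)
    z2 = ieor(ishft(iand(z2,-8),2), b)
    b  = lsr(ieor(ishft(z3,13),z3), 21)
    z3 = ieor(ishft(iand(z3,-16),7), b)
    b  = lsr(ieor(ishft(z4,3),z4), 12)
    z4 = ieor(ishft(iand(z4,-128),13), b)
    r  = lsr(ieor(ieor(ieor(z1,z2),z3),z4), 1)*4.656612873077d-10
  end subroutine step

end module lfsr_sketch

program check_lsr
  use lfsr_sketch
  implicit none
  integer :: z1, z2, z3, z4, i, n
  double precision :: r

  ! The masked shift should agree with ISHFT for every shift count.
  do n = 1, 31
    if (lsr(-4, n) /= ishft(-4, -n)) error stop 'lsr disagrees with ishft'
  end do

  ! Arbitrary illustrative seeds (an assumption, not the original seeding).
  z1 = 12345; z2 = 12345; z3 = 12345; z4 = 12345
  do i = 1, 1000
    call step(z1, z2, z3, z4, r)
    if (r < 0d0 .or. r >= 1d0) error stop 'value outside [0,1)'
  end do
  print *, 'all 1000 values were in [0,1); last value:', r
end program check_lsr
```

The mask is built with a left shift, which the posts above do not report as affected, so the helper avoids relying on the operation under suspicion.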
Peter Nightingale



Joined: 14 Oct 2010
Posts: 20

Posted: Thu Feb 17, 2011 8:16 am

What your program prints out seems to be undefined; the kernel launch is asynchronous and the program just goes full steam ahead and prints garbage. Or am I missing something? Try inserting:
Code:

istat=cudathreadsynchronize()

before the print statement.
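
As a sketch of where that call could go, here is the test program from the earlier post with the synchronization added. It assumes the PGI CUDA Fortran compiler and the `cudafor` module. The call is placed right after the kernel launch, which is still before the print, and the error check is an addition for illustration. Whether this changes the printed values is not established in this part of the thread.

```fortran
module testshift
  use cudafor
  implicit none

  integer, allocatable, dimension(:) :: arr
  integer, allocatable, dimension(:), device :: arrD
  integer, device :: d_num = -4

contains

  attributes(global) subroutine testshft()
    integer :: ix
    ix = (blockidx%x-1)*blockdim%x + threadidx%x
    arrD(ix) = ishft(d_num, -1)
  end subroutine testshft

end module testshift

program foo
  use cudafor
  use testshift
  implicit none

  integer :: i, istat, num

  num = -4
  allocate(arr(32), arrD(32))
  call testshft<<<1,32>>>()

  ! Wait for the kernel to finish and report any launch or execution error.
  istat = cudaThreadSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', cudaGetErrorString(istat)

  arr = arrD              ! copy the device results back to the host
  i = ishft(num, -1)      ! host reference value
  print *, i, arr(1)
end program foo
```

Built the same way as the earlier example (`pgf90 -Mcuda`), the two printed numbers can then be compared knowing the kernel has completed.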
All times are GMT - 7 Hours