PGI User Forum


CUDA host array allocation problem.

 
Forum: Accelerator Programming
sungjinkim



Joined: 15 Dec 2010
Posts: 1

Posted: Wed May 18, 2011 11:03 pm    Post subject: CUDA host array allocation problem.

I'm currently learning PGI CUDA Fortran (PGI Workstation 10.9, Windows XP 32-bit) and wrote a small test program that allocates an allocatable array on the host. It works when the array is small, but the allocation fails when the array gets big. Curiously, the program never even calls the subroutines in the module.

Also, this happens only when the kernel is declared with attributes(global). The code and the output are below.

Does anyone know what is happening? Is there a limit on how large an array can be allocated, or am I missing something important? Any help would be appreciated.

- Sungjin, Kim.

Code:

module linear_system_cu
  use cudafor

contains
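  attributes(global) subroutine jacobi_kernel(a, b, x, x_new, n)
    implicit none
    ! With implicit none, n must be declared before it is used in the array bounds below
    integer, value :: n
    real, device :: a(n,n), b(n)
    real, device :: x_new(n), x(n)
    ! Kernel body intentionally left empty for this test
  end subroutine jacobi_kernel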

  subroutine jacobi(a, x, b, tol)
    implicit none
    real, dimension(:,:), intent(in) :: a
    real, dimension(:), intent(inout) :: x
    real, dimension(:), intent(in) :: b
    real, intent(in) :: tol

  end subroutine jacobi

end module linear_system_cu


program alloc
  use linear_system_cu

  implicit none

  real, dimension(:,:,:), allocatable :: a
  integer :: ierr

  write(*,*) "Test 1."

  allocate(a(5, 100, 100), stat=ierr)

  if (ierr /= 0) then
     write(*,*) "Could not allocate a."
  else
     write(*,*) "Allocated a."
  end if

  if (allocated(a)) then
     deallocate(a)
  end if

  write(*,*) "Test 2."

  allocate(a(5, 10000, 10000), stat=ierr)

  if (ierr /= 0) then
     write(*,*) "Could not allocate a."
  else
     write(*,*) "Allocated a."
  end if

end program alloc


The result is:

Code:

PGI$ pgf90 -Mcuda alloc.f90 linear_system_cu.f90
alloc.f90:
linear_system_cu.f90:
PGI$ ./alloc.exe
 Test 1.
 Allocated a.
 Test 2.
 Could not allocate a.


However, if I remove attributes(global) from the kernel declaration, like this:

Code:
  subroutine jacobi_kernel(a, b, x, x_new, n)


the result becomes:

Code:
PGI$ pgf90 -Mcuda alloc.f90 linear_system_cu.f90
alloc.f90:
linear_system_cu.f90:
PGI$ ./alloc.exe
 Test 1.
 Allocated a.
 Test 2.
 Allocated a.
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Thu May 19, 2011 9:15 am

Hi Sungjin, Kim,

The problem here is that your array is simply too big. On 32-bit Windows the total user address space is 2 GB, less the memory the OS itself requires, so in practice the most you can allocate is closer to 1.75 GB. You are trying to allocate just over 1.8 GB (5 x 10000 x 10000 x 4 bytes). You can have the OS extend the user space to 3 GB by passing the link flag "-Wl,largeaddressaware", which would let you allocate more memory. However, I have never tested this flag with CUDA Fortran, so I don't know whether you'll encounter other issues. I would suggest limiting your memory usage or moving to 64-bit Windows.
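The size arithmetic in the reply above can be spelled out as follows, assuming a 4-byte default real and binary gigabytes (1 GB = 2^30 bytes). Test 1 is tiny; Test 2 lands above the roughly 1.75 GB that is usable in practice.

```latex
\begin{aligned}
\text{Test 1:}\quad & 5 \times 100 \times 100 \times 4\ \text{bytes} = 2 \times 10^{5}\ \text{bytes} \approx 0.2\ \text{MB} \\
\text{Test 2:}\quad & 5 \times 10000 \times 10000 \times 4\ \text{bytes} = 2 \times 10^{9}\ \text{bytes}
  = \frac{2 \times 10^{9}}{2^{30}}\ \text{GB} \approx 1.86\ \text{GB} > 1.75\ \text{GB}
\end{aligned}
```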

Note that the program fails for me with or without attributes(global). That it works for you without it is most likely just luck: you're right at the borderline of available memory, so slight variations in the code can change the behavior.

Hope this helps,
Mat
WilliamRae59305



Joined: 25 Aug 2009
Posts: 2

Posted: Tue Jul 26, 2011 3:51 pm    Post subject: allocating memory on host

I have just posted in the computing and compiling section that, with a new Tesla 20-series card in TCC mode, the whole host memory can probably be addressed from the device without pinning. You need the latest version of the compiler, Windows 7, and at least a Tesla C2050, and the Tesla must be in TCC mode. You allocate memory following the pinned memory model, but without the pinned attribute. I have not yet tested this with very large arrays, but that is what NVIDIA designed it for. PGI support said it did not work, but I suspect they were not using a Tesla 20-series card, or it was not in TCC mode, or they were not using Windows 7 or CUDA 4.0. It is not a cheap solution if you have to buy a new card and Windows 7, but the Tesla C2070 has 6 GB of GDDR5 on board and is good value for money.
All times are GMT - 7 Hours