PGI User Forum

Host Pinned Memory Allocation

sindimo



Joined: 30 Nov 2010
Posts: 29
Location: Saudi Aramco

Posted: Sun Jan 23, 2011 12:40 am    Post subject: Host Pinned Memory Allocation

Dear Mat,

I am trying to get host pinned memory to work on a sample program before implementing it in our actual application.

The program is:
[sindimo@superbeast100]$ cat pinnedMemory2.f
Code:


         module myModule
         contains
         subroutine MM (A,B,C)

         use accel_lib
         use cudafor


         integer dimm1, dimm2, dimm3
         parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)
         real start, finish
         real*8 :: A(:,:), B(:,:), C(:,:)

      call cpu_time(start)


!$acc region
        do j = 1, dimm3
        do i = 1, dimm1
          C(i, j) = 0
        enddo
        do k = 1, dimm2
          do i = 1, dimm1
            C(i, j) = C(i, j) + A(i, k)*B(k, j)
          enddo
        enddo
       enddo
!$acc end region


      call cpu_time(finish)

       write(*,*) 'Time ',finish - start,' s'
     
      end subroutine MM
      end module myModule
       



         program main
         use myModule
         use accel_lib
         use cudafor


         integer dimm1, dimm2, dimm3, seed
         parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)

         real*8, allocatable, pinned :: A(:,:), B(:,:), C(:,:)
!         real*8, allocatable :: A(:,:), B(:,:), C(:,:)

         allocate( A(dimm1,dimm2), B(dimm2,dimm3), C(dimm1,dimm3) )

          seed=7654321

              !populate 2 random matrices
                do i = 1, dimm1
                do j = 1, dimm2
                  A(i, j) = ran(seed)
               enddo
               enddo
               do i = 1, dimm2
               do j = 1, dimm3
               B(i, j) = ran(seed)
               enddo
               enddo

           do i = 1, 1
             call MM(A,B,C)
           enddo

         end program main


I compile it with the command below and get a link error:
[sindimo@superbeast100]$ pgfortran -fast -Mcuda -ta=nvidia,time -Minfo=accel -mcmodel=medium -Minline pinnedMemory2.f
Code:

mm:
     17, Generating copyin(a(1:10000,1:10000))
         Generating copyin(b(1:10000,1:10000))
         Generating copyout(c(1:10000,1:10000))
         Generating compute capability 1.3 binary
     18, Loop is parallelizable
     19, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         19, !$acc do parallel, vector(16)
             CC 1.3 : 6 registers; 24 shared, 44 constant, 0 local memory bytes; 100 occupancy
     22, Loop carried reuse of 'c' prevents parallelization
     23, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         22, !$acc do seq
             Cached references to size [16x16] block of 'a'
             Cached references to size [16x16] block of 'b'
         23, !$acc do parallel, vector(16)
             Using register for 'c'
             CC 1.3 : 23 registers; 4120 shared, 60 constant, 0 local memory bytes; 50 occupancy
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x8ef): In function `main':
./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x9c1):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0xa92):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'


If I remove the "pinned" attribute from the declaration, it works fine.

What am I missing here? I am already using the -Mcuda flag and "use cudafor".

Thank you for your help.
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Jan 24, 2011 10:25 am

Hi sindimo,

The problem here is that CUDA Fortran doesn't yet support the medium memory model. Removing the "-mcmodel=medium" flag will work around the error. This is a known limitation and has been logged as TPR#16947.

On a side note, mixing CUDA Fortran with the PGI Accelerator Model is not officially supported. Instead, you may wish to use the "cuf" kernel directive. (See the second part of http://www.pgroup.com/lit/articles/insider/v2n3a1.htm)
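For illustration only, here is a minimal sketch of how the matrix multiply above might look with a CUF kernel directive in place of the "!$acc region". It is not taken from the linked article. It assumes free-form source, a compiler version that supports CUF kernels, and a small problem size; the names mm_cuf, A_d, B_d, C_d and n are hypothetical.

```fortran
module mm_cuf
  use cudafor
  implicit none
contains
  subroutine mm(A, B, C, n)
    integer, intent(in)  :: n
    real(8), intent(in)  :: A(n,n), B(n,n)
    real(8), intent(out) :: C(n,n)
    ! Explicit device copies of the host arrays (hypothetical names)
    real(8), device, allocatable :: A_d(:,:), B_d(:,:), C_d(:,:)
    real(8) :: tmp
    integer :: i, j, k

    allocate(A_d(n,n), B_d(n,n), C_d(n,n))
    A_d = A          ! host-to-device copy
    B_d = B

    ! The two tightly nested loops (j, i) are mapped onto the GPU grid;
    ! the k loop runs sequentially inside each thread.
    !$cuf kernel do(2) <<< (*,*), (32,4) >>>
    do j = 1, n
      do i = 1, n
        tmp = 0.0d0
        do k = 1, n
          tmp = tmp + A_d(i,k) * B_d(k,j)
        end do
        C_d(i,j) = tmp
      end do
    end do

    C = C_d          ! device-to-host copy
    deallocate(A_d, B_d, C_d)
  end subroutine mm
end module mm_cuf

program main
  use cudafor
  use mm_cuf
  implicit none
  integer, parameter :: n = 512   ! small size chosen for the sketch
  ! Pinned host arrays, as in the original program
  real(8), allocatable, pinned :: A(:,:), B(:,:), C(:,:)

  allocate(A(n,n), B(n,n), C(n,n))
  call random_number(A)
  call random_number(B)

  call mm(A, B, C, n)
  print *, 'C(1,1) =', C(1,1)

  deallocate(A, B, C)
end program main
```

Under these assumptions it would be built with something like "pgfortran -fast -Mcuda mm_cuf.f90" (the file name is hypothetical), that is, without "-mcmodel=medium" and without the "-ta=nvidia" accelerator flags, since no PGI Accelerator directives remain.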

Hope this helps,
Mat
All times are GMT - 7 Hours