PGI User Forum
CUDA Fortran and PGI Accelerator mix
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Tue May 17, 2011 2:42 pm    Post subject: CUDA Fortran and PGI Accelerator mix

Greetings. Is mixing CUDA Fortran (-Mcuda) with the PGI Accelerator Model (-ta=nvidia) supported? I saw a post from April 2010 saying that they shouldn't be used together, but that at some point they might be.

Thanks
BL
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Wed May 18, 2011 8:10 am

Hi BL,

Yes, they are now supported together. At one point they used different CUDA APIs, but we have since merged them so that they are now compatible on all platforms (it did work on Linux before, but not on Windows). Note that the accelerator directives do recognize CUDA Fortran device variables, so these variables do not need to be copied. We also added a "!$CUF" directive to CUDA Fortran (see: http://www.pgroup.com/lit/articles/insider/v2n3a1.htm), which is essentially a 'lite' version of the PGI Accelerator model. It does not automate data movement but does create device kernels for you. It also uses the CUDA chevron syntax to give you control of the loop schedule.
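For illustration only, here is a minimal sketch of what a "!$CUF" kernel loop can look like. It is not taken from the linked article. The program name, the array names, and the use of real arrays are assumptions made for the example, and it assumes compilation with -Mcuda. The data lives in device arrays that the program allocates and copies itself, and the loop is turned into a kernel by the directive.

```fortran
program cuf_kernel_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 8
  real, allocatable         :: c(:)                    ! host result
  real, device, allocatable :: a_d(:), b_d(:), c_d(:)  ! device arrays (hypothetical names)
  integer :: j

  allocate (c(n), a_d(n), b_d(n), c_d(n))

  ! Data movement is explicit: these assignments fill the device arrays.
  a_d = 1.0
  b_d = 2.0

  ! The compiler generates a device kernel from the loop below.
  ! The chevrons carry the launch schedule; <<<*,*>>> leaves the
  ! grid and block sizes to the compiler.
  !$cuf kernel do <<<*,*>>>
  do j = 1, n
     c_d(j) = a_d(j) + b_d(j)
  end do

  ! Copy the result back to the host explicitly.
  c = c_d
  print *, c

  deallocate (c, a_d, b_d, c_d)
end program cuf_kernel_sketch
```

Replacing the asterisks with explicit values inside the chevrons is how the schedule would be controlled by hand.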

Hope this helps,
Mat
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Wed May 18, 2011 1:21 pm

Thanks! That's great to hear.
I tried to test mixing CUDA Fortran and PGI Accelerator directives. The code shown below compiles fine, but I get an error at runtime. I'm using Windows.
Code:

program fft_test
use cudafor
use precision
use cufft
complex(fp_kind) ,allocatable:: a(:),b(:),c(:)
complex(fp_kind),device,allocatable:: a_d(:),b_d(:)
integer:: n
integer:: plan

n=8

! allocate arrays on the host
allocate (a(n),b(n),c(n))

! allocate arrays on the device
allocate (a_d(n))
allocate (b_d(n))

!initialize arrays on host
a=1;c=0

!copy arrays to device
a_d=a


! Print initial array
print *, "Array A:"
print *, a



! Initialize the plan
call cufftPlan1D(plan,n,CUFFT_Z2Z,1)

! Execute FFTs
call cufftExecZ2Z(plan,a_d,b_d,CUFFT_FORWARD)

!call cufftExecZ2Z(plan,b_d,b_d,CUFFT_INVERSE)


! Copy results back to host
b=b_d

! Print transformed array
print *, "Array B"
print *, b

! Add arrays
!$acc region
do j=1,n
c(j)=a(j)+b(j)
enddo
!$acc end region
print *, "Array C"
print *, c

!release memory on the host
deallocate (a,b,c)

!release memory on the device
deallocate (a_d,b_d)

! Destroy the plan
call cufftDestroy(plan)

end program fft_test

This is the compile output
Code:

>pgf90 precision.f90 cufft.f90 fft_test.f90 -o main -Mcuda=cuda3.2 -ta=nvidia:cuda3.2 -Minfo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\x64\cufft.lib"
precision.f90:
cufft.f90:
fft_test.f90:
fft_test:
     20, Memory set idiom, array assignment replaced by call to pgf90_msetz16
         Memory zero idiom, array assignment replaced by call to pgf90_mzeroz16
     49, Generating copyin(b(1:8))
         Generating copyin(a(1:8))
         Generating copyout(c(1:8))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     50, Loop is parallelizable
         Accelerator kernel generated
         50, !$acc do parallel, vector(8) ! blockidx%x threadidx%x
             CC 1.3 : 11 registers; 52 shared, 4 constant, 0 local memory bytes; 25% occupancy
             CC 2.0 : 18 registers; 4 shared, 64 constant, 0 local memory bytes; 16% occupancy

This is the output
Code:

 Array A:
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 Array B
 (8.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
call to cuMemAlloc returned error 201: Invalid context
CUDA driver version: 3020


This error seems to occur when the accelerated region is entered. What does this error mean? I first compiled without specifying cuda3.2 and thought that might be causing a mismatch, but the error persists with cuda3.2 specified for both -Mcuda and -ta.

Regards
BL
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Wed May 18, 2011 2:07 pm

Hi BL,

You're missing an interface for the CUFFT routines. Without an interface, the compiler must treat the calls using F77 calling semantics, which are incorrect here.

Take a look at this article from the latest PGInsider (http://www.pgroup.com/lit/articles/insider/v3n1a5.htm), which shows how to call the CUBLAS, CULA, and Magma BLAS libraries. The same methods can be used to call CUFFT.
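As a rough sketch of what such an interface could look like for the CUFFT routines used in this thread: the module name below is hypothetical, the cufft.f90 actually compiled above is not shown in the thread, and the mapping of cufftHandle to integer(c_int) is an assumption about the C type's size. The routines are declared as functions because the C entry points return a cufftResult status code.

```fortran
module cufft_interface_sketch   ! hypothetical module name
  use iso_c_binding
  implicit none

  ! Constants as defined in cufft.h
  integer(c_int), parameter :: CUFFT_FORWARD = -1
  integer(c_int), parameter :: CUFFT_INVERSE =  1
  integer(c_int), parameter :: CUFFT_Z2Z     = 105   ! 0x69

  interface
     ! cufftResult cufftPlan1d(cufftHandle *plan, int nx, cufftType type, int batch)
     integer(c_int) function cufftPlan1d(plan, nx, fft_type, batch) &
          bind(C, name='cufftPlan1d')
       use iso_c_binding
       integer(c_int)        :: plan                  ! passed by reference (output)
       integer(c_int), value :: nx, fft_type, batch   ! passed by value
     end function cufftPlan1d

     ! cufftResult cufftExecZ2Z(cufftHandle plan, cufftDoubleComplex *idata,
     !                          cufftDoubleComplex *odata, int direction)
     integer(c_int) function cufftExecZ2Z(plan, idata, odata, direction) &
          bind(C, name='cufftExecZ2Z')
       use iso_c_binding
       integer(c_int), value :: plan, direction
       ! The device attribute tells the compiler these are GPU addresses.
       complex(c_double_complex), device :: idata(*), odata(*)
     end function cufftExecZ2Z

     ! cufftResult cufftDestroy(cufftHandle plan)
     integer(c_int) function cufftDestroy(plan) bind(C, name='cufftDestroy')
       use iso_c_binding
       integer(c_int), value :: plan
     end function cufftDestroy
  end interface
end module cufft_interface_sketch
```

With function interfaces like these, a call would be written as istat = cufftPlan1d(plan, n, CUFFT_Z2Z, 1), and a nonzero istat would indicate a CUFFT error. The value attribute is what distinguishes by-value C arguments from the by-reference passing that F77 semantics would otherwise imply.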

Hope this helps,
Mat
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Thu May 19, 2011 7:16 am

Hello Mat,
I am using an interface for the CUFFT library. The output for array b shows that the call to the cufft routine was successful. Array b is the transform of array a. It is when the program enters the !$acc region that I get the error "call to cuMemAlloc returned error 201: Invalid context".

Is this still due to an interface problem?

Thanks
BL
All times are GMT - 7 Hours
Page 1 of 2