PGI User Forum

CUDA-x86.

host, device function: Fortran counterpart of __CUDA_ARCH__
canemacchina



Joined: 02 May 2012
Posts: 3

Posted: Sat May 12, 2012 12:18 pm    Post subject: host, device function: Fortran counterpart of __CUDA_ARCH__

Hi all,

I'm studying CUDA Fortran for my thesis, and I have a question.

As in C, in Fortran we can define functions (or subroutines) that are both host and device, that is, functions callable from either the host or the device.

In C, if such a function has to determine who the caller is, I can use the __CUDA_ARCH__ macro: it is defined when the function is compiled for the GPU (so the GPU is the caller) and undefined when it is compiled for the CPU. Here is an example of using __CUDA_ARCH__:

Code:

__host__ __device__ void function() {
#ifdef __CUDA_ARCH__
    // __CUDA_ARCH__ defined: compiled for the GPU, so the GPU is the caller
#if __CUDA_ARCH__ >= 200
    // Compute capability >= 2.x
#else
    // Compute capability < 2.x
#endif
#else
    // __CUDA_ARCH__ not defined: the host calls the function
#endif
}


So the question is: is there something like this in Fortran?
Thanks to all!
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon May 14, 2012 9:38 am

Hi canemacchina,

Quote:
in Fortran there is something like this?
While "__CUDA_ARCH__" is not a predefined macro in CUDA Fortran, CUDA Fortran supports preprocessing, so you can use it and set the variable on the command line (e.g. -D__CUDA_ARCH__=200). Note that "_CUDA" is predefined when "-Mcuda" is used or the file extension is ".cuf".

However, in CUDA Fortran you may not want to use this method to select the compute capability (CC). The PGI compiler automatically creates multiple versions of your code to target the various compute capabilities. When the binary runs, the version matching the device's CC is used.

The one caveat is that all CC versions use the same user code; the only difference is the compiler optimisation applied. If your device kernels differ significantly between compute capabilities, then you may want to select the kernel based on the CC version. However, why do it at compile time? It seems to me that you'd want to wait until runtime, call the device properties routine, and then select the appropriate kernel to launch.
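
The runtime selection described above could look roughly like the following sketch. It is written in plain C so it builds without a CUDA toolkit, and everything in it is a hypothetical stand-in: `device_caps` mimics only the `major` and `minor` fields that a real program would read from the device properties routine (`cudaGetDeviceProperties` in the CUDA runtime API), and the two "kernel" functions are empty placeholders for real kernel launches.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields of the device properties
   structure that matter here; real code would fill them from the
   device properties routine. */
struct device_caps {
    int major;  /* compute capability, major number */
    int minor;  /* compute capability, minor number */
};

/* Hypothetical kernel variants. Real code would launch device kernels;
   these placeholders only let the selection logic be exercised. */
static void kernel_cc1x(void) { /* variant written for CC 1.x */ }
static void kernel_cc2x(void) { /* variant written for CC >= 2.0 */ }

typedef void (*kernel_fn)(void);

/* Choose the variant at run time from the queried compute capability. */
static kernel_fn select_kernel(struct device_caps caps)
{
    assert(caps.major >= 1);  /* no CUDA device reports CC below 1.0 */
    return (caps.major >= 2) ? kernel_cc2x : kernel_cc1x;
}
```

A caller would query the device once, keep the returned function pointer, and invoke it wherever the kernel is needed, for example `select_kernel(caps)();`. This picks between device variants only; it does not tell a routine whether its caller is the host or the device.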

- Mat
canemacchina



Joined: 02 May 2012
Posts: 3

Posted: Tue May 15, 2012 7:17 am

Ok, thanks.

Maybe I haven't understood well, but my question was this:
Suppose you have to write some utility functions for your application, for example one that sorts a given array, and you want to use this function for both device and host arrays. You could implement two different functions, for example dev_arraySort and host_arraySort, that sort an array in device memory and in host memory respectively. In that case, a kernel subroutine that wants to sort an array has to call dev_arraySort, and the host has to call host_arraySort.
My goal instead is to write a single function called arraySort that is callable from both host and device and can sort an array in either device memory or host memory. As I wrote before, in C I can write:

Code:

__host__ __device__ void function() {
#ifdef __CUDA_ARCH__
    // __CUDA_ARCH__ defined: the GPU is the caller
    // here goes the code to sort an array in device memory
#else
    // __CUDA_ARCH__ not defined: the host calls the function
    // here goes the code to sort an array in host memory
#endif
}


Maybe this is a strange example, or a strange way to solve my problem, but I need this information to write about it in my thesis.

Is this possible in Fortran or not?

Thanks again.
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue May 15, 2012 8:47 am

Hi canemacchina,

Preprocessor define flags such as "__CUDA_ARCH__" control conditional compilation, so a given compilation of this code keeps either the device branch or the host branch, but not both.
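
To make the "one branch per compilation" point concrete, here is a minimal sketch in plain C. The names `compiled_for`, `TARGET_HOST` and `TARGET_DEVICE` are made up for illustration. With nvcc, a `__host__ __device__` function in CUDA C is compiled in separate host and device passes and `__CUDA_ARCH__` is defined only in the device pass, which is why the C idiom works there. With an ordinary C compiler there is a single pass, so only the `#else` branch survives.

```c
#include <assert.h>  /* used by the usage check shown after the block */

/* nvcc defines __CUDACC__. Under an ordinary C compiler the CUDA
   qualifiers do not exist, so define them away to keep the file
   compilable (an assumption of this sketch, not a CUDA Fortran feature). */
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

enum target { TARGET_HOST, TARGET_DEVICE };

/* The preprocessor keeps exactly one of the two return statements in
   each compilation pass; the other never reaches the compiler. */
__host__ __device__ static enum target compiled_for(void)
{
#ifdef __CUDA_ARCH__
    return TARGET_DEVICE;  /* kept only when compiling device code */
#else
    return TARGET_HOST;    /* kept only when compiling host code */
#endif
}
```

Built with a plain C compiler, `assert(compiled_for() == TARGET_HOST);` holds. Setting the macro by hand with `-D__CUDA_ARCH__=200` would flip the result for the whole build, which is the limitation described above: the choice is fixed per compilation, not per caller.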

What you need to do is add a runtime call to the device properties to determine what type of device you are using and then use an if statement to call the appropriate routine. You could also write a generic interface that selects the appropriate version of the routine, based on the argument types, but ultimately there are two different versions of the routine.
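
As a rough analogy for the generic-interface idea, the sketch below uses C11 `_Generic` to put one name, `array_sort`, in front of two separate implementations chosen by argument type. This is only an illustration in C, not CUDA Fortran syntax. The wrapper types `host_array` and `dev_array` are hypothetical: C cannot tell a device pointer from a host pointer, so distinct struct types stand in for the way Fortran distinguishes device arrays by attribute. The device version is an empty placeholder.

```c
#include <assert.h>  /* used by the usage check described after the block */
#include <stddef.h>

/* Hypothetical wrapper types standing in for "array in host memory"
   and "array in device memory". */
typedef struct { float *data; size_t n; } host_array;
typedef struct { float *data; size_t n; } dev_array;

enum sort_path { SORT_ON_HOST, SORT_ON_DEVICE };

/* Host version: a simple insertion sort, ascending order. */
static enum sort_path host_array_sort(host_array a)
{
    for (size_t i = 1; i < a.n; ++i) {
        float key = a.data[i];
        size_t j = i;
        while (j > 0 && a.data[j - 1] > key) {
            a.data[j] = a.data[j - 1];
            --j;
        }
        a.data[j] = key;
    }
    return SORT_ON_HOST;
}

/* Device version: placeholder only. A real version would launch a
   device sort on a.data here. */
static enum sort_path dev_array_sort(dev_array a)
{
    (void)a;
    return SORT_ON_DEVICE;
}

/* One name for callers; the compiler picks the version from the
   static type of the argument. Two implementations still exist. */
#define array_sort(a) _Generic((a), \
    host_array: host_array_sort,    \
    dev_array:  dev_array_sort)(a)
```

Calling `array_sort(h)` with a `host_array` sorts in place on the host, while the same call with a `dev_array` goes to the device version. The selection happens at compile time from the argument type, so it does not answer the question of who the caller is.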

In the first draft of CUDA Fortran we did have the concept of a "Unified Binary" where two versions of the kernel would be written, one for the device and one for the host, but the implementation proved too difficult. Hopefully we will be able to add it back at some point.

Note that the PGI Accelerator Model and the PGI implementation of the OpenACC API do support unified binary. If your project absolutely requires a single routine that supports both the device and the host, then you may wish to consider using directives instead.

- Mat
canemacchina



Joined: 02 May 2012
Posts: 3

Posted: Tue May 15, 2012 11:20 am

Sorry, I still haven't understood.
Quote:

Define flags such as "__CUDA_ARCH__" are used by the preprocessor to control conditional compilation. So this code can be compiled with the call to the device or the host, but not both.

This refers to Fortran, right?

Quote:

What you need to do is add a runtime call to the device properties to determine what type of device you are using and then use an if statement to call the appropriate routine.


OK, but with this solution I can determine the compute capability of the device, whereas what I need is to determine whether the caller is the host or the GPU...
All times are GMT - 7 Hours
Page 1 of 2