PGI User Forum

Problems with the device subprograms

 
OceanCloud



Joined: 27 Nov 2012
Posts: 12

Posted: Thu Aug 29, 2013 10:18 pm    Post subject: Problems with the device subprograms

When I use a subroutine with the device attribute in CUDA Fortran, I find that the device subprogram must be contained in a module and can only be invoked by subroutines or functions in that same module.
Is this true?
Why does it work this way?
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Sep 03, 2013 12:20 pm    Post subject:

Hi OceanCloud,

Quote:
Is it true?


Yes, in older versions of the compiler and by default in the current version. The main issue is that, until recently, there was no linker for device code. Device routines therefore had to be inlined by the compiler, which required them to be placed in the same module as the global routines that call them. (Note that this was true for CUDA C as well, where device routines had to be in the same file scope as the global routines.)

As of CUDA 5.0, we can now link device routines found in external objects when using the "-Mcuda=rdc" flag. The following PGinsider article gives a good explanation of its usage: http://www.pgroup.com/lit/articles/insider/v5n1a2.htm



Hope this helps,
Mat


Last edited by mkcolg on Wed Sep 11, 2013 10:14 am; edited 1 time in total
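
To illustrate the point about relocatable device code, here is a minimal sketch of a device routine that lives in a different module (and a different file) from the kernel that calls it. All names (devsub_m, kernel_m, scale_elem, scale_kernel, and the file names) are hypothetical and do not come from the thread or the PGinsider article. The compile lines in the comments are an assumption based on the "-Mcuda=rdc" flag mentioned above; object file extensions and driver names may differ on your platform.

```fortran
! ---- devsub.cuf (hypothetical file): module containing only a device routine ----
module devsub_m
  implicit none
contains
  ! Device routine: callable only from device code
  attributes(device) subroutine scale_elem(x, s)
    real :: x            ! passed by reference, updated in place
    real, value :: s
    x = x * s
  end subroutine scale_elem
end module devsub_m

! ---- kernel.cuf (hypothetical file): the kernel is in a different module ----
module kernel_m
  use devsub_m           ! device routine comes from an external object
  implicit none
contains
  attributes(global) subroutine scale_kernel(a, n, s)
    integer, value :: n
    real, value :: s
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) call scale_elem(a(i), s)
  end subroutine scale_kernel
end module kernel_m

program main
  use cudafor
  use kernel_m
  implicit none
  integer, parameter :: n = 1024
  real :: a(n)
  real, device :: a_d(n)

  a = 1.0
  a_d = a                                   ! host-to-device copy
  call scale_kernel<<<(n + 255) / 256, 256>>>(a_d, n, 2.0)
  a = a_d                                   ! device-to-host copy
  print *, 'max error: ', maxval(abs(a - 2.0))
end program main

! Assumed build steps (not taken from the thread), one object per file:
!   pgfortran -Mcuda=rdc -c devsub.cuf
!   pgfortran -Mcuda=rdc -c kernel.cuf
!   pgfortran -Mcuda=rdc devsub.o kernel.o -o scale
```

Without "-Mcuda=rdc", the compilers discussed in this thread would be expected to reject the cross-module call, since the device routine could not be inlined into the kernel's module.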
OceanCloud



Joined: 27 Nov 2012
Posts: 12

Posted: Wed Sep 04, 2013 6:25 pm    Post subject:

Hi, Mat

Thanks a lot.

I read the PGinsider article you mentioned. I think the compile option should be "-Mcuda=rdc", not "-Mcuda=rdo". However, I don't understand why, when I use the "-Mcuda=rdc" flag together with "allocate" statements in device routines, the compiler gives the error below:

"error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Unexpected runtime function call"

Why does this error occur?
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Thu Sep 05, 2013 2:49 pm    Post subject:

Quote:
maybe the compile option is "-Mcuda=rdc" not "-Mcuda=rdo".
Correct, this was a typo on my part. I'll go back and edit the post.

Quote:
"error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Unexpected runtime function call"

Why does this error occur?
This typically means that a compiler-generated host routine is being added to the device code. The one open bug (TPR#19462) I see with this failure has to do with "pow" when "-i8" is used. This will be fixed in 13.9. If that's not the same problem as yours, can you send a reproducing example to PGI Customer Service (trs@pgroup.com)?

Thanks,
Mat
OceanCloud



Joined: 27 Nov 2012
Posts: 12

Posted: Tue Sep 10, 2013 8:05 pm    Post subject:

Thanks, Mat

I mean that when I test the codes given in the PGinsider article, the codes (dgemmdynamic.cuf, dgemmdynamic_strassen.cuf, dgemmdynamic_streams.cuf) fail to compile.

Environment: PGI Visual Fortran 13.8, Visual Studio 2012, Windows 7 x64
Compile options: -Mcuda=cuda5.0,cc35,rdc
GPU card: K20C

Error message:
Code:
dgemmdynamic.cuf
C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor2afqqbp1lEUFtU.gpu(1010): error: identifier "mm88" is undefined

C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor2afqqbp1lEUFtU.gpu(1010): error: identifier "mm28" is undefined

2 errors detected in the compilation of "C:\Users\Adiministrator\AppData\Local\Temp\pgnvd2aGq4bGHw4zbl_.nv0".
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted


dgemmdynamic_strassen.cuf
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2337; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2441; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2545; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 3082; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 3177; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas : fatal error : Ptx assembly aborted due to errors
pgnvd-Fatal-Could not spawn c:\program files\pgi\win64/2013/cuda/5.0/bin\ptxas.exe
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic_strassen.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted

dgemmdynamic_streams.cuf
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c0KubCnuvxO8w.ptx, line 2372; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c0KubCnuvxO8w.ptx, line 3257; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas : fatal error : Ptx assembly aborted due to errors
pgnvd-Fatal-Could not spawn c:\program files\pgi\win64/2013/cuda/5.0/bin\ptxas.exe
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic_streams.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted



All three of the above routines contain "allocate" statements, and dgemmdynamic_strassen.cuf and dgemmdynamic_streams.cuf also use dynamic parallelism.

Maybe you can tell from the description above where the problem is.