PGI User Forum


-ta nvidia,host and libcuda.so requirement

PGI User Forum Forum Index -> Accelerator Programming
noe



Joined: 14 Sep 2004
Posts: 4

Posted: Fri Nov 27, 2009 4:42 am    Post subject: -ta nvidia,host and libcuda.so requirement

Hi,

I was trying version 10.0 to create unified binaries that run with
or without an accelerator.
It seems, however, that libcuda.so is required in any case (even
with ACC_DEVICE=host) and must also be installed on platforms
without an accelerator.
Also, libcuda.so is not shipped with the compiler. Under openSUSE,
for instance, it is part of the video driver package, which would not
usually be installed on a machine without an accelerator.

Is this the intended behaviour? Or shouldn't the runtime system
avoid using any CUDA libraries when no accelerator is present
or wanted?

Also, copying libcuda.so to a directory listed in $LD_LIBRARY_PATH
does not help; I then get the error:
call to cuInit returned error 100: No device


Regards,
Norbert
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Mon Nov 30, 2009 12:43 pm

Hi Norbert,

I just double-checked, and did not have any problems when I created a unified binary on a system with an NVIDIA GPU and then ran it on another system without a GPU. I was only able to recreate your error when I compiled with just "-ta=nvidia". Can you please double-check that you compiled with "-ta=nvidia,host"?

Thanks,
Mat
noe



Joined: 14 Sep 2004
Posts: 4

Posted: Tue Dec 01, 2009 2:16 am

OK, one step at a time. I was actually using one of the official examples (f2.f90):

hostA% pgaccelinfo | grep 'Device Name'
Device Name: Tesla C1060
Device Name: Tesla C1060
hostA% pgfortran -o f2.uni f2.f90 -ta=nvidia,host -Minfo -fast
main:
      1, PGI Unified Binary version for -tp=nehalem-64 -ta=host
     20, Unrolled inner loop 8 times
     26, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     32, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     38, Loop not vectorized/parallelized: contains call
main:
      1, PGI Unified Binary version for -tp=nehalem-64 -ta=nvidia
     20, Unrolled inner loop 8 times
     25, Generating copyin(a(1:n))
         Generating copyout(r(1:n))
     26, Loop is parallelizable
         Accelerator kernel generated
         26, !$acc do parallel, vector(256)
     32, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     38, Loop not vectorized/parallelized: contains call
hostA% ./f2.uni
100000 iterations completed
1230 microseconds on GPU
1482 microseconds on host

Now on hostB: same directory, same environment, but no CUDA installed (and no accelerator hardware):

hostB% pgaccelinfo | grep 'Device Name'
hostB% ./f2.uni
libcuda.so not found, exiting
hostB% ACC_DEVICE=host ./f2.uni
libcuda.so not found, exiting

At this point I noticed that I had not mentioned Fortran in my
original post, and Mat is probably using C.
So here is the same test with a C example:

....
hostB% ./c2.uni
100000 iterations completed
1546 microseconds on GPU
1530 microseconds on host
hostB%

Aaaah, so it's probably a Fortran runtime problem.
Also interesting: hostB reports having spent some time on the non-existent GPU. But that's off-topic.

Norbert
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Tue Dec 01, 2009 9:38 am

Hi Norbert,

The example code you are using has the following line:
Code:
  call acc_init( acc_device_nvidia )

In other words, this runtime call forces the use of the NVIDIA device. Changing the acc_init argument to "acc_device_default" will allow you to use the unified binary.

Note that the c2 C example has the same issue.

Hope this helps,
Mat
All times are GMT - 7 Hours