spam.me
Joined: 25 Jun 2009 Posts: 6
Posted: Tue Jul 14, 2009 6:05 am Post subject: double precision
Hello,
if I compile the code below with the "double" type, I get the following error message:
| Code: |
pgcc -o c2.exe c2.c -ta=nvidia,cc11 -Minfo=accel -fast
NOTE: your trial license will expire in 0 days, 9.13 hours.
main:
60, Accelerator region ignored
63, Accelerator restriction: invalid loop
64, Accelerator restriction: datatype not supported: s
|
What can I do? Thanks a lot!
According to [1], double precision is supported, and CUDA 64-bit code can be "looped" on 32-bit processors. I use pgi9-0.2, and the graphics card details read as follows:
| Code: |
pgaccelinfo
Device Number: 0
Device Name: GeForce 9800 GTX/9800 GTX+
Device Revision Number: 1.1
Global Memory Size: 536150016
Number of Multiprocessors: 16
Number of Cores: 128
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 8192
Warp Size: 32
Maximum Threads per Block: 8192
Maximum Block Dimensions: 512 x 512 x 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 262144B
Texture Alignment 256B
Clock Rate: 1674 MHz
|
[1] http://www.pgroup.com/resources/accel.htm#dp
The source reads as follows:
| Code: |
typedef double dtype;
int main( int argc, char* argv[] )
{
int n; /* size of the vector */
dtype *a; /* the vector */
dtype *r; /* the results */
dtype *e; /* expected results */
dtype s, c;
struct timeval t1, t2, t3;
long cgpu, chost;
int i,*t;
if( argc > 1 )
n = atoi( argv[1] );
else
n = 100000;
if( n <= 0 ) n = 100000;
a = (dtype*)malloc(n*sizeof(dtype));
r = (dtype*)malloc(n*sizeof(dtype));
e = (dtype*)malloc(n*sizeof(dtype));
for( i = 0; i < n; ++i ) a[i] = (dtype)(i+1) * 2.0;
acc_init( acc_device_nvidia );
gettimeofday( &t1, NULL );
#pragma acc region
{
for( i = 0; i < n; ++i ){
s = sin(a[i]);
c = cos(a[i]);
r[i] = s*s + c*c;
}
}
gettimeofday( &t2, NULL );
}
|
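The listing ends right after the second gettimeofday() call, while t3, cgpu, chost and e are declared but never used, which suggests a timing and host-verification step was trimmed for the post. A minimal sketch of what such helpers could look like follows. USE_FLOAT, elapsed_usec and count_mismatches are hypothetical names, not part of the original program or of the PGI runtime. The USE_FLOAT switch would let the same benchmark be rebuilt in single precision (for example with -DUSE_FLOAT) on a card that cannot run double.

```c
#include <sys/time.h>

/* Hypothetical precision switch: compile with -DUSE_FLOAT for single precision. */
#ifdef USE_FLOAT
typedef float dtype;
#else
typedef double dtype;
#endif

/* Microseconds elapsed between two gettimeofday() samples. */
static long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}

/* Count entries where |r[i] - e[i]| exceeds tol (manual abs, so no libm needed). */
static int count_mismatches(const dtype *r, const dtype *e, int n, dtype tol)
{
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        dtype d = r[i] - e[i];
        if (d < 0)
            d = -d;
        if (d > tol)
            ++bad;
    }
    return bad;
}
```

With these, cgpu = elapsed_usec(&t1, &t2) would hold the accelerator-region time, and e[] could be filled by running the same sin/cos loop on the host before calling count_mismatches(r, e, n, tolerance).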
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Jul 14, 2009 8:44 am
Hi spam.me,
While I get different errors than you, I'm able to work around them by adding the header files and using the "-Msafeptr" flag. Note that "double" is supported; however, we currently only support 64-bit Linux host objects.
| Code: |
% cat cc.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/time.h>
#include <math.h>
#include <accel.h>
#include <accelmath.h>
typedef double dtype;
int main( int argc, char* argv[] )
{
int n; /* size of the vector */
dtype *a; /* the vector */
dtype *r; /* the results */
dtype *e; /* expected results */
dtype s, c;
struct timeval t1, t2, t3;
long cgpu, chost;
int i,*t;
if( argc > 1 )
n = atoi( argv[1] );
else
n = 100000;
if( n <= 0 ) n = 100000;
a = (dtype*)malloc(n*sizeof(dtype));
r = (dtype*)malloc(n*sizeof(dtype));
e = (dtype*)malloc(n*sizeof(dtype));
for( i = 0; i < n; ++i ) a[i] = (dtype)(i+1) * 2.0;
acc_init( acc_device_nvidia );
gettimeofday( &t1, NULL );
#pragma acc region
{
for( i = 0; i < n; ++i ){
s = sin(a[i]);
c = cos(a[i]);
r[i] = s*s + c*c;
}
}
gettimeofday( &t2, NULL );
}
% pgcc -ta=nvidia -Minfo=accel -fast cc.c -Msafeptr -o cc.exe
main:
39, Generating copyin(a[0:n-1])
Generating copyout(r[0:n-1])
41, Loop is parallelizable
Accelerator kernel generated
41, #pragma for parallel, vector(256)
Using register for a
% cc.exe
%
|
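The -Msafeptr flag lets the PGI compiler assume that C pointers in the file do not alias. The sketch below, with hypothetical function names, shows why that matters for a loop shaped like r[i] = f(a[i]): if r may overlap a, an iteration can read a value the previous iteration just wrote, so the compiler cannot safely run the iterations in parallel. C99's restrict qualifier expresses a similar no-overlap promise per pointer in the source itself, rather than for the whole compilation.

```c
/* No aliasing guarantee: r may overlap a, so iteration i can depend
   on the value written by iteration i - 1. */
static void add_one(int n, const double *a, double *r)
{
    for (int i = 0; i < n; ++i)
        r[i] = a[i] + 1.0;
}

/* C99 restrict: the caller promises a and r never overlap, so the
   iterations are independent and may be vectorized or parallelized. */
static void add_one_restrict(int n, const double *restrict a,
                             double *restrict r)
{
    for (int i = 0; i < n; ++i)
        r[i] = a[i] + 1.0;
}
```

Calling add_one(3, buf, buf + 1) on a zeroed four-element buf leaves it as {0, 1, 2, 3}, a running count produced purely by the overlap, whereas non-overlapping buffers get 1.0 in every element. Passing overlapping pointers to add_one_restrict would be undefined behavior.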
spam.me
Joined: 25 Jun 2009 Posts: 6
Posted: Tue Jul 14, 2009 8:59 am
Thanks for your fast reply.
Of course, the headers are included in the source file, and PGI runs on a 64-bit Linux system (current Xubuntu).
But please note that the compiler error occurs if I compile the source with
| Code: |
pgcc -ta=nvidia,cc11 ...
|
If I use
| Code: |
pgcc -ta=nvidia,cc13
|
the compilation succeeds, but I can't run the program; I get the following error message:
| Code: |
call to cuModuleLoadData returned error 300: Invalid Source
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Jul 14, 2009 11:28 am
Hi spam.me,
Sorry, my misunderstanding. As far as I can tell, NVIDIA did not support double precision until compute capability 1.3, so you will not be able to use "double" with your card (pgaccelinfo reports revision 1.1 for it).
- Mat
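The rule above can be written as a small check against the "Device Revision Number" that pgaccelinfo prints (1.1 for the GeForce 9800 GTX in this thread). The function names are hypothetical, and the sketch only encodes the compute-capability-1.3 threshold stated here; it does not query the driver or the PGI runtime.

```c
#include <stdio.h>

/* Parse a "major.minor" revision string such as "1.1".
   Returns 1 on success, 0 if the string is malformed. */
static int parse_revision(const char *text, int *cc_major, int *cc_minor)
{
    return sscanf(text, "%d.%d", cc_major, cc_minor) == 2;
}

/* Double precision requires compute capability 1.3 or later. */
static int supports_double(int cc_major, int cc_minor)
{
    return cc_major > 1 || (cc_major == 1 && cc_minor >= 3);
}
```

For the card above, parse_revision("1.1", ...) yields 1 and 1, and supports_double(1, 1) returns 0, which matches Mat's conclusion.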
spam.me
Joined: 25 Jun 2009 Posts: 6
Posted: Wed Jul 15, 2009 2:51 am
Dear mkcolg,
yes, my test graphics card has no 64-bit processors, but in CUDA it is possible to "loop" 64-bit operations so that the 32-bit processors can compute the problem. Of course, double-precision operations on 32-bit processors run slowly, but I want to run some benchmarks before I ask my university to pay for some Tesla cards.
Are there plans to add such 64-to-32-bit "loop" options to the compiler or pragmas, or are they already available?
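For reference, extra precision on single-precision hardware is usually emulated in software with "float-float" (also called double-single) arithmetic, where a value is kept as the unevaluated sum hi + lo of two floats. The sketch below shows the core error-free addition (Knuth's TwoSum) and a simple float-float add built on it. It is a manual technique, not an option of the PGI compiler or its pragmas; it gives roughly 48 significand bits rather than IEEE double's 53; the names are illustrative; and it assumes strict IEEE single-precision evaluation (no -ffast-math, no x87 extended precision).

```c
/* A value represented as the unevaluated sum hi + lo. */
typedef struct {
    float hi;
    float lo;
} ff_t;

/* Knuth's TwoSum: s = fl(a + b) and err is the exact rounding error,
   so a + b == s + err holds exactly. */
static ff_t two_sum(float a, float b)
{
    float s = a + b;
    float bb = s - a;
    float err = (a - (s - bb)) + (b - bb);
    ff_t r = { s, err };
    return r;
}

/* Dekker's FastTwoSum: exact when |a| >= |b|; used to renormalize. */
static ff_t fast_two_sum(float a, float b)
{
    float s = a + b;
    ff_t r = { s, b - (s - a) };
    return r;
}

/* Simple float-float addition: add the high parts exactly,
   fold in the low parts, then renormalize. */
static ff_t ff_add(ff_t x, ff_t y)
{
    ff_t s = two_sum(x.hi, y.hi);
    float lo = s.lo + x.lo + y.lo;
    return fast_two_sum(s.hi, lo);
}
```

Adding 1e-8f to 1.0f a thousand times in plain float leaves exactly 1.0f, because each increment is below half an ulp of 1.0f. Accumulating the same increments with ff_add keeps them in lo, so hi + lo evaluated in double ends up close to 1.00001.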