PGI User Forum


double precision
spam.me



Joined: 25 Jun 2009
Posts: 6

Posted: Tue Jul 14, 2009 6:05 am    Post subject: double precision

Hello,

If I compile the code below with the "double" type, I get the following error message:
Code:

pgcc -o c2.exe c2.c -ta=nvidia,cc11 -Minfo=accel -fast
NOTE: your trial license will expire in 0 days, 9.13 hours.
main:
     60, Accelerator region ignored
     63, Accelerator restriction: invalid loop
     64, Accelerator restriction: datatype not supported: s


What can I do? Thanks a lot!

According to [1], double precision is supported, and 64-bit CUDA code can be "looped" on 32-bit processors. I use pgi9-0.2, and the graphics card details read as follows:

Code:

pgaccelinfo
Device Number:                 0
Device Name:                   GeForce 9800 GTX/9800 GTX+
Device Revision Number:        1.1
Global Memory Size:            536150016
Number of Multiprocessors:     16
Number of Cores:               128
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           8192
Warp Size:                     32
Maximum Threads per Block:     8192
Maximum Block Dimensions:      512 x 512 x 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          262144B
Texture Alignment              256B
Clock Rate:                    1674 MHz


[1] http://www.pgroup.com/resources/accel.htm#dp
The source reads as follows:

Code:

typedef double dtype;

int main( int argc, char* argv[] )
{
    int n;      /* size of the vector */

    dtype *a;  /* the vector */
    dtype *r;  /* the results */
    dtype *e;  /* expected results */

    dtype s, c;
    struct timeval t1, t2, t3;

    long cgpu, chost;
    int i,*t;
    if( argc > 1 )
        n = atoi( argv[1] );
    else
        n = 100000;
    if( n <= 0 ) n = 100000;

    a = (dtype*)malloc(n*sizeof(dtype));
    r = (dtype*)malloc(n*sizeof(dtype));
    e = (dtype*)malloc(n*sizeof(dtype));

    for( i = 0; i < n; ++i ) a[i] = (dtype)(i+1) * 2.0;
    acc_init( acc_device_nvidia );

    gettimeofday( &t1, NULL );
    #pragma acc region
    {
        for( i = 0; i < n; ++i ){
            s = sin(a[i]);
            c = cos(a[i]);
            r[i] = s*s + c*c;
        }
    }
    gettimeofday( &t2, NULL );
}
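
The excerpt above records t1 and t2 with gettimeofday but ends before using them. A minimal sketch of how the elapsed time could be computed is shown below. The helper names elapsed_usec and time_usec are hypothetical and not part of the original source, and the callback stands in for the accelerator region.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not in the original post):
   microseconds between two gettimeofday() samples. */
static long elapsed_usec(struct timeval start, struct timeval end)
{
    return (long)(end.tv_sec - start.tv_sec) * 1000000L
         + (long)(end.tv_usec - start.tv_usec);
}

/* Hypothetical wrapper: times one call of `work`, which is a
   placeholder for the code between the two gettimeofday() calls. */
static long time_usec(void (*work)(void))
{
    struct timeval t1, t2;

    assert(work != NULL);
    gettimeofday(&t1, NULL);
    work();
    gettimeofday(&t2, NULL);
    return elapsed_usec(t1, t2);
}
```

In the posted program, the result would fit the already declared variable, for example cgpu = elapsed_usec(t1, t2);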

mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Tue Jul 14, 2009 8:44 am

Hi spam.me,

While I get different errors than you, I'm able to work around them by adding header files and using the "-Msafeptr" flag. Note that "double" is supported; however, we currently only support 64-bit Linux host objects.

Code:
% cat cc.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/time.h>
#include <math.h>
#include <accel.h>
#include <accelmath.h>


typedef double dtype;

int main( int argc, char* argv[] )
{
    int n;      /* size of the vector */

    dtype *a;  /* the vector */
    dtype *r;  /* the results */
    dtype *e;  /* expected results */

    dtype s, c;
    struct timeval t1, t2, t3;

    long cgpu, chost;
    int i,*t;
    if( argc > 1 )
        n = atoi( argv[1] );
    else
        n = 100000;
    if( n <= 0 ) n = 100000;

    a = (dtype*)malloc(n*sizeof(dtype));
    r = (dtype*)malloc(n*sizeof(dtype));
    e = (dtype*)malloc(n*sizeof(dtype));

    for( i = 0; i < n; ++i ) a[i] = (dtype)(i+1) * 2.0;
    acc_init( acc_device_nvidia );

    gettimeofday( &t1, NULL );
    #pragma acc region
    {
        for( i = 0; i < n; ++i ){
            s = sin(a[i]);
            c = cos(a[i]);
            r[i] = s*s + c*c;
        }
    }
    gettimeofday( &t2, NULL );
}
% pgcc -ta=nvidia -Minfo=accel -fast cc.c -Msafeptr -o cc.exe
main:
     39, Generating copyin(a[0:n-1])
         Generating copyout(r[0:n-1])
     41, Loop is parallelizable
         Accelerator kernel generated
         41, #pragma for parallel, vector(256)
             Using register for a
% cc.exe
%       
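
The -Msafeptr flag used above tells the compiler that pointers do not alias, and it applies to the whole file. A narrower way to state the same intent in C99 is the restrict qualifier on the pointers themselves. The sketch below is plain host C with a simplified loop body standing in for the sin/cos computation, so that it links without the math library. The function name square_into is hypothetical, and whether this PGI release accepts restrict in place of -Msafeptr for an accelerator region is an assumption that has not been verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop in the post: `restrict` promises
   the compiler that a and r never overlap, so every iteration is
   independent of the others. */
static void square_into(const double *restrict a, double *restrict r, int n)
{
    int i;

    assert(a != NULL && r != NULL);
    for (i = 0; i < n; ++i)
        r[i] = a[i] * a[i];   /* reads only a[i], writes only r[i] */
}
```

Calling square_into with the same array for both arguments would break the promise made by restrict, which is the same obligation that -Msafeptr places on the whole source file.
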
spam.me



Joined: 25 Jun 2009
Posts: 6

Posted: Tue Jul 14, 2009 8:59 am

Thanks for your fast reply.

Of course, the headers are included in the source file, and PGI runs under a 64-bit Linux system (current Xubuntu).

But please note that the compiler error occurs if I compile the source with
Code:

pgcc -ta=nvidia,cc11 ...

If I use
Code:

pgcc -ta=nvidia,cc13

the compilation succeeds, but I can't run the program. I get the following error message:
Code:

call to cuModuleLoadData returned error 300: Invalid Source


mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Tue Jul 14, 2009 11:28 am

Hi spam.me,

Sorry, my misunderstanding. As far as I can tell, NVIDIA did not support double precision until compute capability 1.3, so you will not be able to use "double" with your card.

- Mat
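
This can be checked against the "Device Revision Number" line in the pgaccelinfo output earlier in the thread. The sketch below parses such a "major.minor" string and compares it with 1.3, the first compute capability with double-precision support. The function name supports_double is hypothetical. It is not a PGI or CUDA API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a "major.minor" compute capability
   string (as printed by pgaccelinfo) is at least 1.3, 0 if it is lower,
   and -1 if the string cannot be parsed. */
static int supports_double(const char *revision)
{
    int major, minor;

    assert(revision != NULL);
    if (sscanf(revision, "%d.%d", &major, &minor) != 2)
        return -1;
    return (major > 1 || (major == 1 && minor >= 3)) ? 1 : 0;
}
```

For the GeForce 9800 GTX in this thread, supports_double("1.1") returns 0, which matches the "datatype not supported" restriction reported with -ta=nvidia,cc11.
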
spam.me



Joined: 25 Jun 2009
Posts: 6

Posted: Wed Jul 15, 2009 2:51 am

Dear mkcolg,

Yes, my test graphics card has no 64-bit processors, but in CUDA it is possible to "loop" 64-bit operations so that the 32-bit processors can compute the problem. Of course, double-precision code on 32-bit processors runs slowly, but I want to run some benchmarks before I ask my university to pay for some Tesla cards.

Are there plans to include such 64-to-32 "loop" options in the compiler or pragmas, or are they already available?
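
One software technique that fits the description of emulating wider precision on 32-bit floating-point units is "double-single" arithmetic, where a value is held as the unevaluated sum of two floats. The sketch below shows only the error-free addition step (Knuth's two-sum) in plain C. Whether this is what is meant by "looping" here, and whether the PGI compiler offers anything like it, is not established in this thread. The type float2x and the functions two_sum and to_double are hypothetical names, and the code assumes float expressions are evaluated in single precision (no extended intermediates, no fast-math reordering).

```c
#include <assert.h>

/* Hypothetical type: the represented value is hi + lo,
   with lo holding what did not fit into hi. */
typedef struct { float hi; float lo; } float2x;

/* Knuth two-sum: hi is the rounded float sum of a and b, lo is the
   rounding error, so hi + lo equals a + b exactly (barring overflow). */
static float2x two_sum(float a, float b)
{
    float2x out;
    float s, bb, aa;

    assert(a == a && b == b);   /* NaN inputs are out of scope for this sketch */
    s  = a + b;                 /* rounded sum */
    bb = s - a;                 /* part of b that made it into s */
    aa = s - bb;                /* part of a that made it into s */
    out.hi = s;
    out.lo = (a - aa) + (b - bb);
    return out;
}

/* Widen to double on the host to inspect the combined value. */
static double to_double(float2x x)
{
    return (double)x.hi + (double)x.lo;
}
```

A full double-single library also needs multiplication, division, and functions such as sin and cos built on these pairs, which is where most of the slowdown comes from.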