PGI User Forum
PGI ACC release 11.0: Multiple GPUs using openmp

 
pengc



Joined: 04 Dec 2010
Posts: 2

Posted: Mon Jan 03, 2011 4:29 pm    Post subject: PGI ACC release 11.0: Multiple GPUs using openmp

I have a very simple test program for my workstation, which has two GPU cards:
Code:

#include <stdio.h>
#include <stdlib.h>
#include <accel.h>
#include <omp.h>   /* declares the omp_* calls used below */
#define N  128

int main(int argc, char *argv[]) {

   int i;

   double *restrict a;
   a = (double *restrict)malloc(N * sizeof(double));
   omp_set_num_threads(2);

#pragma omp parallel
{
   int th_id;
   th_id = omp_get_thread_num();
   printf("Hello World from thread %d\n", th_id);
}
#pragma acc region
{

   for (i = 0; i < N; i++) {
      a[i] *= 2.0;
   }
}
   return 0;
}


It was then compiled with the command:
Code:

pgcc test.c -o test -mp -fast -Minfo -ta=nvidia,time


The compiler (PGI ACC compiler, release 11.0) produced this output, with no errors or warnings:
Code:

main:
     16, Parallel region activated
     21, Parallel region terminated
         Generating copy(a[0:127])
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     24, Loop is parallelizable
         Accelerator kernel generated
         24, #pragma acc for parallel, vector(128) /* blockIdx.x threadIdx.x */
             CC 1.3 : 4 registers; 28 shared, 4 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 4 shared, 40 constant, 0 local memory bytes; 66% occupancy


When I run the binary, I get a segmentation fault.

Could anyone tell me what the problem might be?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jan 03, 2011 5:59 pm

Hi pengc,

I tried your code and it worked fine for me. (See below). Does it still fail if you remove the OpenMP code? Does it fail if you remove the accelerator region? What is the output of the command 'pgaccelinfo'?

- Mat

Code:
% pgcc test.c -o test1.out -mp -fast -Minfo -ta=nvidia,time -V11.0 ; test1.out
main:
     16, Parallel region activated
     21, Parallel region terminated
         Generating copy(a[0:127])
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     24, Loop is parallelizable
         Accelerator kernel generated
         24, #pragma acc for parallel, vector(128) /* blockIdx.x threadIdx.x */
             CC 1.3 : 4 registers; 28 shared, 4 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 4 shared, 40 constant, 0 local memory bytes; 66% occupancy
Hello World from thread 0
Hello World from thread 1

Accelerator Kernel Timing data
/tmp/qa/test.c
  main
    21: region entered 1 time
        time(us): total=139074 init=138626 region=448
                  kernels=22 data=49
        w/o init: total=448 max=448 min=448 avg=448
        24: kernel launched 1 times
            grid: [1]  block: [128]
            time(us): total=22 max=22 min=22 avg=22
pengc



Joined: 04 Dec 2010
Posts: 2

Posted: Tue Jan 04, 2011 12:21 am    Post subject: problem is still there

Thanks, Mat! But it still fails.
Output of pgaccelinfo:
Code:

-bash-3.2$ pgaccelinfo
CUDA Driver Version:           3020

Device Number:                 0
Device Name:                   GeForce GTX 480
Device Revision Number:        2.0
Global Memory Size:            1610285056
Number of Multiprocessors:     15
Number of Cores:               480
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           32768
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1401 MHz
Initialization time:           1171282 microseconds
Current free memory:           1501691904
Upload time (4MB):             1030 microseconds ( 722 ms pinned)
Download time:                 1418 microseconds (1176 ms pinned)
Upload bandwidth:              4072 MB/sec (5809 MB/sec pinned)
Download bandwidth:            2957 MB/sec (3566 MB/sec pinned)

Device Number:                 1
Device Name:                   GeForce GTX 285
Device Revision Number:        1.3
Global Memory Size:            1073545216
Number of Multiprocessors:     30
Number of Cores:               240
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           16384
Warp Size:                     32
Maximum Threads per Block:     512
Maximum Block Dimensions:      512, 512, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             256B
Clock Rate:                    1476 MHz
Initialization time:           1171282 microseconds
Current free memory:           1032916736
Upload time (4MB):             1147 microseconds ( 799 ms pinned)
Download time:                 2230 microseconds (2028 ms pinned)
Upload bandwidth:              3656 MB/sec (5249 MB/sec pinned)
Download bandwidth:            1880 MB/sec (2068 MB/sec pinned)
CAL version:                   1.4-553
No attached CAL devices


If I remove the OpenMP region:

Code:


  main
    16: region entered 1 time
        time(us): total=2082729 init=2082413 region=316
                  kernels=42 data=26
        w/o init: total=316 max=316 min=316 avg=316
        19: kernel launched 1 times
            grid: [1]  block: [128]
            time(us): total=42 max=42 min=42 avg=42



If I remove the acc region:

Code:

pgcc test.c -o test1.out -mp -fast -Minfo -ta=nvidia,time -V11.0 ; ./test1.out
main:
     16, Parallel region activated
     22, Parallel region terminated
Hello World from thread 1
Hello World from thread 0



It still fails if both regions are present. Could there be some conflict between them?
All times are GMT - 7 Hours