PGI User Forum

OpenMP, OpenACC and acc_set_device_num
 
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Thu Feb 28, 2013 9:39 am    Post subject: OpenMP, OpenACC and acc_set_device_num

Hi!

I'm doing some tests with OpenACC combined with OpenMP in order to use multiple GPU devices, but I'm running into a problem when the program executes.
Code:
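        #pragma omp parallel num_threads(2)
        {
                /* One OpenMP thread per GPU: the thread id doubles as the device number. */
                int th = omp_get_thread_num();
#ifdef _OPENACC
                acc_set_device_num(th, acc_device_nvidia);
#endif
                fprintf(stdout, "THREAD(%d) - Launched thread.\n", th);
                fprintf(stdout, "THREAD(%d) - Device selected: %d\n", th,
                        acc_get_device_num(acc_device_nvidia));
                /* ... rest of the parallel region not shown in the post ... */
        }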


And the result is:
Quote:
THREAD(0) - Launched thread.
THREAD(0) - Device selected: 0
THREAD(1) - Launched thread.
THREAD(1) - Device selected: 0


It seems to me that 'acc_set_device_num' is not working: the program always runs on the device given by my ACC_DEVICE_NUM environment variable.
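
One thing that may be worth ruling out with a symptom like this is the build itself. In the snippet above, the call to acc_set_device_num sits inside a preprocessor guard, so it only exists in the binary when the compiler defines _OPENACC. The following is a minimal sketch, not a diagnosis of this particular report: it prints which of the standard macros _OPENMP and _OPENACC are defined, the value of ACC_DEVICE_NUM, and how many NVIDIA devices the runtime sees. The helper names device_for_thread and report_acc_environment are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper (not from the thread): wrap a thread id onto the
   available devices so a thread never asks for a device that is not there.
   Returns -1 when no device is available. */
int device_for_thread(int thread_id, int num_devices)
{
    assert(thread_id >= 0);
    if (num_devices <= 0)
        return -1;
    return thread_id % num_devices;
}

/* Hypothetical helper: print what the build and the environment look like.
   Returns how many of the two programming models are NOT enabled. */
int report_acc_environment(FILE *out)
{
    int problems = 0;
    const char *env = getenv("ACC_DEVICE_NUM");

    fprintf(out, "ACC_DEVICE_NUM=%s\n", env ? env : "(unset)");
#ifdef _OPENMP
    fprintf(out, "_OPENMP is defined\n");
#else
    fprintf(out, "_OPENMP is not defined: the omp pragma is ignored\n");
    ++problems;
#endif
#ifdef _OPENACC
    fprintf(out, "_OPENACC is defined, %d NVIDIA device(s) visible\n",
            acc_get_num_devices(acc_device_nvidia));
#else
    fprintf(out, "_OPENACC is not defined: acc_set_device_num is compiled out\n");
    ++problems;
#endif
    return problems;
}
```

Calling report_acc_environment(stdout) at the top of main, before the parallel region, shows whether the guarded call can be reached at all. If it reports that _OPENACC is not defined, the compile flags are the first place to look.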
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Thu Feb 28, 2013 12:14 pm

Hi Neldan,

I'm not sure what's going wrong, since your code seems to work fine for me:

Code:
THREAD(0) - Launched thread.
THREAD(1) - Launched thread.
THREAD(0) - Device selected: 0
THREAD(1) - Device selected: 1


What's the output from the command "pgaccelinfo"? What compiler version are you using?

- Mat
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Fri Mar 01, 2013 3:51 am

mkcolg wrote:

What compiler version are you using?

PGI Release 12.4-0

mkcolg wrote:

What's the output from the command "pgaccelinfo"?


Quote:
CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.54 Sat Sep 29 00:05:49 PDT 2012

Device Number: 0
Device Name: GeForce GTX 580
Device Revision Number: 2.0
Global Memory Size: 1610285056
Number of Multiprocessors: 16
Number of Cores: 512
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1544 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2004 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Initialization time: 310428 microseconds
Current free memory: 1542184960
Upload time (4MB): 2385 microseconds ( 732 ms pinned)
Download time: 1530 microseconds ( 694 ms pinned)
Upload bandwidth: 1758 MB/sec (5729 MB/sec pinned)
Download bandwidth: 2741 MB/sec (6043 MB/sec pinned)

Device Number: 1
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5636554752
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 310428 microseconds
Current free memory: 5570056192
Upload time (4MB): 2278 microseconds ( 713 ms pinned)
Download time: 1428 microseconds ( 697 ms pinned)
Upload bandwidth: 1841 MB/sec (5882 MB/sec pinned)
Download bandwidth: 2937 MB/sec (6017 MB/sec pinned)

Device Number: 2
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5636554752
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 310428 microseconds
Current free memory: 5570027520
Upload time (4MB): 1860 microseconds ( 899 ms pinned)
Download time: 1323 microseconds (1040 ms pinned)
Upload bandwidth: 2255 MB/sec (4665 MB/sec pinned)
Download bandwidth: 3170 MB/sec (4032 MB/sec pinned)

Device Number: 3
Device Name: GeForce GTX 460
Device Revision Number: 2.1
Global Memory Size: 1073414144
Number of Multiprocessors: 7
Number of Cores: 224
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1350 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 1800 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Initialization time: 310428 microseconds
Current free memory: 1039273984
Upload time (4MB): 1500 microseconds ( 722 ms pinned)
Download time: 1294 microseconds ( 695 ms pinned)
Upload bandwidth: 2796 MB/sec (5809 MB/sec pinned)
Download bandwidth: 3241 MB/sec (6034 MB/sec pinned)
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Tue Mar 05, 2013 9:13 am

Any ideas about what I could be doing wrong?
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue Mar 05, 2013 10:02 am

Quote:
Any ideas about what I could be doing wrong?

I'm not sure. Can you post a complete example here, or send one to PGI Customer Service (trs@pgroup.com)?

Thanks,
Mat
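
For readers following along, a complete example of the kind requested here could look roughly like the sketch below. It is an assumption about what a self-contained reproducer might contain, not the code that was actually sent. Each thread requests the device matching its thread id, reads the selection back with acc_get_device_num, and the results are compared after the parallel region. The names run_device_selection_test, count_mismatches and MAX_THREADS are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

#define MAX_THREADS 16 /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: count the threads whose reported device differs
   from the one they asked for. */
int count_mismatches(const int *requested, const int *selected, int n)
{
    int i, bad = 0;
    assert(n >= 0);
    for (i = 0; i < n; ++i)
        if (requested[i] != selected[i])
            ++bad;
    return bad;
}

/* Returns the number of mismatching threads, or -1 if the test cannot run
   (bad argument, or the file was built without OpenACC). */
int run_device_selection_test(int num_threads)
{
#ifndef _OPENACC
    (void)num_threads;
    return -1;
#else
    int requested[MAX_THREADS], selected[MAX_THREADS];
    int i;

    if (num_threads < 1 || num_threads > MAX_THREADS)
        return -1;
    for (i = 0; i < num_threads; ++i) {
        requested[i] = i;
        selected[i] = -1; /* stays -1 if that thread never runs */
    }

#pragma omp parallel num_threads(num_threads)
    {
        int th = 0;
#ifdef _OPENMP
        th = omp_get_thread_num();
#endif
        acc_set_device_num(th, acc_device_nvidia);
        selected[th] = acc_get_device_num(acc_device_nvidia);
    }

    /* Print after the region so the lines are not interleaved. */
    for (i = 0; i < num_threads; ++i)
        printf("THREAD(%d) - requested %d, selected %d\n",
               i, requested[i], selected[i]);
    return count_mismatches(requested, selected, num_threads);
#endif
}
```

A main that returns run_device_selection_test(2) would mirror the two-thread case from the first post. With the PGI compilers, OpenMP is normally enabled with -mp and OpenACC with -acc, so the exact compile line used is worth including alongside any reproducer.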
All times are GMT - 7 Hours
Page 1 of 3