PGI User Forum

new compiler gives error
amitamritkar



Joined: 02 Oct 2009
Posts: 13

Posted: Fri Feb 22, 2013 10:41 am    Post subject: new compiler gives error

Hi,

I have been using the Fortran compiler version 10.6 to run my linear solver on Tesla GPUs. I recently installed the trial version of the latest compiler, 13.2, and the very same code now fails with 'not enough memory' errors on several problems that ran fine with 10.6.
The code uses about 300 MB in total when run on the CPU alone, so I fail to see why the Tesla C2050 GPU would run out of memory.

Code:

0: ALLOCATE: 2299968 bytes requested; not enough memory: 30(unknown error)

0: ALLOCATE: 1823360 bytes requested; not enough memory: 4(unspecified launch failure)


I do some operations on the GPU before these errors occur. The declaration where I get the errors is:
Code:

      real,device,dimension(ni,nj,nk,nb)::d_diff,d_phi


If I make the arrays allocatable, only the 'not enough memory' part of the error goes away, and the code exits at the allocation statement.

Am I doing something wrong here, or is it an issue with the latest compiler?
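
For reference, a minimal sketch of the allocatable variant with an explicit status check. It assumes the `cudafor` module of the installed compiler provides `cudaMemGetInfo`, `cudaGetLastError`, and `cudaGetErrorString`; the program name and the extents `ni`, `nj`, `nk`, `nb` are hypothetical placeholders, since the real sizes are not given above. Printing the free device memory just before the allocation makes it possible to compare it with the roughly 2 MB request in the error message.

```fortran
program check_device_alloc
  use cudafor
  implicit none
  ! Hypothetical extents for illustration only; the real values are not in the post.
  integer, parameter :: ni = 32, nj = 32, nk = 32, nb = 4
  real, device, allocatable, dimension(:,:,:,:) :: d_diff, d_phi
  integer(kind=cuda_count_kind) :: free_bytes, total_bytes
  integer :: istat

  ! Ask the CUDA runtime how much device memory is free before allocating.
  istat = cudaMemGetInfo(free_bytes, total_bytes)
  if (istat /= cudaSuccess) then
     print *, 'cudaMemGetInfo failed: ', trim(cudaGetErrorString(istat))
     stop 1
  end if
  print *, 'free / total device bytes: ', free_bytes, total_bytes

  ! With stat=, a failed allocation returns control instead of aborting the run.
  allocate(d_diff(ni,nj,nk,nb), d_phi(ni,nj,nk,nb), stat=istat)
  if (istat /= 0) then
     print *, 'device allocate failed, stat = ', istat
     print *, 'last CUDA error: ', trim(cudaGetErrorString(cudaGetLastError()))
     stop 1
  end if

  deallocate(d_diff, d_phi)
end program check_device_alloc
```

If the reported free memory is far larger than the request and the allocation still fails, the cause is probably something other than a genuine shortage of device memory.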

Thanks,
Amit
mkcolg



Joined: 30 Jun 2004
Posts: 6141
Location: The Portland Group Inc.

Posted: Fri Feb 22, 2013 11:33 am    Post subject:

Hi Amit,

The newer PGI CUDA Fortran versions use newer CUDA versions (4.2, 5.0), which in turn need a newer CUDA driver to be installed. My best guess is that you simply need to update your driver.

What is the output from the 'pgaccelinfo' utility? It will tell us what driver version you have installed.

- Mat
amitamritkar



Joined: 02 Oct 2009
Posts: 13

Posted: Fri Feb 22, 2013 11:42 am    Post subject:

Hi Mat, the output from the pgaccelinfo utility is as follows:

Quote:

CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.54 Sat Sep 29 00:05:49 PDT 2012

CUDA Device Number: 0
Device Name: Tesla C2050
Device Revision Number: 2.0
Global Memory Size: 2817982464
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1500 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 5746011 microseconds
Current free memory: 2755256320
Upload time (4MB): 1342 microseconds ( 805 ms pinned)
Download time: 1150 microseconds ( 933 ms pinned)
Upload bandwidth: 3125 MB/sec (5210 MB/sec pinned)
Download bandwidth: 3647 MB/sec (4495 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc20

CUDA Device Number: 1
Device Name: Tesla C2050
Device Revision Number: 2.0
Global Memory Size: 2817982464
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1500 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 5746011 microseconds
Current free memory: 2755256320
Upload time (4MB): 1503 microseconds ( 804 ms pinned)
Download time: 1211 microseconds ( 933 ms pinned)
Upload bandwidth: 2790 MB/sec (5216 MB/sec pinned)
Download bandwidth: 3463 MB/sec (4495 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc20
mkcolg



Joined: 30 Jun 2004
Posts: 6141
Location: The Portland Group Inc.

Posted: Fri Feb 22, 2013 11:52 am    Post subject:

Nope, I'm wrong. That's a current driver.

What happens if you run in debug mode in emulation (-Mcuda=emu)? Do you see the same or similar problems?

If not, then I'll need to see a reproducing example to determine what's wrong. If it's too big to post, please send a note to PGI Customer Service (trs@pgroup.com) and ask them to forward the example to me.
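
While narrowing the code down to a reproducer, it may also help to check whether an earlier kernel has already failed, since the second message reports CUDA error 4 (unspecified launch failure) at the allocation. The sketch below is only an illustration: the module name `solver_checks` and the helper `check_cuda` are hypothetical, and it assumes `cudaDeviceSynchronize`, `cudaGetLastError`, and `cudaGetErrorString` are available from `cudafor` in the installed release.

```fortran
module solver_checks
  use cudafor
  implicit none
contains
  ! Hypothetical helper: report any pending CUDA error at a labelled point.
  subroutine check_cuda(label)
    character(len=*), intent(in) :: label
    integer :: istat

    ! Wait for outstanding kernels so asynchronous failures become visible.
    istat = cudaDeviceSynchronize()
    ! If the synchronize itself succeeded, pick up any earlier recorded error.
    if (istat == cudaSuccess) istat = cudaGetLastError()

    if (istat /= cudaSuccess) then
       print *, trim(label), ': CUDA error ', istat, ' ', &
                trim(cudaGetErrorString(istat))
       stop 1
    end if
  end subroutine check_cuda
end module solver_checks
```

A call such as `call check_cuda('before d_diff allocation')` placed after each kernel launch and just before the failing declaration or allocate statement would show whether the error originates earlier than the point where it is reported.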

Thanks,
Mat
amitamritkar



Joined: 02 Oct 2009
Posts: 13

Posted: Fri Feb 22, 2013 12:43 pm    Post subject: emulator mode

Hi Mat,

I get hundreds of warnings like the ones below in emulation mode, and the run finally fails.
The code does get beyond the point where it failed on the GPU, though.

Quote:
Warning: Number of emulated threads (14) is less than available cpus (24)
Warning: Number of emulated threads (14) is less than available cpus (24)
Error: _mp_task_yield/_mp_task_sync does not work in this case
a region with one thread
a nested task
an immediate task


Thanks,
Amit
All times are GMT - 7 Hours
Page 1 of 3