| Author |
Message |
amitamritkar
Joined: 02 Oct 2009 Posts: 11
|
Posted: Fri Feb 22, 2013 10:41 am Post subject: new compiler gives error |
|
|
Hi,
I have been using the Fortran compiler version 10.6 to run my linear solver on Tesla GPUs. Recently I installed the trial version of the latest compiler, 13.2, and now the very same code gives "not enough memory" errors on several problems that ran fine with 10.6.
The total memory used by the code when run on the CPU alone is about 300 MB, so I fail to see why the Tesla C2050 GPU would run out of memory.
| Code: |
0: ALLOCATE: 2299968 bytes requested; not enough memory: 30(unknown error)
0: ALLOCATE: 1823360 bytes requested; not enough memory: 4(unspecified launch failure)
|
I do some operations on the GPU before these errors occur, and the line of code where I get these errors is:
| Code: |
real,device,dimension(ni,nj,nk,nb)::d_diff,d_phi
|
If I make the arrays allocatable, then only the 'not enough memory' part of the error goes away, and the code exits at the allocation statement.
Is there something wrong with what I am doing here, or is it an issue with the latest compiler?
Thanks,
Amit |
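A minimal CUDA Fortran sketch for narrowing this down, not the poster's actual code. The second message carries the CUDA error text "unspecified launch failure", which CUDA can report on a later call when an earlier kernel has failed, so it may be worth synchronizing and checking the status before the allocation. The program name, the array extents, and the placeholder for earlier GPU work are assumptions made for illustration; `cudaDeviceSynchronize`, `cudaGetErrorString`, and `cudaMemGetInfo` come from the `cudafor` module.

```fortran
! Hypothetical sketch: program name and array extents are assumed, not from the post.
program check_device_alloc
  use cudafor
  implicit none
  integer, parameter :: ni = 32, nj = 32, nk = 32, nb = 4   ! assumed extents
  real, device, allocatable :: d_diff(:,:,:,:), d_phi(:,:,:,:)
  integer :: istat
  integer(kind=cuda_count_kind) :: free_bytes, total_bytes

  ! ... earlier kernels and device operations would run here ...

  ! Kernel launches are asynchronous, so wait for them and pick up
  ! any failure here instead of at the next allocation.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) then
     print *, 'earlier GPU work failed: ', trim(cudaGetErrorString(istat))
     stop 1
  end if

  ! Report how much device memory is actually free at this point.
  istat = cudaMemGetInfo(free_bytes, total_bytes)
  print *, 'free / total device bytes: ', free_bytes, total_bytes

  ! Allocate explicitly with stat= so a failure can be handled in code.
  allocate(d_diff(ni,nj,nk,nb), d_phi(ni,nj,nk,nb), stat=istat)
  if (istat /= 0) then
     print *, 'device allocation failed, stat = ', istat
     stop 1
  end if

  deallocate(d_diff, d_phi)
end program check_device_alloc
```

If the synchronize call already returns an error, the allocation itself is probably not the real problem, and the kernels that ran before it would be the place to look.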
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Fri Feb 22, 2013 11:33 am Post subject: |
|
|
Hi Amit,
The newer PGI CUDA Fortran versions use newer CUDA releases (4.2, 5.0), which in turn require a newer CUDA driver to be installed. My best guess is that you simply need to update your driver.
What is the output from the 'pgaccelinfo' utility? It will tell us what driver version you have installed.
- Mat |
|
amitamritkar
Joined: 02 Oct 2009 Posts: 11
|
Posted: Fri Feb 22, 2013 11:42 am Post subject: |
|
|
Hi Mat, the output from the pgaccelinfo utility is as follows:
| Quote: |
CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.54 Sat Sep 29 00:05:49 PDT 2012
CUDA Device Number: 0
Device Name: Tesla C2050
Device Revision Number: 2.0
Global Memory Size: 2817982464
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1500 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 5746011 microseconds
Current free memory: 2755256320
Upload time (4MB): 1342 microseconds ( 805 ms pinned)
Download time: 1150 microseconds ( 933 ms pinned)
Upload bandwidth: 3125 MB/sec (5210 MB/sec pinned)
Download bandwidth: 3647 MB/sec (4495 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc20
CUDA Device Number: 1
Device Name: Tesla C2050
Device Revision Number: 2.0
Global Memory Size: 2817982464
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1500 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Initialization time: 5746011 microseconds
Current free memory: 2755256320
Upload time (4MB): 1503 microseconds ( 804 ms pinned)
Download time: 1211 microseconds ( 933 ms pinned)
Upload bandwidth: 2790 MB/sec (5216 MB/sec pinned)
Download bandwidth: 3463 MB/sec (4495 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc20
|
|
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Fri Feb 22, 2013 11:52 am Post subject: |
|
|
Nope, I'm wrong. That's a current driver.
What happens if you run in debug mode under emulation (-Mcuda=emu)? Do you see the same or similar problems?
If not, then I'll need to see a reproducing example to determine what's wrong. If it's too big to post, please send a note to PGI Customer Service (trs@pgroup.com) and ask them to forward the example to me.
Thanks,
Mat |
|
amitamritkar
Joined: 02 Oct 2009 Posts: 11
|
Posted: Fri Feb 22, 2013 12:43 pm Post subject: emulator mode |
|
|
Hi Mat,
I get hundreds of warnings like the ones below in emulation mode, and the run eventually fails.
The code does get past the point where it failed on the GPU, though.
| Quote: |
Warning: Number of emulated threads (14) is less than available cpus (24)
Warning: Number of emulated threads (14) is less than available cpus (24)
Error: _mp_task_yield/_mp_task_sync does not work in this case
a region with one thread
a nested task
an immediate task
|
Thanks,
Amit |
|