mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Apr 12, 2010 10:39 am Post subject: |
|
|
Hi Jerry,
| Quote: | | I have my eye on the C2050, which looks like it has just been released. I would like to see some benchmarks that compare the current C1060 cards to the new ones using PGI on FORTRAN codes, both large and small, and in single and double precision. |
I do have a C2050 but don't have any benchmark comparisons. Maybe someone else?
- Mat |
|
Jerry Orosz
Joined: 02 Jan 2008 Posts: 12 Location: San Diego
|
Posted: Mon Apr 12, 2010 2:47 pm Post subject: |
|
|
Hi Mat,
| mkcolg wrote: |
Can you try your code again with PGI 10.4? "-ta=nvidia,fastmath" now uses a less precise but much faster divide. Another code, WRF, sees a 3x speed-up of the accelerator computation. Given the number of divides in your code, it will most likely help your code as well. Check your answers, though.
- Mat |
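One way to act on the "check your answers" advice is to compare the output of a run built without fastmath against a run built with it. The following is only a minimal sketch: the function name and the sample values are hypothetical, and it assumes both runs' results have already been read into plain lists of floats.

```python
def max_relative_difference(reference, candidate):
    """Return the largest relative difference between two result lists."""
    if len(reference) != len(candidate):
        raise ValueError("result lists must have the same length")
    worst = 0.0
    for ref_value, cand_value in zip(reference, candidate):
        scale = max(abs(ref_value), abs(cand_value))
        if scale == 0.0:
            continue  # both values are exactly zero, so they agree
        worst = max(worst, abs(ref_value - cand_value) / scale)
    return worst


# Hypothetical values standing in for the two runs' outputs.
baseline = [1.0, 2.5, -3.75]
fastmath = [1.0, 2.5000001, -3.75]
print(max_relative_difference(baseline, fastmath))
```

What counts as an acceptable difference depends on the application, so any threshold would have to be chosen for the code at hand.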
We have 10.4 installed. The code compiles, but I get this when trying to run:
| Code: |
call to EventCreate returned error 2: Out of memory
|
We have CUDA 10.3, although it is not clear the PGI compilers are using it. When I run pgaccelinfo, I get this:
| Code: |
CUDA Driver Version 2030
Device Number: 0
Device Name: Tesla C1060
Device Revision Number: 1.3
Global Memory Size: 4294705152
Number of Multiprocessors: 30
Number of Cores: 240
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 262144B
Texture Alignment 256B
Clock Rate: 1296 MHz
Initialization time: 7911 microseconds
Current free memory 4246142976
Upload time (4MB) 866 microseconds ( 715 ms pinned)
Download time 954 microseconds ( 734 ms pinned)
Upload bandwidth 4843 MB/sec (5866 MB/sec pinned)
Download bandwidth 4396 MB/sec (5714 MB/sec pinned)
|
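As a side note on the output above, the reported bandwidths appear to follow directly from the transfer times. The sketch below is a reconstruction under assumptions, not documented pgaccelinfo behavior: it assumes "4MB" means 4 x 1024 x 1024 bytes, that "MB/sec" means 10^6 bytes per second, and that the result is truncated to an integer. The function name is made up for illustration. Under those assumptions all four figures are reproduced, but only if the pinned times are read as microseconds rather than "ms".

```python
# Assumption: the "4MB" test transfer is 4 * 1024 * 1024 bytes.
TRANSFER_BYTES = 4 * 1024 * 1024


def bandwidth_mb_per_sec(elapsed_microseconds):
    """Bytes per microsecond equals 10^6 bytes per second, truncated."""
    return TRANSFER_BYTES // elapsed_microseconds


# Times taken from the pgaccelinfo output above.
print(bandwidth_mb_per_sec(866))  # upload, reported as 4843 MB/sec
print(bandwidth_mb_per_sec(954))  # download, reported as 4396 MB/sec
print(bandwidth_mb_per_sec(715))  # pinned upload, reported as 5866 MB/sec
print(bandwidth_mb_per_sec(734))  # pinned download, reported as 5714 MB/sec
```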
Does that top line indicate a CUDA version of 2.03?
The 10.3 binary is still there, and when I use it, the code runs without an error.
Thanks,
Jerry |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Apr 12, 2010 2:59 pm Post subject: |
|
|
Hi Jerry,
| Quote: |
Does that top line indicate a CUDA version of 2.03? |
It means that your driver supports CUDA 2.3. A "3000" means CUDA 3.0.
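The two values mentioned in this thread (2030 for CUDA 2.3 and 3000 for CUDA 3.0) fit CUDA's usual integer encoding of 1000 x major + 10 x minor. A minimal sketch of decoding such a number follows; the function name is hypothetical and is not part of pgaccelinfo or any CUDA API.

```python
def decode_cuda_version(encoded):
    """Split an integer such as 2030 into (major, minor).

    Assumes the encoding 1000 * major + 10 * minor.
    """
    major, remainder = divmod(encoded, 1000)
    minor = remainder // 10
    return major, minor


print(decode_cuda_version(2030))  # (2, 3), i.e. CUDA 2.3
print(decode_cuda_version(3000))  # (3, 0), i.e. CUDA 3.0
```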
| Quote: |
We have CUDA 10.3, although it is not clear the PGI compilers are using it. |
I'm assuming you mean CUDA 3.0. By default "-ta=nvidia" will use CUDA 2.3. If you're adding "-ta=nvidia,cuda3.0" then you will need to update your driver.
What flags are you using? If nothing changed except the compiler version, then we've got a bug. If this is the case, can you send a report to PGI Customer Support (trs@pgroup.com) and include the code?
Thanks,
Mat |
|
Jerry Orosz
Joined: 02 Jan 2008 Posts: 12 Location: San Diego
|
Posted: Mon Apr 12, 2010 3:54 pm Post subject: |
|
|
Hi Mat,
I meant CUDA 2.3 in the previous message.
We now have CUDA 3.0 installed, and pgaccelinfo shows "3000" on the top line. I no longer get the "out of memory" error. In fact, at this point, I cannot make that error come back.
Before we upgraded CUDA, when the version was 2.3, the 10.4 version of pgfortran produced the error and the 10.3 version did not.
That said, the "-ta=nvidia,fastmath" option did not seem to improve the performance. Also, the card now takes 2.5 seconds to initialize, compared to about 0.2 seconds with the previous CUDA version.
Jerry |
|