PGI User Forum

accelerator parallelization issues
 
mkcolg | Joined: 30 Jun 2004 | Posts: 6208 | Location: The Portland Group Inc.
Posted: Mon Apr 12, 2010 10:39 am

Hi Jerry,

Quote:
I have my eye on the C2050, which looks like it has just been released. I would like to see some benchmarks that compare the current C1060 cards to the new ones using PGI-compiled Fortran codes, both large and small, in single and double precision.


I do have a C2050 but don't have any benchmark comparisons. Maybe someone else?

- Mat
Jerry Orosz | Joined: 02 Jan 2008 | Posts: 20 | Location: San Diego
Posted: Mon Apr 12, 2010 2:47 pm

Hi Mat,

mkcolg wrote:


Can you try your code again with PGI 10.4? "-ta=nvidia,fastmath" now uses a less precise but much faster divide. Another code, WRF, sees a 3x speed-up of the accelerator computation. Given the number of divides in your code, it will most likely help your code as well. Check your answers, though.

- Mat
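For context, the divide-heavy part of my code looks roughly like the sketch below (a stripped-down stand-in, not the actual routine), so a faster device divide sounded promising:

Code:

! A minimal stand-in (not the real code) for the kind of divide-heavy
! accelerator region where the faster, less precise divide from
! -ta=nvidia,fastmath would be expected to matter.
program div_sketch
  implicit none
  integer, parameter :: n = 1000000
  real :: a(n), b(n), c(n)
  integer :: i
  do i = 1, n
     a(i) = real(i)
     b(i) = real(i) + 0.5
  end do
!$acc region
  do i = 1, n
     c(i) = a(i) / b(i)     ! fastmath may replace this with a faster, less precise divide
  end do
!$acc end region
  print *, 'c(n) = ', c(n)  ! compare against a non-fastmath build to check the answers
end program div_sketch

I diff the results of the fastmath and non-fastmath builds to make sure the answers still agree, as you suggested.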


We have PGI 10.4 installed. The code compiles, but I get this error when trying to run:

Code:

call to EventCreate returned error 2: Out of memory


We have CUDA 10.3, although it is not clear the PGI compilers are using it. When I run pgaccelinfo, I get this:



Code:

CUDA Driver Version            2030

Device Number:                 0
Device Name:                   Tesla C1060
Device Revision Number:        1.3
Global Memory Size:            4294705152
Number of Multiprocessors:     30
Number of Cores:               240
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           16384
Warp Size:                     32
Maximum Threads per Block:     512
Maximum Block Dimensions:      512, 512, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          262144B
Texture Alignment              256B
Clock Rate:                    1296 MHz
Initialization time:           7911 microseconds
Current free memory            4246142976
Upload time (4MB)               866 microseconds ( 715 ms pinned)
Download time                   954 microseconds ( 734 ms pinned)
Upload bandwidth               4843 MB/sec (5866 MB/sec pinned)
Download bandwidth             4396 MB/sec (5714 MB/sec pinned)


Does that top line indicate a CUDA version of 2.03?

The 10.3 binary is still there, and when I use it, the code runs without an error.

Thanks,

Jerry
mkcolg | Joined: 30 Jun 2004 | Posts: 6208 | Location: The Portland Group Inc.
Posted: Mon Apr 12, 2010 2:59 pm

Hi Jerry,
Quote:
Does that top line indicate a CUDA version of 2.03?

It means that your driver supports CUDA 2.3. A "3000" means CUDA 3.0.
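If it helps, that raw number appears to follow the usual 1000*major + 10*minor encoding reported by the CUDA driver, so a quick decode (nothing PGI-specific, just a sketch) looks like this:

Code:

! Decode the "CUDA Driver Version" number printed by pgaccelinfo,
! assuming the usual 1000*major + 10*minor encoding
! (so 2030 -> CUDA 2.3 and 3000 -> CUDA 3.0).
program decode_driver_version
  implicit none
  integer :: ver, major, minor
  ver = 2030                    ! the value reported above
  major = ver / 1000            ! 2
  minor = mod(ver, 1000) / 10   ! 3
  print '(a,i0,a,i0)', 'CUDA driver ', major, '.', minor
end program decode_driver_version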
Quote:
We have CUDA 10.3, although it is not clear the PGI compilers are using it.

I'm assuming you mean CUDA 3.0. By default, "-ta=nvidia" builds against the CUDA 2.3 toolkit. If you add "-ta=nvidia,cuda3.0", then you will need to update your driver to one that supports CUDA 3.0.
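Concretely, the two builds would look something like this (the source file name is just a placeholder):

Code:

% pgfortran -ta=nvidia mycode.f90            # default: CUDA 2.3 toolkit
% pgfortran -ta=nvidia,cuda3.0 mycode.f90    # CUDA 3.0 toolkit; needs a CUDA 3.0-capable driver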

What flags are you using? If nothing changed except the compiler version, then we've got a bug. If this is the case, can you send a report to PGI Customer Support (trs@pgroup.com) and include the code?

Thanks,
Mat
Jerry Orosz | Joined: 02 Jan 2008 | Posts: 20 | Location: San Diego
Posted: Mon Apr 12, 2010 3:54 pm

Hi Mat,

I meant CUDA 2.3 in the previous message.

We have CUDA 3.0 installed, and pgaccelinfo now reports "3000" at the top. I no longer get the "out of memory" error; in fact, at this point, I cannot make that error come back.

Previously, before we changed CUDA, the 10.4 version of pgfortran would produce the error, and the 10.3 version would not. The CUDA version was 2.3.


That said, the -ta=nvidia,fastmath option did not seem to improve performance. Also, the card now takes 2.5 seconds to initialize, compared to about 0.2 seconds with the previous CUDA version.
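For what it's worth, I measure that startup cost roughly as sketched below, calling acc_init from PGI's accel_lib module (if I understand the runtime correctly) so the initialization is separated from the first kernel launch:

Code:

! Rough timing of device initialization only. acc_init and
! acc_device_nvidia are assumed to come from PGI's accel_lib
! accelerator runtime module.
program time_init
  use accel_lib
  implicit none
  integer :: t0, t1, rate
  call system_clock(t0, rate)
  call acc_init(acc_device_nvidia)   ! pay the device initialization cost here
  call system_clock(t1)
  print *, 'init time (s): ', real(t1 - t0) / real(rate)
end program time_init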

Jerry