PGI User Forum


REAL*16 implementation?
sjlarrondo



Joined: 17 Sep 2004
Posts: 5

Posted: Tue Dec 14, 2004 9:43 am    Post subject: REAL*16 implementation?

Are there any plans to provide REAL*16, or any workarounds in the meantime? The max is REAL*8, and we have some apps we'd like to port over from the Alpha, but this seems to be a limitation.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Tue Dec 14, 2004 2:03 pm

Hello,


At this time we don't plan on supporting REAL*16. This is due to the lack of hardware support and the extreme performance penalty of software emulation. Of course, if we see more demand then we'll reconsider.

Thanks,
Mat
aragons



Joined: 10 Dec 2004
Posts: 3
Location: San Francisco State University

Posted: Tue Dec 21, 2004 3:13 pm    Post subject: Real*16 capability reconsidered?

On 32-bit systems, we have had double precision for a long time. A native word on a 32-bit system is real*4. My basic question is this: why can't the technology that was used to establish real*8 on a 32-bit system WITHOUT significant execution penalty be used to provide us real*16 on a 64-bit system?

If Cray did it, and DEC did it with the Alpha, why can't PGI do it for Opteron? I think there are many of us in the pure number-crunching community who would be quite interested in quad precision being done efficiently on a 64-bit system. There must be something I'm missing. Please enlighten me.

Thanks.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Dec 22, 2004 10:02 am

In 32-bits, there is double precision hardware support. The x87 unit performs 80-bit floating point calculations and SSE performs 64-bit. As you suggest, the ideal situation would be for the hardware vendors to also support quad precision so there wouldn't be a severe performance penalty. Alas, this support is unavailable, so true REAL*16 support requires software emulation. (Note that some implementations of REAL*16 are really REAL*10 and use the x87 unit.)

When we ask customers what they want us to focus our efforts on, the overwhelming choice is high performance. Of the few people who ask for REAL*16, most decide that they only want this support if performance can be maintained. Yes, Cray and DEC have created higher performance quad precision packages for their own architectures. However, PGI is independent of any general computing chip manufacturer and seeks to provide equally high performance, no matter who the vendor. As a matter of convenience we do ship AMD's tuned math library ACML with our product, but we also work with Intel's MKL.

There are several free libraries available on the web which will emulate quad precision using our compilers. A search for "quad precision fortran library" in your favorite search engine will yield several solutions.
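To give a rough feel for what software emulation of extended precision involves, here is a minimal sketch of "double-double" addition, where one value is stored as an unevaluated sum of two doubles. This is an assumption about technique chosen for illustration, written in Python rather than Fortran; the libraries mentioned above may work quite differently. Note that double-double gives roughly 106 bits of significand with the exponent range of a double, so it is not a true IEEE quad format. The helper names (`two_sum`, `quick_two_sum`, `dd_add`) are hypothetical.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly (Knuth)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Same result as two_sum, but assumes |a| >= |b| (Dekker)."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs of floats."""
    s, e = two_sum(x[0], y[0])   # exact sum of the high words
    e += x[1] + y[1]             # fold in the low words
    return quick_two_sum(s, e)   # renormalize into (hi, lo)


one = (1.0, 0.0)
tiny = (1e-20, 0.0)

print(1.0 + 1e-20 == 1.0)   # True: a plain double drops the small term
print(dd_add(one, tiny))    # (1.0, 1e-20): the pair keeps it in the low word
```

Even this simple addition costs about eleven double-precision operations where hardware support would need one, which is the kind of penalty described above.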

Good Luck,
Mat
Johnix



Joined: 30 Jul 2004
Posts: 3

Posted: Fri Mar 11, 2005 12:30 am    Post subject: What about the 128-bit media instructions?

Hi,

I was just reading the AMD64 Architecture Programmer's Manual. It sounds like the 128-bit media and scientific instructions have better performance than x87 instructions. And as it suggests, replacing x87 code with 128-bit media code is the first choice for improving performance.

Quote:
"Code written with 128-bit media floating-point instructions can operate in parallel on four times as many single-precision floating-point operands as can x87 floating-point code. This achieves potentially four times the computational work of x87 instructions that use single-precision operands. Also, the higher density of 128-bit media floating-point operands may make it possible to remove local temporary variables that would otherwise be needed in x87 floating-point code. 128-bit media code is easier to write than x87 floating-point code, because the XMM register file is flat rather than stack-oriented, and, in 64-bit mode there are twice the number of XMM registers as x87 registers."


I am not sure whether I understand the idea. But if that is true, quad precision would be achieved naturally, with no penalty at all.
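For what it's worth, one way to read the quoted passage is that the 128 bits are split into four independent single-precision lanes, so each lane still has single precision. The sketch below only emulates that reading in Python with the standard `struct` module; it is not real SSE code, and `to_f32` and `packed_add_ps` are hypothetical helper names.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add_ps(a, b):
    """Lane-wise add of two 4-lane single-precision vectors."""
    return tuple(to_f32(x + y) for x, y in zip(a, b))


# Four 32-bit floats fill exactly 128 bits.
reg = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(len(reg) * 8)   # 128

# Four additions happen side by side, one per lane.
print(packed_add_ps((1.0, 2.0, 3.0, 4.0), (0.5, 0.5, 0.5, 0.5)))  # (1.5, 2.5, 3.5, 4.5)

# Each lane is still only single precision: 2**-30 is lost next to 1.0.
print(packed_add_ps((1.0,) * 4, (2.0 ** -30,) * 4))  # (1.0, 1.0, 1.0, 1.0)
```

Under that reading, the wider register buys more operations per instruction rather than more digits per operand.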

Thanks for comments.
All times are GMT - 7 Hours
Page 1 of 2

 