| Author |
Message |
sjlarrondo
Joined: 17 Sep 2004 Posts: 5
|
Posted: Tue Dec 14, 2004 9:43 am Post subject: REAL*16 implementation? |
|
|
| Are there any plans to provide REAL*16, or any workarounds? The maximum is REAL*8, and we have some apps we'd like to port over from the Alpha, but this seems to be a limitation. |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Tue Dec 14, 2004 2:03 pm Post subject: |
|
|
Hello,
At this time we don't plan on supporting REAL*16. This is due to the lack of hardware support and the extreme performance penalty of software emulation. Of course, if we see more demand then we'll reconsider.
Thanks,
Mat |
|
|
 |
aragons
Joined: 10 Dec 2004 Posts: 3 Location: San Francisco State University
|
Posted: Tue Dec 21, 2004 3:13 pm Post subject: Real*16 capability reconsidered? |
|
|
In 32-bit systems, we have had double precision for a long time. A native word in a 32-bit system is real*4. My basic question is this: why can't the technology that was used to establish real*8 in a 32-bit system WITHOUT significant execution penalty be used to provide us real*16 in a 64-bit system?
If Cray did it, and DEC did it with the Alpha, why can't PGI do it for Opteron? I think there are many of us in the pure number crunching community who would be quite interested in quad precision being done efficiently on a 64-bit system. There must be something I'm missing; please enlighten me.
Thanks. |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Wed Dec 22, 2004 10:02 am Post subject: |
|
|
On 32-bit systems, there is double precision hardware support. The x87 unit performs 80-bit floating point calculations and SSE performs 64-bit. As you suggest, the ideal situation would be for the hardware vendors to also support quad precision so there wouldn't be a severe performance penalty. Alas, this support is unavailable, so true REAL*16 support requires software emulation. (Note that some implementations of REAL*16 are really REAL*10 and use the x87 unit.)
When we ask customers what they want us to focus our efforts on, the overwhelming choice is high performance. Of the few people who ask for REAL*16, most decide that they only want this support if performance can be maintained. Yes, Cray and DEC have created higher performance quad precision packages for their own architectures. However, PGI is independent of any general computing chip manufacturer and seeks to provide equally high performance, no matter who the vendor. As a matter of convenience we ship AMD's tuned math library, ACML, with our product, but we also work with Intel's MKL.
There are several free libraries available on the web which will emulate quad precision using our compilers. From your favorite search engine, a search for "quad precision fortran library" will yield several solutions.
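To give a rough idea of what such software emulation involves, here is a minimal sketch of "double-double" addition, one common technique for extending precision using only ordinary 64-bit hardware arithmetic. It is written in Python purely for illustration; the function names `two_sum` and `dd_add` are made up for this example and are not taken from any particular library, and nothing here is claimed to be how a given Fortran package does it. Note also that double-double carries roughly 106 significand bits with the exponent range of a double, so it is not the same thing as a true IEEE 128-bit format.

```python
# Double-double sketch: a value is stored as an unevaluated sum hi + lo
# of two ordinary 64-bit floats. Names are illustrative only.

def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(x, y):
    """Add two double-double numbers, each given as a (hi, lo) tuple."""
    s, e = two_sum(x[0], y[0])   # add the high parts, keep the rounding error
    e += x[1] + y[1]             # fold in the low parts
    return two_sum(s, e)         # renormalize into (hi, lo)

one = (1.0, 0.0)
tiny = (1e-20, 0.0)

# In plain double precision the small term is lost entirely.
plain = (1.0 + 1e-20) - 1.0                  # 0.0

# In double-double it survives in the low word and can be recovered.
total = dd_add(one, tiny)                    # (1.0, 1e-20)
recovered = dd_add(total, (-1.0, 0.0))       # (1e-20, 0.0)

print(plain, total, recovered)
```

Each `dd_add` here costs more than a dozen hardware additions and subtractions in place of one, and multiplication and division cost more still, which gives a feel for where the emulation penalty comes from.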
Good Luck,
Mat |
|
|
 |
Johnix
Joined: 30 Jul 2004 Posts: 3
|
Posted: Fri Mar 11, 2005 12:30 am Post subject: What about the 128-bit media instructions? |
|
|
Hi,
I was just reading the AMD64 Architecture Programmer’s Manual. It sounds like the 128-bit media and scientific instructions have better performance than x87 instructions, and the manual suggests that replacing x87 code with 128-bit media code is the first choice for improving performance.
| Quote: | | "Code written with 128-bit media floating-point instructions can operate in parallel on four times as many single-precision floating-point operands as can x87 floating-point code. This achieves potentially four times the computational work of x87 instructions that use single-precision operands. Also, the higher density of 128-bit media floating-point operands may make it possible to remove local temporary variables that would otherwise be needed in x87 floating-point code. 128-bit media code is easier to write than x87 floating-point code, because the XMM register file is flat rather than stack-oriented, and, in 64-bit mode there are twice the number of XMM registers as x87 registers." |
I am not sure whether I understand the idea, but if that is true, quad precision would be achieved naturally, with no penalty at all.
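For what it is worth, the quoted passage talks about more operands per instruction, not more bits per operand: a 128-bit XMM register is divided into independent lanes of four single-precision or two double-precision values. The sketch below only mimics that layout, using Python's standard `struct` module as a stand-in for the register's 16 bytes; it is an illustration of the packing, not of actual SSE code, and the variable names are made up for the example.

```python
import struct

XMM_BYTES = 16  # a 128-bit register holds 16 bytes

# The same 16 bytes can carry four 32-bit floats or two 64-bit doubles.
four_singles = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
two_doubles = struct.pack("<2d", 1.0, 2.0)

lanes_single = len(four_singles) // struct.calcsize("<f")   # 4
lanes_double = len(two_doubles) // struct.calcsize("<d")    # 2

# Each lane still has only the precision of its own element type:
# 0.1 round-tripped through a 32-bit lane is no longer the double 0.1.
single_lane = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print(len(four_singles), len(two_doubles), lanes_single, lanes_double)
print(single_lane == 0.1)
```

So the wider registers give parallelism across single- or double-precision values rather than one 128-bit floating point number.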
Thanks for comments. |
|
|
 |
|