tlstar
Joined: 31 Mar 2011 Posts: 22

Posted: Sun Apr 10, 2011 2:51 am Post subject: 


Done!
tlstar wrote:  I think the most likely reason for the different results from emulation (which runs the threads in series) and the CUDA release build is conflicting writes to device memory.

Rounding errors may not be handled identically on the GPU and the CPU.
Any such error then propagates through the random number generation algorithm.
So even on Fermi, the Tesla C2050 is not fully IEEE 754 compliant?
Reference, from the CUDA wiki:
For double precision (for GPUs supporting CUDA compute capability 1.3 and above[12]) there are some deviations from the IEEE 754 standard: round-to-nearest-even is the only supported rounding mode for reciprocal, division, and square root. In single precision, denormals and signalling NaNs are not supported; only two IEEE rounding modes are supported (chop and round-to-nearest-even), and those are specified on a per-instruction basis rather than in a control word; and the precision of division/square root is slightly lower than single precision.
The RNG algorithm used here is a classic one, something like the following.
Algorithm 1: Combined Multiple Recursive Generator
For i = 3 to n
  X_i = (1403580 X_{i-2} - 810728 X_{i-3}) mod 4294967087
  Y_i = (527612 Y_{i-1} - 1370589 Y_{i-3}) mod 4294944443
  Z_i = (X_i - Y_i) mod 4294967087
  If (Z_i > 0)
    U_i = Z_i / 4294967088
  If (Z_i = 0)
    U_i = 4294967087 / 4294967088
  i = i + 1
End i
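For reference, here is a minimal C sketch of the recurrence above. It assumes the algorithm is L'Ecuyer's MRG32k3a, whose second component depends only on earlier Y values, and it keeps the state in 64-bit integers so that no floating-point rounding can enter before the final division. The names cmrg_state, cmrg_seed, cmrg_next and mod_pos are made up for this illustration; this is not the nb_aleatoire routine from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define M1 4294967087LL   /* 2^32 - 209   */
#define M2 4294944443LL   /* 2^32 - 22853 */

/* x[0], x[1], x[2] hold X_{i-3}, X_{i-2}, X_{i-1}; y is laid out the same way. */
typedef struct {
    int64_t x[3];
    int64_t y[3];
} cmrg_state;

/* Reduce v into [0, m). C's % can return a negative remainder. */
static int64_t mod_pos(int64_t v, int64_t m)
{
    int64_t r = v % m;
    return (r < 0) ? r + m : r;
}

/* Hypothetical seeding helper: fills all six state words with one value. */
static void cmrg_seed(cmrg_state *s, int64_t seed)
{
    assert(seed > 0 && seed < M2);
    for (int k = 0; k < 3; ++k) {
        s->x[k] = seed;
        s->y[k] = seed;
    }
}

/* Advance the generator one step and return U_i in (0, 1). */
static double cmrg_next(cmrg_state *s)
{
    /* Both products stay below 2^63, so the integer arithmetic cannot overflow. */
    int64_t xi = mod_pos(1403580LL * s->x[1] - 810728LL * s->x[0], M1);
    int64_t yi = mod_pos(527612LL * s->y[2] - 1370589LL * s->y[0], M2);

    /* Shift the history window by one position. */
    s->x[0] = s->x[1]; s->x[1] = s->x[2]; s->x[2] = xi;
    s->y[0] = s->y[1]; s->y[1] = s->y[2]; s->y[2] = yi;

    int64_t zi = mod_pos(xi - yi, M1);
    return (zi > 0) ? (double)zi / 4294967088.0
                    : 4294967087.0 / 4294967088.0;
}
```

Two states seeded with the same value and stepped with cmrg_next should produce identical sequences on any platform, which gives a CPU-side reference to compare a GPU port against.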
Last edited by tlstar on Mon Apr 11, 2011 7:28 am; edited 1 time in total 



tlstar
Joined: 31 Mar 2011 Posts: 22

Posted: Mon Apr 11, 2011 2:40 am Post subject: 


Hi Mat,
I have made two changes to my code:
1. Compile cell_loo_GPU.F90 at "-O0"
This avoids vector optimization of the do-loop that calls the kernel, and it settled the 4*14*BLOCK_SIZE problem.
2. Truncate the random seeds to whole numbers in aleatoire_init_GPU
With this, emu and the GPU produce exactly the same random numbers.
!===============================================================================
SUBROUTINE aleatoire_init_GPU(nblock)
...............
! normalize the seeds
random(1:3,:) = AINT(random(1:3,:)*m1,8)
random(4:6,:) = AINT(random(4:6,:)*m2,8)
! write(0,*) "random seeds inited"
! write(0,*) random(:,1:10)
...............
END SUBROUTINE aleatoire_init_GPU
!===============================================================================
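One possible explanation for why whole-number seeds help (my reading, not something confirmed in the thread): the seeds are stored as kind-8 reals, and once the state holds only whole numbers, the largest product in the recurrence, 1403580 * 4294967086, is about 6.0e15, which is below 2^53 and therefore exactly representable in double precision. The C sketch below mirrors the AINT truncation and checks that exactness. The helper names normalize_seed and product_is_exact are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define M1 4294967087LL   /* first modulus, 2^32 - 209 */

/* Scale a uniform value u in [0, 1) by M1 and drop the fractional part.
 * The cast truncates toward zero, which is what AINT does in the Fortran code. */
static int64_t normalize_seed(double u)
{
    assert(u >= 0.0 && u < 1.0);
    return (int64_t)(u * (double)M1);
}

/* Return 1 if the double-precision product a*x equals the exact integer product. */
static int product_is_exact(int64_t a, int64_t x)
{
    double pd = (double)a * (double)x;
    return (int64_t)pd == a * x;
}
```

With a fractional seed, the same product generally needs more than 53 bits, so the CPU and the GPU are free to round it differently.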
But the results still differ between the GPU and emu.
Since there is no debugger for GPU Fortran, this is really painful to investigate.
I hope a GPU Fortran debugger becomes available soon. Otherwise, debugging will cost more than translating the code into C.
Could you tell me how to set the "be" tool (.gpu to .ptx) to "-O0"?
gfwang



tlstar
Joined: 31 Mar 2011 Posts: 22

Posted: Mon Apr 11, 2011 4:34 am Post subject: 


Bug report:
In the kernel Fortran source file:
Code:  cos_theta = 1.0d0 - 2.d0 * nb_aleatoire(randseed) 
Compiled by pgfortran into low-level C:
Code:  cos_theta = (1.00000000000000000E+0)-((nb_aleatoire((signed char*)(_prandseed)))+(nb_aleatoire((signed char*)(_prandseed)))); 
.....
Note that nb_aleatoire is a function whose value depends on the INTENT(INOUT) argument randseed, so each call updates the seed and returns a different number. The translation is therefore not equivalent to the source.
Furthermore, I do not understand why the compiler would want to make this transformation. As we all know, multiplication (*) is no slower than addition on a GPU or a modern CPU, and is much faster than evaluating a function.
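The non-equivalence is easy to reproduce on the CPU. The C sketch below uses a small linear congruential generator as a stand-in for nb_aleatoire (the real routine is not shown in this thread, so next_uniform and both cos_theta functions are hypothetical) and compares the expression as written with the form the compiler generated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for nb_aleatoire: updates the seed in place (like INTENT(INOUT))
 * and returns a value in [0, 1). */
static double next_uniform(uint32_t *seed)
{
    assert(seed != NULL);
    *seed = *seed * 1664525u + 1013904223u;  /* simple LCG step, wraps mod 2^32 */
    return *seed / 4294967296.0;
}

/* As written in the Fortran source: one call, result doubled. */
static double cos_theta_as_written(uint32_t *seed)
{
    return 1.0 - 2.0 * next_uniform(seed);
}

/* As generated: two calls, so the seed advances twice and the
 * two values being added are different numbers. */
static double cos_theta_as_generated(uint32_t *seed)
{
    double a = next_uniform(seed);
    double b = next_uniform(seed);
    return 1.0 - (a + b);
}
```

Starting both versions from the same seed gives different cos_theta values and leaves the seeds in different states, so every later random number in that thread is shifted as well.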
Quote:  NOTE: your trial license will expire in 3 days, 12.7 hours. 
I think I should be awarded a longer-term trial license for the pgfortran compiler for my bug hunting in the compiler itself.



tlstar
Joined: 31 Mar 2011 Posts: 22

Posted: Mon Apr 11, 2011 7:26 am Post subject: 


Bug report (or feature request) 2:
All initialized GPU constant variables are set to 0 or 0.000 when pgfortran compiles to low-level C code (.gpu), regardless of the user-defined values.
Code:  DOUBLE PRECISION, constant :: epsilon_paroi = 0.5 
into
Code:  __constant__ struct{
int* m0;long long m8; ............................ ;double m2600;double m2608;
}__align__(16) _raycast_gpukernel_17 = { 0,0,......,0.000000 }; 




mkcolg
Joined: 30 Jun 2004 Posts: 6738 Location: The Portland Group Inc.

Posted: Mon Jul 18, 2011 3:34 pm Post subject: 


Hi gfwang,
FYI, support for floating point atomics (TPR#17778) was added a few releases ago (sorry for the late update). The only caveat is that you need a device that supports CC2.0 to use them.
- Mat

