PGI User Forum


Monte Carlo Example on Fermis Not Working?

 
TheMatt



Joined: 06 Jul 2009
Posts: 317
Location: Greenbelt, MD

Posted: Wed Mar 23, 2011 7:53 am    Post subject: Monte Carlo Example on Fermis Not Working?

I'm hoping someone can help me with an oddity I'm seeing with the Monte Carlo example. I've gotten access to a Fermi system so I'm learning how to use them carefully and methodically.

If I run the CUF1 example on a Tesla T10 system:
Code:
> make DFLAG=-DUSE_SMALL run_CUF1
pgfortran -fast -c -Iinc ./src/mcUtils.F90 -o ./obj/mcUtils.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess -DUSE_SMALL -DITER=10 ./src/mcCUF_1.F90 -o ./obj/mcCUF_1.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess -DUSE_SMALL -DITER=10 -DMCTYPE=11 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf1.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf1.o ./obj/mcUtils.o ./obj/mcCUF_1.o  -o mcCUF_1.out
time  mcCUF_1.out
 ----- CUF1 -----
 Result =     3.142020   
 Standard deviation =    1.0021195E-04
 Difference from real PI value =    4.2748451E-04
 Time in Seconds
    Total :    5.07815
      RNG :    3.07867
  Compute :    0.09659
Data Xfer :    0.59191
3.49user 1.43system 0:05.16elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+197764minor)pagefaults 0swaps
That looks good. Now we run on a Fermi system:
Code:
> make DFLAG=-DUSE_SMALL run_CUF1
pgfortran -fast -c -Iinc ./src/mcUtils.F90 -o ./obj/mcUtils.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess -DUSE_SMALL -DITER=10 ./src/mcCUF_1.F90 -o ./obj/mcCUF_1.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess -DUSE_SMALL -DITER=10 -DMCTYPE=11 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf1.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf1.o ./obj/mcUtils.o ./obj/mcCUF_1.o  -o mcCUF_1.out
time  mcCUF_1.out
 ----- CUF1 -----
 Result =   -1.2149596E+14
 Standard deviation =              Inf
 Difference from real PI value =    1.2149596E+14
 Time in Seconds
    Total :    9.68955
      RNG :    3.14654
  Compute :    0.08587
Data Xfer :    0.69943
3.59user 6.02system 0:09.92elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+197882minor)pagefaults 0swaps

Any ideas why the result is so bad? And, I suppose, why the Total time has increased so much?

It is the same compiler (11.1) and same example, so the only difference is Tesla to Fermi. If it helps, examples CUF4 and CUF5 do seem to work.

Thanks,
Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Wed Mar 23, 2011 8:23 am

Hi Matt,

I'm assuming that this is a bug in my original code where I wasn't setting all values of dtemp. Though, I found this error in July 2010 and updated the source package on our website soon after. Can you check if you have my latest code?

Thanks,
Mat
TheMatt



Joined: 06 Jul 2009
Posts: 317
Location: Greenbelt, MD

Posted: Wed Mar 23, 2011 8:31 am

mkcolg wrote:
Hi Matt,

I'm assuming that this is a bug in my original code where I wasn't setting all values of dtemp. Though, I found this error in July 2010 and updated the source package on our website soon after. Can you check if you have my latest code?

I grabbed this tarball: http://www.pgroup.com/lit/samples/pginsider/pgi_mc_example.tar.gz

When I wget it, it has a date stamp of 2010-02-24 and most of the files within are around that date as well.
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Wed Mar 23, 2011 9:21 am

Well, shoot. I guess I never verified that the source really did get updated. I'll work on getting this fixed.

The quick fix is to use sizes that are divisible by 256. So change monte_drv.F90 to use new values for N:

Code:
#if defined(USE_SMALL)
!  PARAMETER(N=16777215_4, PI=3.1415926535_4)
   PARAMETER(N=16776960_4, PI=3.1415926535_4)
#else
!  PARAMETER(N=67108860_4, PI=3.1415926535_4)
   PARAMETER(N=67108608_4, PI=3.1415926535_4)
#endif
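
The new values of N look like the old ones rounded down to a multiple of 256. As a worked check, assuming the block size is 256 threads (implied by "divisible by 256" but not stated here) and that the grid size comes from a truncating integer division of N by the block size:

```latex
\begin{aligned}
\lfloor 16777215 / 256 \rfloor &= 65535,  & 65535 \times 256  &= 16776960, & 16777215 - 16776960 &= 255 \\
\lfloor 67108860 / 256 \rfloor &= 262143, & 262143 \times 256 &= 67108608, & 67108860 - 67108608 &= 252
\end{aligned}
```

Under those assumptions, the last 255 (small case) or 252 (large case) elements would not be covered by any thread with the old values of N, which would be consistent with some values of dtemp never being set.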


A better fix would be to launch more threads than needed and then check in the kernel that the 'i' index is not greater than N, i.e. change "dimGrid = dim3(N/dimBlock%x,1,1)" to "dimGrid = dim3((N+dimBlock%x-1)/dimBlock%x,1,1)" and then put an if statement in the kernel to make sure i is less than or equal to N.
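
A minimal CUDA Fortran sketch of that round-up-and-guard pattern follows. It is not the Monte Carlo example's actual source: the module, the kernel name fill_kernel, the array dtemp_d, and the small N are hypothetical and chosen only for illustration. N is deliberately small because rounding 16777215 up with 256-thread blocks would give 65536 blocks, one more than the 65535 limit on the x dimension of a grid on Tesla- and Fermi-class devices.

```fortran
! Sketch only: fill_kernel, dtemp_d and N = 1000 are hypothetical names/values,
! not taken from the pgi_mc_example sources.
module guard_demo
  use cudafor
  implicit none
contains
  attributes(global) subroutine fill_kernel(dtemp, n)
    integer, value :: n
    real :: dtemp(n)
    integer :: i
    ! Global 1-based index of this thread
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Surplus threads in the last block fall outside the array and do nothing
    if (i <= n) dtemp(i) = 1.0
  end subroutine fill_kernel
end module guard_demo

program guard_check
  use cudafor
  use guard_demo
  implicit none
  integer, parameter :: N = 1000          ! deliberately not a multiple of 256
  real, device, allocatable :: dtemp_d(:)
  real :: dtemp(N)
  type(dim3) :: dimGrid, dimBlock

  allocate(dtemp_d(N))
  dtemp_d = 0.0

  dimBlock = dim3(256, 1, 1)
  ! Round up so that the blocks cover all N elements
  dimGrid = dim3((N + dimBlock%x - 1) / dimBlock%x, 1, 1)

  call fill_kernel<<<dimGrid, dimBlock>>>(dtemp_d, N)

  ! Copy back (the assignment waits for the kernel) and count what was set
  dtemp = dtemp_d
  print *, 'elements set:', count(dtemp == 1.0), ' of ', N

  deallocate(dtemp_d)
end program guard_check
```

With these values the launch uses 4 blocks (1024 threads), and the guard keeps the 24 surplus threads from writing past the end of the array. Built with something like "pgfortran -Mcuda" on a file with a .cuf or .CUF extension, the count printed should match N.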

The first few examples are intentionally poor implementations, though, so I'll just update N in the driver code.

Thanks,
Mat
All times are GMT - 7 Hours