skimmed
Joined: 19 Oct 2009 Posts: 4
Posted: Wed Oct 21, 2009 5:09 am Post subject: Cactus BenchADM crashes
Hi,
I downloaded the Cactus BenchADM benchmark and followed its tutorial.txt (as well as the article "Building Cactus BenchADM with PGI accelerator compilers" by Mathew Colgrove) to build and run the code. The CPU version compiles and runs correctly. The CUDA version (StaggeredLeapfrog2_acc1.F, which came with the package) compiled correctly but crashed during the run. I then tried the other steps, acc2 and acc3, and they all behaved the same way.
I noticed that the compiler messages show
" 367, !$acc do parallel, vector(2)
371, !$acc do parallel, vector(3)" while the tutorial shows "vector(8)" for the same loops. I don't know why they differ.
pgaccelinfo runs fine and the code compiles, so I assume both CUDA and the compiler are installed correctly.
I would appreciate any suggestions on what I need to do to make it run.
My system:
RedHat 5.1, kernel 2.6.18-128.el5 x86_64 SMP
PGI 9.0.4
Tesla C1060
CUDA 2.3
Here is the full output:
[tester@bra-tesladev1 PGI_Acc_benchADM]$ make SIZE=120 OPT="-fast -ta=nvidia,time -Minfo=accel" build_acc1 run_acc1
pgfortran -fast -ta=nvidia,time -Minfo=accel -c -o objdir/StaggeredLeapfrog2_acc1.o ./src/StaggeredLeapfrog2_acc1.F
NOTE: your trial license will expire in 12 days, 11.2 hours.
NOTE: your trial license will expire in 12 days, 11.2 hours.
bench_staggeredleapfrog2:
366, Generating copyout(adm_kzz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyout(adm_kyz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(lalp(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyout(adm_kyy_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyout(adm_kxz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyout(adm_kxy_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyout(adm_kxx_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(lgzz(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(lgyz(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(lgyy(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(lgxz(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(lgxy(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(lgxx(1:nx-2+2,1:ny-2+2,1:nz-2+2))
Generating copyin(adm_kzz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kzz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kyz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kyz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kyy_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kyy_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxy_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxy_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxx_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
Generating copyin(adm_kxx_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))
367, Loop is parallelizable
371, Loop is parallelizable
375, Loop is parallelizable
Accelerator kernel generated
367, !$acc do parallel, vector(2)
371, !$acc do parallel, vector(3)
375, !$acc do vector(16)
Using register for 'adm_kxx_stag_p'
Using register for 'adm_kxy_stag_p'
Using register for 'adm_kxz_stag_p'
Using register for 'adm_kyy_stag_p'
Using register for 'adm_kyz_stag_p'
Using register for 'adm_kzz_stag_p'
Non-stride-1 accesses for array 'lgxx'
Non-stride-1 accesses for array 'lgxy'
Cached references to size [18x5x4] block of 'lgxz'
Cached references to size [18x5x4] block of 'lgyy'
Cached references to size [18x5x4] block of 'lgyz'
Cached references to size [18x5x4] block of 'lgzz'
Cached references to size [18x5x4] block of 'lalp'
pgfortran objdir/PreLoop.o objdir/StaggeredLeapfrog1a.o objdir/StaggeredLeapfrog1a_TS.o objdir/planewaves.o objdir/teukwaves.o /cctk_ThornBindings.o objdir/StaggeredLeapfrog2_acc1.o objdir/Cactus.......
............
/InitialiseCactus_acc.o -fast -ta=nvidia,time -Minfo=accel -Mnomain -o bin/benchADM_acc1
time bin/benchADM_acc1 BenchADM_40l_120.par
--------------------------------------------------------------------------------
10
1 0101 ************************
01 1010 10 The Cactus Code V4.0
1010 1101 011 www.cactuscode.org
1001 100101 ************************
00010101
100011 (c) Copyright The Authors
0100 GNU Licensed. No Warranty
0101
--------------------------------------------------------------------------------
Cactus version: 4.0.b11
Parameter file: BenchADM_40l_120.par
--------------------------------------------------------------------------------
Activating thorn Cactus...Success -> active implementation Cactus
Activation requested for
--->einstein time benchadm pugh pughreduce cartgrid3d ioutil iobasic<---
Activating thorn benchadm...Success -> active implementation benchadm
Activating thorn cartgrid3d...Success -> active implementation grid
Activating thorn einstein...Success -> active implementation einstein
Activating thorn iobasic...Success -> active implementation IOBasic
Activating thorn ioutil...Success -> active implementation IO
Activating thorn pugh...Success -> active implementation driver
Activating thorn pughreduce...Success -> active implementation reduce
Activating thorn time...Success -> active implementation time
--------------------------------------------------------------------------------
if (recover)
Recover parameters
endif
Startup routines
BenchADM: Register slicings
CartGrid3D: Register GH Extension for GridSymmetry
CartGrid3D: Register coordinates for the Cartesian grid
PUGH: Startup routine
IOUtil: Startup routine
IOBasic: Startup routine
PUGHReduce: Startup routine.
Parameter checking routines
BenchADM: Check parameters
CartGrid3D: Check coordinates for CartGrid3D
Initialisation
CartGrid3D: Set up spatial 3D Cartesian coordinates on the GH
Einstein: Set up GF symmetries
Einstein: Initialize slicing, setup priorities for mixed slicings
PUGH: Report on PUGH set up
Time: Initialise Time variables
Time: Set timestep based on Courant condition
Einstein: Initialisation for Einstein methods
Einstein: Flat initial data
BenchADM: Setup for ADM
Einstein: Set initial lapse to one
BenchADM: Time symmetric initial data for staggered leapfrog
if (recover)
endif
if (checkpoint initial data)
endif
if (analysis)
Einstein: Compute the trace of the extrinsic curvature
Einstein: Calculate the spherical metric in r,theta(q), phi(p)
Einstein: Calculate the spherical ex. curvature in r, theta(q), phi(p)
endif
do loop over timesteps
Rotate timelevels
iteration = iteration + 1
t = t+dt
Einstein: Identify the slicing for the next iteration
BenchADM: Evolve using Staggered Leapfrog
if (checkpoint)
endif
if (analysis)
Einstein: Compute the trace of the extrinsic curvature
Einstein: Calculate the spherical metric in r,theta(q), phi(p)
Einstein: Calculate the spherical ex. curvature in r, theta(q), phi(p)
endif
enddo
Termination routines
PUGH: Termination routine
Shutdown routines
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Driver provided by PUGH
--------------------------------------------------------------------------------
INFO (IOBasic): I/O Method 'Scalar' registered
INFO (IOBasic): Scalar: Output of scalar quantities (grid scalars, reductions) to ASCII files
INFO (IOBasic): I/O Method 'Info' registered
INFO (IOBasic): Info: Output of scalar quantities (grid scalars, reductions) to screen
INFO (BenchADM): Evolve using the ADM system
INFO (BenchADM): with staggered leapfrog
INFO (CartGrid3D): Grid Spacings:
INFO (CartGrid3D): dx=>8.4033613e-03 dy=>8.4033613e-03 dz=>8.4033613e-03
INFO (CartGrid3D): Computational Coordinates:
INFO (CartGrid3D): x=>[-0.500, 0.500] y=>[-0.500, 0.500] z=>[-0.500, 0.500]
INFO (CartGrid3D): Indices of Physical Coordinates:
INFO (CartGrid3D): x=>[0,119] y=>[0,119] z=>[0,119]
INFO (PUGH): Single processor evolution
INFO (PUGH): 3-dimensional grid functions
INFO (PUGH): Size: 120 120 120
INFO (Einstein): Setting flat Minkowski space in Einstein
INFO (IOBasic): Info: Output every 10 iterations
INFO (IOBasic): Info: Output requested for EINSTEIN::gxx EINSTEIN::alp
------------------------------------------------------------------------------
it | | EINSTEIN::gxx | EINSTEIN::alp |
| t | minimum | maximum | minimum | maximum |
------------------------------------------------------------------------------
0 | 0.000 | 1.00000000 | 1.00000000 | 1.00000000 | 1.00000000 |
call to ctxSynchronize returned error 700: Launch failed
Accelerator Kernel Timing data
./src/StaggeredLeapfrog2_acc1.F
bench_staggeredleapfrog2
366: region entered 1 time
time(us): init=1
375: kernel launched 1 times
grid: [59x40] block: [16x3x2]
time(us): total=0 max=0 min=0 avg=0
acc_init.c
acc_init
1: region entered 1 time
time(us): init=51061
Command exited with non-zero status 1
1.12user 0.66system 0:01.79elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+183167minor)pagefaults 0swaps
make: *** [run_acc1] Error 1
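The -Minfo schedule for source lines 367, 371 and 375 and the launch line "grid: [59x40] block: [16x3x2]" in the timing output are related by simple arithmetic. The following is a minimal Python sketch, not PGI's actual mapping. It assumes that each loop's vector width becomes one thread-block dimension, that each "parallel" loop also gets a grid dimension of ceil(trips / width), and that every loop runs over the interior 2:nx-1 (118 iterations for SIZE=120, as the copyout bounds suggest). The helper name is hypothetical.

```python
import math

# Interior trip count for SIZE=120: loops run over 2:nx-1, i.e. 118 iterations
# (inferred from the copyout bounds in the -Minfo listing).
TRIP = 120 - 2

# (source line, vector width, has 'parallel') as printed by -Minfo=accel,
# listed from the innermost loop (375) to the outermost (367).
ACC1_SCHEDULE = [
    (375, 16, False),  # !$acc do vector(16)
    (371, 3, True),    # !$acc do parallel, vector(3)
    (367, 2, True),    # !$acc do parallel, vector(2)
]


def launch_geometry(trip_count, schedule):
    """Hypothetical helper, not part of PGI: estimate block and grid sizes.

    Assumption: each loop's vector width becomes one thread-block dimension,
    and each 'parallel' loop also contributes a grid dimension of
    ceil(trip_count / width). A vector-only loop gets no grid dimension.
    """
    block = [width for _, width, _ in schedule]
    grid = [math.ceil(trip_count / width)
            for _, width, parallel in schedule if parallel]
    return block, grid


block, grid = launch_geometry(TRIP, ACC1_SCHEDULE)
print("block:", "x".join(map(str, block)))  # block: 16x3x2
print("grid: ", "x".join(map(str, grid)))   # grid:  40x59
```

Under these assumptions the block matches the reported 16x3x2 (96 threads), and the two grid dimensions, ceil(118/2) = 59 and ceil(118/3) = 40, match the reported grid. The log lists them as [59x40], and the sketch does not try to reproduce the compiler's dimension order. A different vector width, such as the tutorial's vector(8), would change the launch shape in the same way.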
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Oct 21, 2009 8:47 am
Hi Skimmed,
A "ctxSynchronize returned error 700" error typically means that there was an access violation when copying the data over to the device. Exactly why this is occurring, I'm not sure. Your -Minfo output looks correct (the different vector messages are just a difference between 9.0-4 and 9.0-3, which is what I used to write the tutorial).
The first thing I'd try is to reboot your system. I've seen a few cases where the device driver gets into a bad state and starts giving odd errors like this.
Next, set "NVDEBUG=1" in your environment. This produces a lot of debug output, but it will show exactly which variable is causing the crash.
Also, try one of the smaller examples found in "$PGI/linux86-64/9.0-4/etc/samples". If these fail as well, then I'm leaning towards a system issue rather than a compiler problem.
- Mat
skimmed
Joined: 19 Oct 2009 Posts: 4
Posted: Wed Oct 21, 2009 9:58 am
Thanks, Mat.
A reboot eventually sorted things out and now the code runs. However, my data transfer time is almost four times larger than in your results (27132909 us vs. 7112575 us). Can this be improved by tuning compiler options, or is it limited by the hardware?
Accelerator Kernel Timing data
./src/StaggeredLeapfrog2_acc3.F
bench_staggeredleapfrog2
369: region entered 100 times
time(us): total=35310202 init=99 region=35310103
kernels=8177194 data=27132909
w/o init: total=35310103 max=382721 min=351194 avg=353101
410: kernel launched 100 times
grid: [118x15] block: [8x32]
time(us): total=8177194 max=82600 min=81376 avg=81771
acc_init.c
acc_init
1: region entered 1 time
time(us): init=51528
54.93user 8.19system 1:03.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (17major+2205915minor)pagefaults 0swaps
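The timing summary above can be checked with a little arithmetic. The region time is exactly kernels + data, so most of the time goes to host-device copies rather than to the generated kernel. The following is a minimal Python sketch. All figures are copied from the posts and are assumed to be microseconds, per the "time(us)" labels. It also assumes that the tutorial's 7112575 figure was measured on the same problem size.

```python
# Figures (microseconds) from the acc3 "Accelerator Kernel Timing data" above.
REGION_US = 35_310_103
KERNELS_US = 8_177_194
DATA_US = 27_132_909
ENTRIES = 100                 # "region entered 100 times"
TUTORIAL_DATA_US = 7_112_575  # data time quoted from the tutorial

# The region time splits exactly into kernel time plus data-transfer time.
assert KERNELS_US + DATA_US == REGION_US

data_share = DATA_US / REGION_US
slowdown = DATA_US / TUTORIAL_DATA_US
per_entry_ms = DATA_US / ENTRIES / 1000

print(f"data share of region time: {data_share:.1%}")       # 76.8%
print(f"data time vs tutorial:     {slowdown:.2f}x")         # 3.81x
print(f"data time per entry:       {per_entry_ms:.1f} ms")  # 271.3 ms
```

Roughly three quarters of the region time is spent moving data, so the transfer path is a more likely suspect for the ~3.8x gap than the kernel schedule.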
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Oct 21, 2009 11:00 am
Check which PCIe slot your card is plugged into. I had a similar issue when I had a card in a slot with an x4 link instead of an x16 link. You may need to check your motherboard documentation to determine which PCIe slot is which; most likely, the PCIe slots closest to the CPU are the x16 links.
- Mat
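The x4-versus-x16 explanation above is consistent with a gap of about four times, since link bandwidth scales with lane count. The following is a minimal Python sketch of theoretical one-direction PCIe bandwidth. It assumes the standard PCIe 1.x and 2.0 signalling rates (2.5 and 5.0 GT/s per lane) with 8b/10b encoding. The thread does not say which generation this motherboard uses, and the function name is hypothetical.

```python
# Theoretical one-direction PCIe bandwidth, before packet/protocol overhead.
# Assumption: PCIe 1.x = 2.5 GT/s per lane, PCIe 2.0 = 5.0 GT/s per lane,
# both with 8b/10b encoding (8 payload bits per 10 transferred bits).
GT_PER_S = {"1.x": 2.5, "2.0": 5.0}


def pcie_mb_per_s(gen, lanes):
    """Hypothetical helper: theoretical MB/s (10**6 bytes) in one direction."""
    payload_bits_per_s = GT_PER_S[gen] * 1e9 * lanes * 8 / 10
    return payload_bits_per_s / 8 / 1e6


for gen in GT_PER_S:
    x4, x16 = pcie_mb_per_s(gen, 4), pcie_mb_per_s(gen, 16)
    print(f"PCIe {gen}: x4 = {x4:.0f} MB/s, x16 = {x16:.0f} MB/s, ratio = {x16 / x4:.0f}")
```

Real copies reach less than these figures, so the useful comparison is relative: in either generation, an x4 link caps transfers at a quarter of the x16 rate.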
skimmed
Joined: 19 Oct 2009 Posts: 4
Posted: Thu Oct 22, 2009 4:47 am
Mat,
Thanks very much for your help.
The machine (a Dell Precision 690) has two PCIe x16 slots, occupied by a Tesla C1060 and a Quadro FX 1400. No matter which slot the Tesla was in, I got exactly the same results.
The CUDA bandwidth test showed 1300 MB/s for uploads (host to device) and 988 MB/s for downloads (device to host), which is very slow.
It appears to be a configuration issue, but at the moment I have no idea how to solve it.
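The measured bandwidth accounts for most of the data time in the acc3 timing posted earlier. The following is a rough Python estimate built on stated assumptions, none of which the thread confirms:
- the acc3 region moves the same arrays as the acc1 -Minfo listing in the first post (7 full nx^3 arrays and 12 interior (nx-2)^3 arrays copied in, 6 interior arrays copied out);
- the reals are 8 bytes;
- "MB" in the bandwidth figures means 10^6 bytes.

Only the order of magnitude matters here.

```python
# Rough host<->device transfer estimate for one region entry.
# ASSUMPTIONS (not confirmed in the thread): the acc3 region copies the same
# arrays as the acc1 -Minfo listing, reals are 8 bytes, and MB = 10**6 bytes.
NX = 120                    # SIZE=120
BYTES_PER_REAL = 8
FULL = NX ** 3              # e.g. lgxx(1:nx, 1:ny, 1:nz)
INTERIOR = (NX - 2) ** 3    # e.g. adm_kxx_stag(2:nx-1, 2:ny-1, 2:nz-1)

bytes_in = (7 * FULL + 12 * INTERIOR) * BYTES_PER_REAL  # copyin arrays
bytes_out = 6 * INTERIOR * BYTES_PER_REAL               # copyout arrays

UPLOAD_MB_S = 1300          # bandwidth test, host to device
DOWNLOAD_MB_S = 988         # bandwidth test, device to host

est_s = bytes_in / (UPLOAD_MB_S * 1e6) + bytes_out / (DOWNLOAD_MB_S * 1e6)
measured_s = 27_132_909 / 100 / 1e6  # acc3 data time per region entry

# estimated 276 ms vs measured 271 ms per entry
print(f"estimated {est_s * 1000:.0f} ms vs measured {measured_s * 1000:.0f} ms per entry")
```

If these assumptions hold, the data time is close to what the measured bandwidth predicts. A faster host-device transfer path, not different compiler options, would then be what closes the gap to the tutorial's figure.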