PGI User Forum


WRF compiler optimisation
 
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Nov 26, 2004 9:42 am

Hi Craig,

The office is closed for a few days for the Thanksgiving holiday, so I don't have access to WRF. We'll be back on Monday, so if you don't mind waiting, I'll see what I can determine then.

Thanks,
Mat
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed Dec 01, 2004 11:29 am

Hi Craig,


Unfortunately, I have not been able to recreate the error, so I don't have a good idea how to fix it. Is it possible for you to characterize how it is seg faulting?

I'd like you to re-build with "-g -O0 -mp" and re-run. If it still seg faults, run it again in pgdbg or gdb and determine which file and which line it seg faults at (use the 'where' and 'disasm' commands). If it does not seg fault at -O0, then continue adding higher optimization until it does, i.e. "-O2 -g -mp", "-fast -g -mp", "-fastsse -g -mp".
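The escalation procedure above amounts to a linear search over flag sets, stopping at the lowest optimization level that reproduces the crash. This is a minimal Python sketch, not a PGI or WRF tool: `build` and `run_segfaults` are hypothetical callables standing in for "set FCOPTIM in configure.wrf and recompile" and "run wrf.exe and check whether it seg faulted"; only the flag sets themselves come from the post.

```python
from typing import Callable, Optional, Sequence

# FCOPTIM flag sets from the post, in increasing order of optimization.
FLAG_LEVELS = [
    "-g -O0 -mp",
    "-O2 -g -mp",
    "-fast -g -mp",
    "-fastsse -g -mp",
]


def first_segfaulting_level(
    levels: Sequence[str],
    build: Callable[[str], None],
    run_segfaults: Callable[[], bool],
) -> Optional[str]:
    """Return the first flag set whose build seg faults, or None if none do.

    build(flags): hypothetical step that sets FCOPTIM and recompiles wrf.exe.
    run_segfaults(): hypothetical step that runs wrf.exe and reports a SIGSEGV.
    """
    for flags in levels:
        build(flags)
        if run_segfaults():
            # Lowest optimization level that reproduces the crash.
            return flags
    return None
```

Searching from -O0 upward means the first hit is the least-optimized build that still crashes, which is the most useful one to inspect in the debugger.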

Thanks,
Mat
Craig Arthur



Joined: 01 Sep 2004
Posts: 5

Posted: Tue Dec 07, 2004 4:29 pm

Hi Mat,

I started out with the basic "-g -O0 -mp" flag set, and compilation failed with a long list of errors. So I stepped back to the default set (as in the configure.wrf I posted previously) and progressively worked back to a point at which compilation succeeded. The most basic set I could get down to was "-g -O0 -mp -byteswapio -Mfree" (I can't find any mention of "-Mfree" in the PGF User Guide, so I'm unsure of its effect).

I ran the compiled executable in pgdbg with "pgienv omp on", and I can reach the first OMP directive, which is in the subroutine SOLVE_EM.

Code:
pgdbg> step
Stopped at 0x490705, function solve_em, file solve_em.f, line 1523
 #1523:        !$OMP PARALLEL DO   &

pgdbg> step
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x490728
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x85f2d0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x8600e0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x85fe58
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x4ca6f0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0xb7a688
Stopped at 0x49072a, function solve_em, file solve_em.f, line 1526
 #1526:        DO ij = 1 , grid%num_tiles

pgdbg> step

 

The relevant code lines are:
Code:

!$OMP PARALLEL DO   &
!$OMP PRIVATE ( ij )

   DO ij = 1 , grid%num_tiles

      CALL rk_step_prep  ( config_flags, rk_step,            &
                           u_2, v_2, w_2, t_2, ph_2, mu_2,   &
                           moist_2,                          &
                           ru, rv, rw, ww, php, alt, muu, muv,   &
                           mub, mut, phb, pb, p, al, alb,    &
                           cqu, cqv, cqw,                    &
                           msfu, msfv, msft,                 &
                           fnm, fnp, dnw, rdx, rdy,          &
                           num_3d_m,                         &
                           ids, ide, jds, jde, kds, kde,     &
                           ims, ime, jms, jme, kms, kme,     &
                           grid%i_start(ij), grid%i_end(ij), &
                           grid%j_start(ij), grid%j_end(ij), &
                           k_start, k_end                   )

   END DO
   !$OMP END PARALLEL DO


And on stepping into the DO loop, the debugger dies, reporting:
Code:
pgserv 27022: read: stranger PID 27023
db_set_code_brk : DiBreakpointSet fails
pgserv 27022: cont : no threads to continue


I decided it was worth running an idealised case (em_quarter_ss) compiled with the same configure.wrf, since it does run on 2 CPUs. The debugger dies at the same location as in the real case, reporting the same errors. As such, I don't think I'm actually reaching the point where em_real is seg faulting.
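The OpenMP loop quoted above hands one tile (index ij, with its own i/j start and end bounds) to each thread, and the run output later in this thread shows WRF picking 2 tiles from OMP_GET_MAX_THREADS. A rough Python analogue is sketched below. The even split of the j range and the names `split_tiles` and `run_tiles` are assumptions for illustration only; WRF's actual tiling code may divide the patch differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Tile = Tuple[int, int]


def split_tiles(j_start: int, j_end: int, num_tiles: int) -> List[Tile]:
    """Split the inclusive range j_start..j_end into num_tiles contiguous tiles.

    Leftover rows go to the first tiles, so tile sizes differ by at most one.
    Assumes num_tiles >= 1.
    """
    n = j_end - j_start + 1
    base, extra = divmod(n, num_tiles)
    tiles: List[Tile] = []
    start = j_start
    for t in range(num_tiles):
        size = base + (1 if t < extra else 0)
        tiles.append((start, start + size - 1))
        start += size
    return tiles


def run_tiles(tiles: List[Tile], work: Callable[[int, int], T]) -> List[T]:
    """Call work(tile_j_start, tile_j_end) for every tile, one worker thread per tile.

    Like !$OMP PARALLEL DO PRIVATE(ij), each call sees only its own tile
    bounds; results are returned in tile order.
    """
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        return list(pool.map(lambda tile: work(*tile), tiles))
```

For example, `split_tiles(1, 10, 2)` gives `[(1, 5), (6, 10)]`. Python threads share the interpreter lock, so this mirrors the structure of the Fortran loop rather than its performance.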
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed Dec 08, 2004 11:48 am

Hi Craig,

Sorry, I should have been clearer: change just the "FCOPTIM" flags and leave "FCBASEOPTS" as is. Also, "-Mfree" and "-Mfixed" override the file extension (.F, .F90, .f, .f90) to indicate whether a file is free or fixed form.
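The extension rule above can be sketched as a small lookup, assuming the usual PGI convention that .f/.F files default to fixed form and .f90/.F90 files to free form. This is a hypothetical helper for illustration, not part of the compiler. It may also explain the earlier compile errors: the WRF files seen in this thread (solve_em.f, module_surface_driver.f) have a .f extension but appear to contain free-form code (for example the `&` continuation on the `!$OMP PARALLEL DO` line), so they would need -Mfree.

```python
import os
from typing import Sequence

# Assumed default source form by extension (the usual PGI Fortran convention).
FIXED_FORM_EXTS = {".f", ".F"}
FREE_FORM_EXTS = {".f90", ".F90"}


def source_form(filename: str, flags: Sequence[str]) -> str:
    """Return "free" or "fixed" for filename compiled with the given flags.

    -Mfree or -Mfixed overrides the extension; this sketch assumes at most
    one of the two is present.
    """
    if "-Mfree" in flags:
        return "free"
    if "-Mfixed" in flags:
        return "fixed"
    ext = os.path.splitext(filename)[1]
    if ext in FREE_FORM_EXTS:
        return "free"
    if ext in FIXED_FORM_EXTS:
        return "fixed"
    raise ValueError(f"unrecognised Fortran extension: {ext!r}")
```

With the flag set from the earlier post, `source_form("solve_em.f", "-g -O0 -mp -byteswapio -Mfree".split())` returns `"free"`; without -Mfree the same file is treated as fixed form.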

Since 5.1-6 pre-dates Fedora Core 2 and a lot changed in the thread library, the 5.1 version of pgdbg cannot step through parallel regions. Again, I should have been clearer. Please run the application without stepping and let it run until it seg faults. Then use the "where" command to see where you are in the program and "disasm" to see which assembly instructions were being executed. Also, please run the exe outside of the debugger to ensure that it does indeed still seg fault at the lower optimization.
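To confirm the crash outside the debugger, a small wrapper can run the executable and check how it exited. This sketch assumes a POSIX system, where Python's `subprocess` reports a child killed by signal N as return code -N. The script name check_segv.py and the `./wrf.exe` path in the example comment are assumptions about how it would be saved and where the executable lives.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence


def killed_by_sigsegv(cmd: Sequence[str], cwd: Optional[str] = None) -> bool:
    """Run cmd to completion and return True if it died from SIGSEGV.

    On POSIX, subprocess.run() reports a child terminated by signal N
    as returncode == -N.
    """
    result = subprocess.run(list(cmd), cwd=cwd)
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_segv.py ./wrf.exe
    crashed = killed_by_sigsegv(sys.argv[1:])
    print("SIGSEGV" if crashed else "no SIGSEGV")
```

Checking the signal rather than just a non-zero exit code separates a genuine seg fault from WRF stopping on its own with an error.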

Since 5.1-6 does not officially support Fedora Core 2, I'd also like you to try upgrading to 5.2-4 (http://www.pgroup.com/support/download_release.php). It is possible that there is an incompatibility between 5.1-6 and Fedora Core 2. Also, the debugger went through a major upgrade. Note that we upgraded your license to 5.2, but you'll need to regenerate your license key in order for the 5.2 compilers to work beyond the 15-day evaluation.

Thanks,
Mat
Craig Arthur



Joined: 01 Sep 2004
Posts: 5

Posted: Thu Dec 16, 2004 3:51 pm

Hi Mat,

I've gone through the steps you set out above, and the executable continues to seg fault. Below is one example of the output from the debugger when running wrf.exe compiled with "-g -O0 -mp".

Code:

([1] New Thread)
 WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   2                             
 WRF NUMBER OF TILES =   2                                                       
[0] Signalled SIGSEGV at 0x5D88EB, function surface_driver, file module_surface_driver.f, line 374
5D88EB:  F3 F 11 4 8A                   movss  %xmm0,(%rdx,%rcx,4)

pgdbg [all] 0> where
surface_driver line: "module_surface_driver.f"@374 address: 0x5D88EB 
pgdbg [all] 0> disasm
5D88EB:  F3 F 11 4 8A                   movss  %xmm0,(%rdx,%rcx,4)
5D88F0:  FF 85 50 FF FF FF              incl   -176(%rbp)
5D88F6:  FF 8C 24 60 1 0 0              decl   352(%rsp)

pgdbg [all] 0> threads
0   ID   PID     STATE      SIGNAL      LOCATION
 => 0    30926   Signalled  SIGSEGV     surface_driver line: "module_surface_driver.f"@374 address: 0x5D88EB
    1    30927   Stopped    SIGSTOP     __GI_sched_yield file: interp.c address: 0x3EE7DA4129



The catch, though, is that the seg fault does not occur in a consistent place. I have found about 5 different points where execution stops, most often in the surface_driver function.

For this reason, I'm starting to suspect the compiler is not the direct source of the issue. I'll play around some more with the debugger to see if I can glean any more information about what's occurring.
All times are GMT - 7 Hours