PGI User Forum

pgi/13.1, pgsampt Crash writing to "/tmp/prof
 
escj



Joined: 30 Sep 2009
Posts: 57
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Tue Feb 05, 2013 9:38 am    Post subject: pgi/13.1, pgsampt Crash writing to "/tmp/prof

Hello,

With pgi/13.1, pgcollect/pgsampt crashes while trying to write to "/tmp/prof.log".

Here is a demo using the samples shipped with the PGI package.


Code:
pgfortran --version

pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2013, STMicroelectronics, Inc.  All Rights Reserved.

pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel

make f3.exe
pgfortran -o f3.exe f3.f90 -ta=nvidia -Minfo=accel -fast
smooth:
     24, Generating copyin(b(:,:))
         Generating copy(a(:,:))
     26, Generating present_or_copy(a(:,:))
...

pgsampt f3.exe
            0  errors found
        66818  microseconds on GPU
           72  microseconds on host
target process has terminated, writing profile data
Erreur de segmentation



strace shows an attempt to open a strange log file, "/tmp/prof.log":

Code:
strace pgsampt f3.exe
...
open("/tmp/prof.log", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
open("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq", O_RDONLY) = 16
fstat(16, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b24c5fde000
read(16, "2661000\n", 4096)             = 8
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Erreur de segmentation


As a test, I created this directory on my PC, and after that everything runs OK:

Code:

ls -l /tmp/
total 0

pgsampt f3.exe
            0  errors found
        69881  microseconds on GPU
           71  microseconds on host
target process has terminated, writing profile data

ls -lrt pgprof.out
-rw-r--r-- 1 escj users 1194 févr.  5 17:30 pgprof.out
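
The ENOENT in the strace output above means that some component of the path "/tmp/prof.log" did not exist when pgsampt tried to create the file, which is consistent with the run succeeding once the directory was created. Below is a minimal sketch of a pre-flight check, assuming the missing directory was /tmp itself (the listing above suggests so, but the post does not say it outright). The names can_write_in and run_pgsampt_checked are hypothetical helpers, not part of the PGI tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory exists and is writable.
can_write_in() {
    dir=$1
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Hypothetical wrapper: refuse to start the profiler when /tmp is unusable,
# instead of letting pgsampt 13.1 fail later while writing the profile data.
# Assumption: the directory pgsampt needs is /tmp, as the strace output suggests.
run_pgsampt_checked() {
    if ! can_write_in /tmp; then
        echo "/tmp is missing or not writable; pgsampt 13.1 may crash" >&2
        return 1
    fi
    pgsampt "$@"
}
```

With these functions sourced, `run_pgsampt_checked f3.exe` would stand in for `pgsampt f3.exe`. The check only guards against the directory problem; it does not repair the crash in pgsampt.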


Next bug: pgcollect no longer shows the accelerator kernel timings?

See you later,

Juan
donb



Joined: 20 Jul 2004
Posts: 88
Location: The Portland Group, Inc.

Posted: Tue Feb 05, 2013 2:05 pm

Well that's embarrassing. My apologies, and thank you for reporting this. The crash will be fixed in release 13.2, expected out this week.

As far as not showing accelerator timings any more, we don't see that in our testing. Note that if the performance data values in the Accelerator Performance tab are zero, then no data is displayed. If you are seeing another problem, please let us know.
escj




Posted: Wed Feb 06, 2013 3:24 am

Regarding the "no more accelerator timing" problem:

Still using the same sample, "f3.exe".

I followed "pgprof13ug.pdf", pages 22-25.

With pgi/12.10, everything is OK:
Code:
pgfortran --version
pgfortran 12.10-0 64-bit target on x86-64 Linux -tp nehalem


=> Compile and run the sample f3.exe with the options "ccff", "-g", etc.

Code:
 pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel

pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...
pgcollect -time -cudainit f3.exe 5000
            0  errors found
       450158  microseconds on GPU
       305996  microseconds on host
target process has terminated, writing profile data

pgprof -exe f3.exe


The PGPROF window shows a view very similar to Fig. 2.12, p. 24 of pgprof13ug.pdf
=> 4 columns
Code:
less pg.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013

| Function                | Seconds         | Accelerator Region Time | Accelerator Kernel Time |

| __select_nocancel       |  1,3908 =  46%  |       0 =   0%          |       0 =   0%          |
| main                    |    8046 =  27%  |       0 =   0%          |       0 =   0%          |
| smoothhost              |    3448 =  11%  |       0 =   0%          |       0 =   0%          |
| __GI_sched_yield        |    3448 =  11%  |       0 =   0%          |       0 =   0%          |
| sstk                    |     460 =   2%  |       0 =   0%          |       0 =   0%          |
| __c_mcopy4              |     460 =   2%  |       0 =   0%          |       0 =   0%          |
| __lll_lock_wait_private |     115 =   0%  |       0 =   0%          |       0 =   0%          |
| do_lookup_x             |     115 =   0%  |       0 =   0%          |       0 =   0%          |
| smooth                  |       0 =   0%  |    7663 = 100%          |    3158 = 100%          |


Here smooth takes the place of the mm1 function in the user guide.

Drilling down into smooth shows where in the subroutine the time is spent, for the regions and kernels accelerated by directives:
Code:
 less smooth.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013

| Line | Source                                                                  | Seconds         | Accelerator Region Time | Accelerator Kernel Time |

|      |  subroutine smooth( a, b, w0, w1, w2, n, m, niters )                    |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   real, dimension(:,:) :: a,b                                           |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   real :: w0, w1, w2                                                    |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   integer :: n, m, niters                                               |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   integer :: i, j, iter                                                 |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |   !$acc data region copy(a(:,:)) copyin(b(:,:))                         |       0 =   0%  |    4501 =  59%          |       0 =   0%          |
|      |    do iter = 1,niters                                                   |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |    !$acc region                                                         |       0 =   0%  |    3161 =  41%          |       0 =   0%          |
|      |     do i = 2,n-1                                                        |       0 =   0%  |       0 =   0%          |       0 =   0%          |
|      |      do j = 2,m-1                                                       |       0 =   0%  |       0 =   0%          |    2077 =  66%          |
|      |       a(i,j) = w0 * b(i,j) + &                                          |       0 =   0%  |       0 =   0%          |       0 =   0%          |



With pgi/13.1, PROBLEM: NO MORE DATA/KERNEL COLUMNS
Code:

pgfortran --version
pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem
pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...

 pgcollect -time -cudainit f3.exe 5000
            0  errors found
       451390  microseconds on GPU
       282197  microseconds on host
target process has terminated, writing profile data

pgprof -exe f3.exe
 


The sample spends 451390 microseconds (about 451 ms) on the GPU, but the PGPROF window now shows:

Code:
less pg131.txt
Profiled: ./f3.exe on Wed Feb 06 11:02:06 CET 2013

| Function                | Seconds         |

| __select_nocancel       |  1,3678 =  48%  |
| main                    |    8046 =  28%  |
| __GI_sched_yield        |    3678 =  13%  |
| smoothhost              |    2989 =  10%  |
| sstk                    |     230 =   1%  |
| __lll_lock_wait_private |     115 =   0%  |


=> The smooth routine accelerated by acc directives is gone; only the host version is shown.
=> No more region/kernel timings.
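
A quick way to confirm this difference between two text exports is to test the header line for the accelerator columns. The sketch below assumes the column titles are exactly those of the 12.10 export shown earlier; has_accel_columns is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a pgprof text export has both accelerator columns.
# Reads the named file(s), or standard input when no file is given.
# Assumption: the column titles match the 12.10 export in this thread.
has_accel_columns() {
    awk '
        /Accelerator Region Time/ && /Accelerator Kernel Time/ { found = 1 }
        END { exit !found }
    ' "$@"
}
```

For example, `has_accel_columns pg.txt` would be expected to succeed for the 12.10 export and `has_accel_columns pg131.txt` to fail for the 13.1 export, given the headers quoted in this post.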

Remark:
Enabling the "pgcollect -cuda" option gives some information on the GPU kernels generated by the compiler,
but the profile obtained this way is completely flat, and the relation to the smooth source code is completely lost!

Code:
less pg131_cuda.txt
Profiled: ./f3.exe on Wed Feb 06 11:11:31 CET 2013

| Function                | Seconds         | CUDA GPU Secs   | CUDA CPU Secs   |

| __select_nocancel       |  1,3596 =  48%  |       0 =   0%  |       0 =   0%  |
| main                    |    7865 =  28%  |       0 =   0%  |       0 =   0%  |
| __GI_sched_yield        |    3596 =  13%  |       0 =   0%  |       0 =   0%  |
| smoothhost              |    3146 =  11%  |       0 =   0%  |       0 =   0%  |
| sstk                    |     225 =   1%  |       0 =   0%  |       0 =   0%  |
| __lll_lock_wait_private |     112 =   0%  |       0 =   0%  |       0 =   0%  |
| smooth_28_gpu           |       0 =   0%  |    2128 =  57%  |       1 =   0%  |
| smooth_35_gpu           |       0 =   0%  |    1051 =  28%  |       0 =   0%  |
| memcpyDtoHasync         |       0 =   0%  |     194 =   5%  |     197 =  34%  |
| memcpyHtoDasync         |       0 =   0%  |     374 =  10%  |     374 =  65%  |


See you later,

Juan
escj




Posted: Fri Feb 08, 2013 1:31 am

Hello Don,
No news?

Could you check/reproduce the missing kernel/region timing problem on your side?

See you later,

Juan
donb




Posted: Fri Feb 08, 2013 12:30 pm

Juan,

Here is the story with accelerator profiling in 13.1 (and 13.2):

The accelerator runtime has been completely reorganized in 13.x. As part of that work the portion of the runtime that generates the accelerator profiler data has been reworked to allow users or other tool developers to add their own data collection facilities.

Unfortunately that work has not been finished and will not appear in a PGI release until 13.3 at the soonest. There is no workaround when using 13.1 or 13.2.

Obviously we had some communication and testing issues on our end or we would have been able to inform you better/sooner about this. I will work on addressing these issues right away.

--Don
All times are GMT - 7 Hours
Page 1 of 3

 