escj
Joined: 30 Sep 2009 Posts: 37 Location: Laboratoire d'Aérologie, Toulouse, FRANCE
|
Posted: Tue Feb 05, 2013 9:38 am Post subject: pgi/13.1, pgsampt Crash writing to "/home/donb/tmp/prof |
|
|
Hello.
With pgi/13.1, pgcollect/pgsampt crashes while trying to write to the file "/home/donb/tmp/prof.log".
Here is a demo using the samples shipped with the PGI package.
| Code: | pgfortran --version
pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2013, STMicroelectronics, Inc. All Rights Reserved.
pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel
make f3.exe
pgfortran -o f3.exe f3.f90 -ta=nvidia -Minfo=accel -fast
smooth:
24, Generating copyin(b(:,:))
Generating copy(a(:,:))
26, Generating present_or_copy(a(:,:))
...
pgsampt f3.exe
0 errors found
66818 microseconds on GPU
72 microseconds on host
target process has terminated, writing profile data
Erreur de segmentation
|
strace shows an attempt to open the unexpected log file "/home/donb/tmp/prof.log":
| Code: | strace pgsampt f3.exe
...
open("/home/donb/tmp/prof.log", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
open("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq", O_RDONLY) = 16
fstat(16, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b24c5fde000
read(16, "2661000\n", 4096) = 8
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Erreur de segmentation |
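To narrow a long strace log down to the failing call, one could filter it for opens that returned ENOENT. This is only a minimal sketch: it assumes the trace was first saved to a file (for instance with strace's `-o` option) and that lines carry no PID prefix. The function name `failed_opens` and the file name `trace.log` are made up for illustration.

```shell
# Hypothetical helper (not part of PGI or strace): print the path of every
# open()/openat() call in a saved strace log that failed with ENOENT.
# Assumes each line starts with the syscall name (no PID prefix).
failed_opens() {
    grep -E '^(open|openat)\(' "$1" \
        | grep -F 'ENOENT' \
        | sed -E 's/^[^"]*"([^"]*)".*$/\1/'
}

# Example use, assuming the trace was saved beforehand:
#   strace -o trace.log pgsampt f3.exe
#   failed_opens trace.log
```

Run against the trace shown above, this should leave only the "/home/donb/tmp/prof.log" path, since the cpufreq open succeeds.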
As a test, I created this directory on my PC, and after that everything runs OK:
| Code: |
ls -l /home/donb/tmp/
total 0
pgsampt f3.exe
0 errors found
69881 microseconds on GPU
71 microseconds on host
target process has terminated, writing profile data
ls -lrt pgprof.out
-rw-r--r-- 1 escj users 1194 févr. 5 17:30 pgprof.out
|
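Until a fixed release is installed, the workaround above amounts to making sure the hard-coded directory exists before profiling. A minimal sketch, assuming you have permission to create the directory (under /home this usually needs root); `run_with_dir` is a hypothetical helper name, not a PGI command.

```shell
# Hypothetical helper: make sure a directory exists, then run a command.
# Usage: run_with_dir DIR COMMAND [ARGS...]
run_with_dir() {
    dir=$1
    shift
    # Create the directory (and any missing parents); give up if that fails.
    mkdir -p "$dir" || return 1
    "$@"
}

# Example, mirroring the test above (likely needs root to create /home/donb):
#   run_with_dir /home/donb/tmp pgsampt f3.exe
```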
Next bug: pgcollect no longer shows the acc kernel timings?
See you,
Juan |
donb
Joined: 20 Jul 2004 Posts: 82 Location: The Portland Group, Inc.
|
Posted: Tue Feb 05, 2013 2:05 pm Post subject: |
|
|
Well that's embarrassing. My apologies, and thank you for reporting this. The crash will be fixed in release 13.2, expected out this week.
As far as not showing accelerator timings any more, we don't see that in our testing. Note that if the performance data values in the Accelerator Performance tab are zero, then no data is displayed. If you are seeing another problem, please let us know. |
escj
|
Posted: Wed Feb 06, 2013 3:24 am Post subject: |
|
|
Regarding the missing accelerator timings:
Again with the same sample, "f3.exe",
following "pgprof13ug.pdf", pages 22-25.
With pgi/12.10, everything works:
| Code: | pgfortran --version
pgfortran 12.10-0 64-bit target on x86-64 Linux -tp nehalem |
=> compiling and running the sample f3.exe with the options "ccff", "-g", etc.
| Code: | pwd
/home/escj/dir_PGF/PGI_HOME/linux86-64/13.1/etc/samples/accel
pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...
pgcollect -time -cudainit f3.exe 5000
0 errors found
450158 microseconds on GPU
305996 microseconds on host
target process has terminated, writing profile data
pgprof -exe f3.exe
|
The PGPROF window shows a view very similar to Fig. 2.12, p. 24 of pgprof13ug.pdf
=> 4 columns
| Code: | less pg.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013
| Function | Seconds | Accelerator Region Time | Accelerator Kernel Time |
| __select_nocancel | 1,3908 = 46% | 0 = 0% | 0 = 0% |
| main | 8046 = 27% | 0 = 0% | 0 = 0% |
| smoothhost | 3448 = 11% | 0 = 0% | 0 = 0% |
| __GI_sched_yield | 3448 = 11% | 0 = 0% | 0 = 0% |
| sstk | 460 = 2% | 0 = 0% | 0 = 0% |
| __c_mcopy4 | 460 = 2% | 0 = 0% | 0 = 0% |
| __lll_lock_wait_private | 115 = 0% | 0 = 0% | 0 = 0% |
| do_lookup_x | 115 = 0% | 0 = 0% | 0 = 0% |
| smooth | 0 = 0% | 7663 = 100% | 3158 = 100% | |
Here smooth takes the place of the mm1 function in the user guide.
Drilling down into smooth shows where in the subroutine the time is spent, in the regions and kernels accelerated by directives:
| Code: | less smooth.txt
Profiled: ./f3.exe on Wed Feb 06 10:39:31 CET 2013
| Line | Source | Seconds | Accelerator Region Time | Accelerator Kernel Time |
| | subroutine smooth( a, b, w0, w1, w2, n, m, niters ) | 0 = 0% | 0 = 0% | 0 = 0% |
| | real, dimension(:,:) :: a,b | 0 = 0% | 0 = 0% | 0 = 0% |
| | real :: w0, w1, w2 | 0 = 0% | 0 = 0% | 0 = 0% |
| | integer :: n, m, niters | 0 = 0% | 0 = 0% | 0 = 0% |
| | integer :: i, j, iter | 0 = 0% | 0 = 0% | 0 = 0% |
| | !$acc data region copy(a(:,:)) copyin(b(:,:)) | 0 = 0% | 4501 = 59% | 0 = 0% |
| | do iter = 1,niters | 0 = 0% | 0 = 0% | 0 = 0% |
| | !$acc region | 0 = 0% | 3161 = 41% | 0 = 0% |
| | do i = 2,n-1 | 0 = 0% | 0 = 0% | 0 = 0% |
| | do j = 2,m-1 | 0 = 0% | 0 = 0% | 2077 = 66% |
| | a(i,j) = w0 * b(i,j) + & | 0 = 0% | 0 = 0% | 0 = 0% | |
With pgi/13.1, PROBLEM: NO MORE DATA/KERNEL COLUMNS
| Code: |
pgfortran --version
pgfortran 13.1-1 64-bit target on x86-64 Linux -tp nehalem
pgfortran -g -o f3.exe f3.f90 -ta=nvidia -Minfo=accel,ccf,all -fast
...
pgcollect -time -cudainit f3.exe 5000
0 errors found
451390 microseconds on GPU
282197 microseconds on host
target process has terminated, writing profile data
pgprof -exe f3.exe
|
The sample spends 451390 microseconds (about 451 ms) on the GPU, but the PGPROF window now shows:
| Code: | less pg131.txt
Profiled: ./f3.exe on Wed Feb 06 11:02:06 CET 2013
| Function | Seconds |
| __select_nocancel | 1,3678 = 48% |
| main | 8046 = 28% |
| __GI_sched_yield | 3678 = 13% |
| smoothhost | 2989 = 10% |
| sstk | 230 = 1% |
| __lll_lock_wait_private | 115 = 0% | |
=> The smooth routine accelerated by acc directives no longer appears; only the host one (smoothhost) is shown.
=> No more region/kernel timings.
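A quick way to tell the two cases apart from the saved text views is to look for the accelerator column headers. A minimal sketch, assuming the views were saved as text files like pg.txt and pg131.txt above; `has_accel_columns` is a hypothetical helper, not a PGI tool.

```shell
# Hypothetical check: succeed if a saved pgprof text view contains both
# accelerator timing columns, judged by the header names shown in this thread.
has_accel_columns() {
    grep -q 'Accelerator Region Time' "$1" \
        && grep -q 'Accelerator Kernel Time' "$1"
}

# Example:
#   has_accel_columns pg.txt    && echo "accelerator timings present"
#   has_accel_columns pg131.txt || echo "accelerator timings missing"
```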
Remark:
Enabling the "pgcollect -cuda" option gives some information on the GPU kernels generated by the compiler,
but the profile obtained this way is completely flattened, and the link to the smooth source code is completely lost!
| Code: | less pg131_cuda.txt
Profiled: ./f3.exe on Wed Feb 06 11:11:31 CET 2013
| Function | Seconds | CUDA GPU Secs | CUDA CPU Secs |
| __select_nocancel | 1,3596 = 48% | 0 = 0% | 0 = 0% |
| main | 7865 = 28% | 0 = 0% | 0 = 0% |
| __GI_sched_yield | 3596 = 13% | 0 = 0% | 0 = 0% |
| smoothhost | 3146 = 11% | 0 = 0% | 0 = 0% |
| sstk | 225 = 1% | 0 = 0% | 0 = 0% |
| __lll_lock_wait_private | 112 = 0% | 0 = 0% | 0 = 0% |
| smooth_28_gpu | 0 = 0% | 2128 = 57% | 1 = 0% |
| smooth_35_gpu | 0 = 0% | 1051 = 28% | 0 = 0% |
| memcpyDtoHasync | 0 = 0% | 194 = 5% | 197 = 34% |
| memcpyHtoDasync | 0 = 0% | 374 = 10% | 374 = 65% | |
See you,
Juan |
escj
|
Posted: Fri Feb 08, 2013 1:31 am Post subject: |
|
|
Hello Don,
No news so far.
Could you check/reproduce the missing kernel/region timing problem on your side?
See you,
Juan |
donb
|
Posted: Fri Feb 08, 2013 12:30 pm Post subject: |
|
|
Juan,
Here is the story with accelerator profiling in 13.1 (and 13.2):
The accelerator runtime has been completely reorganized in 13.x. As part of that work the portion of the runtime that generates the accelerator profiler data has been reworked to allow users or other tool developers to add their own data collection facilities.
Unfortunately that work has not been finished and will not appear in a PGI release until 13.3 at the soonest. There is no workaround when using 13.1 or 13.2.
Obviously we had some communication and testing issues on our end or we would have been able to inform you better/sooner about this. I will work on addressing these issues right away.
--Don |