PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

pgprof/pgcollect : problem with CPU+openacc on same routine

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Debugging and Profiling
View previous topic :: View next topic  
Author Message
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

PostPosted: Mon Oct 13, 2014 3:05 am    Post subject: pgprof/pgcollect : problem with CPU+openacc on same routine Reply with quote

Hello .
New feature problem when profiling with pgprof/pgcollect tested on pgi/13.10 ( because pgi14/X doesn't work at all )

When profiling a routine containing CPU + OpenAcc computation , with ppgprof this is reported as 2 separate routines ?

-> 1 having only CPU time
-> the other one only with GPU ( kernel/region ) time

A simple example showing this problem :

Code:

MODULE mode_sub
CONTAINS
  SUBROUTINE sub(X,Y,N)
    IMPLICIT NONE
    INTEGER :: N
    REAL    :: X(N) , Y(N)

    !$acc kernels present(X)
    X = X * 0.9995
    !$acc end kernels

    ! CPU part
    Y = Y * 1.0001

  END SUBROUTINE sub 
END MODULE mode_sub

PROGRAM test_pgcollect

  USE mode_sub
  IMPLICIT NONE
  INTEGER, PARAMETER :: N=134217728, NIT=100
   
  REAL, ALLOCATABLE  :: A(:) , B(:)
  !$acc declare create(A)

  INTEGER            :: I
   
  ALLOCATE(A(N),B(N))
  !$acc kernels
  A = 1.2345
  !$acc end kernels
  B = 2.3456         

  DO I=1,NIT
     CALL SUB(A,B,N)
  END DO

  !$acc update host (A(N:N))
  print*,"A(N)=",A(N)," B(N)=",B(N)

END PROGRAM test_pgcollect


Quote:

> pgf90 --version
pgf90 13.10-0 64-bit target on x86-64 Linux -tp nehalem
...

> pgf90 -ta=nvidia:cuda5.5 openacc_pgcollect_mix.f90 -o openacc_pgcollect_mix_1310
...

> pgcollect ./openacc_pgcollect_mix_1310
Profiling single-threaded target program
A(N)= 1.174275 B(N)= 2.369176
target process has terminated, writing profile data
...

> pgprof # -> write the main window to file



> cat prof_mix
Code:

Profiled: ./openacc_pgcollect_mix_1310 on Mon Oct 13 11:51:53 CEST 2014

| Function       | Seconds         | Region Time     | Kernel Device Time |

| sub            |  17,495 =  96%  |       0 =   0%  |       0 =   0%     |
| [System_Time]  |     577 =   3%  |       0 =   0%  |       0 =   0%     |
| __c_mset4      |     124 =   1%  |       0 =   0%  |       0 =   0%     |
| test_pgcollect |       0 =   0%  |      28 =   1%  |      28 =   1%     |
| sub            |       0 =   0%  |    4853 =  99%  |    4853 =  99%     |



The detail window show the same separate view , CPU only in one view and only GPU on the other one .

This problem is very painful for the analysis of porting a very big code ( Million of lines ... )

:-) Thanks if it could be fixed for the next 14/15.X release ...

Bye Juan
Back to top
View user's profile
jtull



Joined: 30 Jun 2004
Posts: 438

PostPosted: Tue Nov 18, 2014 5:50 pm    Post subject: Reply with quote

I ran your example with 13.10, 14.10, and the upcoming 15.0 (you will
see 15.1 in January).

I think 15.0 is working better than 14.10. We filed TPR 20904 for this problem and it should be in the 15.1 release.

13.10
Profiled: ./test_pgcollect on Tue Nov 18 16:33:44 PST 2014

| Function | Seconds | Region Time | Kernel Device Time |

| ppoll | 19.488 = 50% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 18.050 = 46% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.825 = 2% | 0.0000 = 0% | 0.0000 = 0% |
| __GI_sched_yield | 0.200 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.175 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.150 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __write | 0.050 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.050 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.025 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.012 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.8033 = 100% | 0.8033 = 100% |
| test_pgcollect | 0.000 = 0% | 0.0038 = 0% | 0.0038 = 0% |


================================================

14.10

=============================================
Profiled: ./test_pgcollect on Tue Nov 18 16:31:07 PST 2014

| Function | Seconds | Region Time | Kernel Device Time |

| __poll | 17.966 = 50% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 16.690 = 46% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.690 = 2% | 0.0000 = 0% | 0.0000 = 0% |
| sched_getscheduler | 0.253 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.115 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.115 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.046 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sem_timedwait | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open64 | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| fchmodat | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.011 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.0002 = 56% | 0.0002 = 56% |
| test_pgcollect | 0.000 = 0% | 0.0002 = 44% | 0.0002 = 44% |


==============================================
15.0

==============================================

Profiled: ./test_pgcollect on Tue Nov 18 13:51:21 PST 2014

| Function | Seconds | Region Time | Kernel Device Time |

| __poll | 19.101 = 33% | 0.0000 = 0% | 0.0000 = 0% |
| sem_timedwait | 18.868 = 33% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 17.535 = 31% | 0.0000 = 0% | 0.0000 = 0% |
| [System_Time] | 0.651 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| sched_getscheduler | 0.488 = 1% | 0.0000 = 0% | 0.0000 = 0% |
| __c_mset4 | 0.116 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sstk | 0.101 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _dl_debug_initialize | 0.047 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| _int_malloc | 0.047 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| read | 0.031 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open64 | 0.031 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __open_nocancel | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| __lll_lock_wait_private | 0.023 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| memset | 0.016 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sighandler_setxid | 0.008 = 0% | 0.0000 = 0% | 0.0000 = 0% |
| sub | 0.000 = 0% | 0.8011 = 100% | 0.8011 = 100% |
| test_pgcollect | 0.000 = 0% | 0.0036 = 0% | 0.0036 = 0% |
Back to top
View user's profile
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

PostPosted: Mon Nov 24, 2014 6:39 am    Post subject: Reply with quote

:-) Thank you Dave

Bye
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Debugging and Profiling All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group