PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

Problem interpreting -ta=time output

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
Alistair Hart



Joined: 06 Jul 2010
Posts: 21
Location: Cray Exascale Research Initiative, Edinburgh

PostPosted: Tue Sep 07, 2010 8:52 am    Post subject: Problem interpreting -ta=time output Reply with quote

Hi,

I am having trouble understanding the output of the -ta=time option for pgfortran.

The output is:
Code:

    586: region entered 8 times
        time(us): total=383179 init=12 region=383167
                  kernels=28369 data=153875
        w/o init: total=383167 max=165873 min=4553 avg=47895
        595: kernel launched 8 times
            grid: [1-64x1-64]  block: [16x4x4]
            time(us): total=14036 max=12196 min=12 avg=1754
        601: kernel launched 8 times
            grid: [1-64x1-64]  block: [16x4x4]
            time(us): total=14333 max=12426 min=10 avg=1791


I can see that "region"="total"-"init", but "kernels"+"data" is nothing like "region". Where is the rest of the time going, if not on "kernels" or "data"?

Cheers,

Alistair.
Back to top
View user's profile
hongyon



Joined: 19 Jul 2004
Posts: 551

PostPosted: Tue Sep 07, 2010 4:12 pm    Post subject: Reply with quote

Hi,

I would try some experiments.

1) If the host you are running has multiple CPUs, I would use you preference method to bind to one particular core to prevent the program jumping around.

2) There might be some granularity issue. The region time is measured from the host. The kernel and data times are from the device. I would try to make the test run longer to see if the difference is smaller.

Hongyon
Back to top
View user's profile
Alistair Hart



Joined: 06 Jul 2010
Posts: 21
Location: Cray Exascale Research Initiative, Edinburgh

PostPosted: Wed Sep 08, 2010 4:29 am    Post subject: Reply with quote

Thanks. What I am trying to do is show how fast the code would go if I could cut out the data transfer (i.e. make the data resident on the accelerator between subroutine calls). What would be the best number to quote: summed "kernels" times or total program time minus summed "data" times, or something else?

In response to your suggestions, I used "taskset" on the Linux host to bind the process. I also made sure it was not using the OpenMP directives in the code. There was still a discrepancy.

Regarding granularity, another kernel in the same code had these stats:

Code:

    1227: region entered 4 times
        time(us): total=363636 init=6 region=363630
                  kernels=13323 data=182556


These times are thousands of microseconds per call. This would be very poor granularity.

Cheers,

Alistair.
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

PostPosted: Wed Sep 08, 2010 5:38 pm    Post subject: Reply with quote

Hi Alistair,

I find this odd as well. I've tried my best to recreate the mismatch times here, but as of yet no luck. The profiling information is collected via calls to the CUDA driver so there could be a problem in how this information is collected or it could be something specific to your system. If you're able to share the code, I'd be interested in seeing if I can recreate the problem here.

Another tack would be to not use "-ta=time" and instead use the CUDA profiler. Full documetnation can be found HERE but the simplest method is to set the following environment variables and then run your program.

Code:
    setenv CUDA_PROFILE 1
    setenv CUDA_PROFILE_CSV 1
    setenv CUDA_PROFILE_LOG myProfieFileName.csv.%d


You can change CUDA_PROFILE_LOG to what ever file name you wish. The "%d" refers the device used. If multiple devices are used, then multiple profiles will be created.

Hope this helps,
Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group