PGI User Forum

Fail to launch OpenMP in PVF

 
catfishwolf



Joined: 31 Mar 2013
Posts: 8

Posted: Sun Jul 28, 2013 3:18 pm    Post subject: Fail to launch OpenMP in PVF

Hi everyone,

I used the source code below to compare OpenMP performance under two compilers, PGI Visual Fortran (PVF) and Intel Visual Fortran (IVF), on the same desktop (AMD FX-4130 quad-core CPU).

The outcome is disturbing: my PVF settings seem to fail to launch all four threads. Can anyone help me get my PVF OpenMP settings right, and explain why PVF takes far more time than IVF to complete both the sequential and the parallel calculation (even after allowing for the OpenMP failure)?

Below I list the source code I used for this experiment, followed by the results obtained with PVF and then with IVF. At the end I list the corresponding command line flags shown in the Visual Studio property pages for IVF and PVF respectively.

Thanks,
Li

Edit: By the end of the day I had figured out how to launch all 4 threads with the PVF compiler. (The PVF results below are updated.) However, the timings are still far from those produced by the IVF compiler. I am wondering whether this is an inherent flaw of the PVF compiler.

Code:
program main
  use omp_lib
  implicit none
  integer :: stime,etime,k=10000000
  integer :: i,thread_id, tid, nthreads
  integer, allocatable :: x(:),y(:),z(:)
  allocate(x(k),y(k),z(k))
 
  write(*,*) '---- Sequential section ----'
  call system_clock(stime)
  do i=1,k,1
    x(i)=2*i;
    y(i)=i;
    z(i)=x(i)+y(i);
  enddo
  call system_clock(etime)
  write(*,*) 'Sequential elapsed time: ', etime-stime, 'microseconds'

 
  write(*,*) '---- OpenMP section ----'
  !$omp parallel private(thread_id)
  thread_id = omp_get_thread_num()
  write(*,*) 'Thread ', thread_id, ': Hello.'
  !$OMP BARRIER
  write(*,*) 'Thread ', thread_id, ': Bye bye.'
  !$omp end parallel
 
  !$omp parallel private(tid)
  tid = omp_get_thread_num()
  nthreads = omp_get_num_threads()
  write(*,*) 'Threads = ', nthreads 
  !$OMP BARRIER
  call system_clock(stime)
  !$omp do 
    do i =1,k,1
      x(i)=i
      y(i)=2*i
      z(i)=x(i)+y(i)
    enddo
  !$omp end do
  !$omp end parallel
  call system_clock(etime)
  write(*,*) 'OpenMP elapsed time:', etime-stime, 'microseconds'
 
end program main


The results from PVF:
Code:

 ---- Sequential section ----
 Sequential elapsed time:         62000 microseconds
 ---- OpenMP section ----
 Thread             0 : Hello.
 Thread             2 : Hello.
 Thread             3 : Hello.
 Thread             1 : Hello.
 Thread             1 : Bye bye.
 Thread             0 : Bye bye.
 Thread             2 : Bye bye.
 Thread             3 : Bye bye.
 Threads =             4
 Threads =             4
 Threads =             4
 Threads =             4
 OpenMP elapsed time:        16000 microseconds

The results from IVF:
Code:
 ---- Sequential section ----
 Sequential elapsed time:          580 microseconds
 ---- OpenMP section ----
 Thread            2 : Hello.
 Thread            1 : Hello.
 Thread            0 : Hello.
 Thread            3 : Hello.
 Thread            0 : Bye bye.
 Thread            2 : Bye bye.
 Thread            3 : Bye bye.
 Thread            1 : Bye bye.
 Threads =            4
 Threads =            4
 Threads =            4
 Threads =            4
 OpenMP elapsed time:          60 microseconds


The flags I used for compiling:
Code:
IVF: /nologo /O3 /Qopenmp /module:"Release\\" /object:"Release\\" /Fd"Release\vc100.pdb" /libs:static /threads /c

Code:
PVF: -g -Bstatic -Mbackslash -mp -fastsse -Mipa=fast,inline -O3 -Mvect=simd:256 -Minline -Mframe -Munroll=n:4 -Mconcur -Knoieee -Minform=warn -Minfo=mp 
TheMatt



Joined: 06 Jul 2009
Posts: 317
Location: Greenbelt, MD

Posted: Mon Jul 29, 2013 6:05 am

I think the main issue might be that you are assuming system_clock returns microseconds. The standard doesn't specify what it has to be.

For example, on my Linux box, with ifort, if you run:
Code:
call system_clock(count_rate=clock_rate)
it returns 1000000. On PGI, it returns 10000000. My guess is on your box, Intel's clock rate is 100x lower than PGI (assuming the sequential time is roughly the same).

Thus, to be consistent, you have to know the rate and divide by it:
Code:
integer :: clock_start,clock_end,clock_rate
real :: elapsed_time
...
call system_clock(count_rate=clock_rate)
call system_clock(count=clock_start)
...
call system_clock(count=clock_end)
! convert to real before dividing; integer division would truncate the result
elapsed_time = real(clock_end-clock_start)/real(clock_rate)

I'm pretty sure this sequence puts elapsed_time in seconds, so if you want milliseconds, say, you'll need to multiply by 1000.
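To make that recipe concrete, here is a minimal, self-contained sketch of the same idea. The program name, the loop being timed, and the use of 64-bit integers for the counts are my own assumptions for illustration, not something from the original test program. It queries count_rate at run time instead of assuming microseconds, and converts to real before dividing:

```fortran
program timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: k = 10000000
  integer(int64) :: clock_start, clock_end, clock_rate
  real(real64)   :: elapsed_seconds
  integer        :: i
  integer, allocatable :: z(:)

  allocate(z(k))

  ! Ask the runtime how many counts make up one second.
  ! This value is processor dependent and differs between compilers.
  call system_clock(count_rate=clock_rate)
  if (clock_rate <= 0) stop 'no system clock available'

  call system_clock(count=clock_start)
  do i = 1, k
    z(i) = 2*i + i
  end do
  call system_clock(count=clock_end)

  ! Convert both operands to real so the fraction is not truncated.
  elapsed_seconds = real(clock_end - clock_start, real64) / real(clock_rate, real64)

  write(*,*) 'clock_rate (counts per second): ', clock_rate
  write(*,*) 'Elapsed time (s):  ', elapsed_seconds
  write(*,*) 'Elapsed time (ms): ', elapsed_seconds * 1000.0_real64
  ! Use a result so the compiler is less likely to discard the loop.
  write(*,*) 'Last element: ', z(k)
end program timing_sketch
```

Printing clock_rate next to the timing makes it easy to see whether two compilers are reporting in the same units before comparing their numbers.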

Of course, if this doesn't make things look consistent...then there's a problem!

Hope this helps,
Matt
catfishwolf



Joined: 31 Mar 2013
Posts: 8

Posted: Mon Jul 29, 2013 7:34 am

TheMatt wrote:
I think the main issue might be that you are assuming system_clock returns microseconds. The standard doesn't specify what it has to be.


Thank you for the answer, Matt. The new comparison I made today is shown below, with the clock rate printed first. It is just as you said: the clock rate reported by IVF is 100x lower than that reported by PVF. Based on the new results in Release mode, PVF is comparable with IVF, at least in the sequential part.

Li

PVF:
Code:
 clock_rate      1000000
 ---- Sequential section ----
 Sequential elapsed time:         68000
 ---- OpenMP section ----
 Thread             0 : Hello.
 Thread             3 : Hello.
 Thread             2 : Hello.
 Thread             1 : Hello.
 Thread             1 : Bye bye.
 Thread             2 : Bye bye.
 Thread             0 : Bye bye.
 Thread             3 : Bye bye.
 Threads =             4
 Threads =             4
 Threads =             4
 Threads =             4
 OpenMP elapsed time:        16000


IVF:
Code:

 clock_rate       10000
 ---- Sequential section ----
 Sequential elapsed time:          660
 ---- OpenMP section ----
 Thread            0 : Hello.
 Thread            1 : Hello.
 Thread            2 : Hello.
 Thread            3 : Hello.
 Thread            1 : Bye bye.
 Thread            0 : Bye bye.
 Thread            2 : Bye bye.
 Thread            3 : Bye bye.
 Threads =            4
 Threads =            4
 Threads =            4
 Threads =            4
 OpenMP elapsed time:          70
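As a quick check of the numbers above, dividing each raw count by its clock rate gives about 0.068 s (sequential) and 0.016 s (OpenMP) for PVF, and about 0.066 s and 0.007 s for IVF. A small sketch of that conversion follows; the program and function names are made up for illustration, and the constants are simply copied from the two listings:

```fortran
program convert_counts
  implicit none
  ! Clock rates and raw counts copied from the PVF and IVF runs above.
  integer, parameter :: pvf_rate = 1000000, ivf_rate = 10000
  integer, parameter :: pvf_seq = 68000, pvf_omp = 16000
  integer, parameter :: ivf_seq = 660,   ivf_omp = 70

  write(*,'(a,f8.4,a)') 'PVF sequential: ', seconds(pvf_seq, pvf_rate), ' s'
  write(*,'(a,f8.4,a)') 'PVF OpenMP:     ', seconds(pvf_omp, pvf_rate), ' s'
  write(*,'(a,f8.4,a)') 'IVF sequential: ', seconds(ivf_seq, ivf_rate), ' s'
  write(*,'(a,f8.4,a)') 'IVF OpenMP:     ', seconds(ivf_omp, ivf_rate), ' s'

contains

  ! Convert a raw system_clock count difference to seconds.
  pure function seconds(counts, rate) result(t)
    integer, intent(in) :: counts, rate
    real :: t
    t = real(counts) / real(rate)
  end function seconds

end program convert_counts
```

These are single runs with coarse counts, so the converted values are only a rough comparison.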
All times are GMT - 7 Hours