PGI User Forum


nested openmp support in pgf90

 
TroelsH



Joined: 24 Mar 2010
Posts: 9

Posted: Mon Dec 17, 2012 7:31 pm    Post subject: nested openmp support in pgf90

I have a large code that I am parallelizing with MPI + OpenMP + CUDA-Fortran. I am currently using v12.10 of pgf90, though other versions starting from 10.x have been used.

Different parts of the code use nested OpenMP, and while it works fine with other compilers (ifort, gfortran, xlf), pgf90 does not seem to run any of the nested regions in parallel.

For example, here is a snippet illustrating the use of nested loop parallelization:

Code:

REAL FUNCTION wallclock()
  integer, save:: count(2), count_rate=0
  real, save:: norm, offset=0.
  if (count_rate == 0) then
    call system_clock(count=count(1), count_rate=count_rate)
    norm=1./real(count_rate)
  end if
  call system_clock(count=count(2))
  wallclock = (count(2)-count(1))*norm
  if (wallclock < 0.) then
    offset = offset + 24.*3600.
    wallclock = wallclock + 24.*3600.
  end if
END FUNCTION wallclock

Program Test_Nested_OpenMP
  implicit none
  integer, parameter     :: n=80000000
  integer                :: i, j
  integer, dimension(:,:), allocatable :: a, b
  real                   :: t0,t1,t2
  real, external         :: wallclock

  allocate(a(n,2), b(n,2))
  a=0; b=0
  t0 = wallclock()
  !$omp parallel do collapse(2)
  do j=1,2
  do i=1,n
    a(i,j)=sin(real(i+j))
  enddo
  enddo

  t1 = wallclock() 
  print *, 'Number of elements                  :', n
  print *, 'Time to initialize array            :', t1-t0 
  print *, '----------------------------------------------------'

  !$omp parallel do num_threads(2) shared(a,b) private(i,j)
  do j=1,2

    !$omp parallel shared(a,b,j) private(i)
    !$omp do
    do i=1,n
      b(i,j) = sin(real(i+j))
    enddo
    !$omp enddo nowait
    !$omp end parallel

  enddo
  !$omp end parallel do

  t2 = wallclock() 
  print *, 'Time to do nested region            :', t2-t1 
END


Compiling and executing with:

Quote:

$ pgf90 -O2 -mp -Minfo test_nested_openmp.f90
test_nested_openmp:
26, Memory zero idiom, array assignment replaced by call to pgf90_mzero4
28, Parallel region activated
30, Parallel loop activated with static block schedule
33, Parallel region terminated
40, Parallel region activated
41, Parallel loop activated with static block schedule
43, Parallel region activated
47, Parallel region terminated
51, Parallel region terminated
$ env OMP_NUM_THREADS=4 OMP_MAX_ACTIVE_LEVELS=2 OMP_NESTED=true OMP_DYNAMC=true OMP_THREAD_LIMIT=4 taskset -c 0-3 ./a.out


I get

Quote:

Number of elements : 80000000
Time to initialize array : 2.112338
----------------------------------------------------
Time to do nested region : 3.674445


With other compilers (xlf, ifort, gfortran) the two times are equal. I have tried almost every variation of the OMP environment variables, to no avail.

Is nested OpenMP unsupported, or only partially supported, by the PGI compilers?
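
Timing alone does not show directly whether the inner region received its own team of threads. A minimal diagnostic sketch follows, assuming the compiler provides the OpenMP 3.0 runtime routines through the `omp_lib` module. The program name `check_nesting` and the variable `outer_id` are made up for illustration, and this has not been run against pgf90 here.

```fortran
! Diagnostic sketch (not from the original post): report what the
! OpenMP runtime says inside a lexically nested parallel region.
program check_nesting
  use omp_lib
  implicit none
  integer :: outer_id

  print *, 'max active levels :', omp_get_max_active_levels()

  !$omp parallel num_threads(2) private(outer_id)
  outer_id = omp_get_thread_num()

  ! Lexically nested inner region, as in the test program above.
  !$omp parallel num_threads(2)
  !$omp critical
  print *, 'outer thread', outer_id, &
           ' level', omp_get_level(), &
           ' active level', omp_get_active_level(), &
           ' inner team size', omp_get_num_threads()
  !$omp end critical
  !$omp end parallel

  !$omp end parallel
end program check_nesting
```

If nesting is active, each outer thread should report an inner team size greater than 1 and an active level of 2. An inner team size of 1 would indicate that the inner region was serialized.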

best,

Troels
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Fri Dec 21, 2012 2:57 pm

Hi Troels,

Nested parallelism is supported when the parallel regions are not lexically nested, i.e. when the second (inner) parallel region sits inside a procedure that is called from the first.

- Mat
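
The restructuring described in the reply above can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the module `nested_work` and the subroutine `fill_column` are hypothetical names, the array is filled with integers so the result can be checked, and the sketch has not been verified with pgf90. Nesting still has to be enabled at run time, for example with the `OMP_NESTED` and `OMP_MAX_ACTIVE_LEVELS` settings used in the original post.

```fortran
! Sketch only: fill_column is a hypothetical helper, not part of the
! original post. The inner parallel region is moved into a procedure
! so that it is no longer lexically nested in the outer region.
module nested_work
  implicit none
contains
  subroutine fill_column(b, j)
    integer, intent(inout) :: b(:,:)
    integer, intent(in)    :: j
    integer :: i
    ! Inner parallel loop over the rows of column j.
    !$omp parallel do shared(b) private(i)
    do i = 1, size(b, 1)
      b(i,j) = i + j
    end do
    !$omp end parallel do
  end subroutine fill_column
end module nested_work

program test_nested_call
  use nested_work
  implicit none
  integer, parameter   :: n = 1000
  integer, allocatable :: b(:,:)
  integer :: j

  allocate(b(n,2))
  b = 0

  ! Outer parallel loop: one thread per column.
  !$omp parallel do num_threads(2) shared(b) private(j)
  do j = 1, 2
    call fill_column(b, j)
  end do
  !$omp end parallel do

  ! Simple correctness check on two corner elements.
  if (b(1,1) == 2 .and. b(n,2) == n + 2) then
    print *, 'nested fill: ok'
  else
    print *, 'nested fill: FAILED'
  end if
end program test_nested_call
```

Putting the helper in a module gives the caller an explicit interface, which the assumed-shape argument `b(:,:)` requires.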