PGI User Forum

Nested parallelism using the ACML

 
CasperK
Joined: 12 Dec 2006
Posts: 32

Posted: Fri Apr 12, 2013 5:29 am    Post subject: Nested parallelism using the ACML

Hi

I am writing code for a quad-socket Opteron 6100 system, where I want to exploit nested parallelism in order to utilize the full bandwidth of the ccNUMA architecture. The idea I want to implement is shown below, but it fails to use nested parallelism in calls to the ACML. If I remove the outer parallel region from the example, the ACML call itself uses multiple CPUs, but as soon as I add the outer parallel region it runs single-threaded. What do I have to do to extract parallelism from both places at the same time?

I am using PVF 13.2 with VS2010, executing on Windows 2008 R2.

Best regards,

Casper

program prog
use omp_lib
implicit none
!N and M are named constants so that A and iPiv can be sized by them in a main program
integer, parameter :: N=1000, M=200
integer :: i,j
complex*16 :: A(N,N)
integer :: iPiv(N)
integer :: info
!The OpenMP Fortran API takes logical arguments here
call omp_set_nested(.true.)
call omp_set_dynamic(.true.)
!$OMP PARALLEL DEFAULT(PRIVATE) NUM_THREADS(8)
!$OMP DO
!We want to parallelize a program that does a boatload (M) of successive calls to ZGETRF
do j=1,M
!Fill a dummy matrix for the example
A(:,:)=0d0
do i=1,N
A(i,i)=1d0
end do
!I want to use 6 threads for each of the ACML calls, i.e. 8x6 threads in total,
!but I cannot get any parallelism out of the following ACML call with the outer parallel region enabled.
call omp_set_num_threads(6)
CALL ZGETRF( N, N, A, N, IPIV, INFO )
end do
!$OMP END DO
!$OMP END PARALLEL

end program prog
CasperK

Posted: Tue Apr 16, 2013 2:26 am

No suggestions? :-(
mkcolg
Joined: 30 Jun 2004
Posts: 4996
Location: The Portland Group Inc.

Posted: Tue Apr 16, 2013 7:44 am

Hi Casper,

We're not sure whether AMD's ACML supports nested parallelism. That said, in addition to setting OMP_NESTED, you may also need to set the environment variable "OMP_MAX_ACTIVE_LEVELS=2". Give that a try.

- Mat
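
For anyone who wants to check whether the OpenMP runtime itself honors nesting before involving the ACML, a minimal sketch follows. It assumes an OpenMP 3.0 runtime that provides omp_set_max_active_levels and omp_get_active_level. The program name nested_check and the team sizes 2 and 3 are illustrative choices, not taken from the thread. Dynamic adjustment is switched off here, unlike in the original example, so that the runtime does not silently shrink the requested teams.

```fortran
program nested_check
  use omp_lib
  implicit none
  integer :: outer_id

  ! Runtime equivalents of OMP_NESTED=TRUE and OMP_MAX_ACTIVE_LEVELS=2
  call omp_set_nested(.true.)
  call omp_set_max_active_levels(2)
  ! Assumption: disable dynamic adjustment so team sizes are not reduced
  call omp_set_dynamic(.false.)

  ! Outer team: 2 threads (illustrative size)
  !$omp parallel num_threads(2) private(outer_id)
  outer_id = omp_get_thread_num()

  ! Inner team: 3 threads requested per outer thread (illustrative size)
  !$omp parallel num_threads(3)
  !$omp single
  ! One report per inner team
  print '(a,i0,a,i0,a,i0)', 'outer thread ', outer_id, &
        ': active level ', omp_get_active_level(), &
        ', inner team size ', omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  !$omp end parallel
end program nested_check
```

If nesting is active, each outer thread should report an inner team size greater than 1. An inner team size of 1 suggests the inner region is being serialized by the runtime. This only tests the compiler's OpenMP runtime; whether the ACML creates nested teams when called from inside a parallel region is a separate question.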
All times are GMT - 7 Hours