PGI User Forum


PGI compiler with OMP option

 
fishwater00



Joined: 14 Jun 2010
Posts: 3

Posted: Mon Jun 14, 2010 10:23 am    Post subject: PGI compiler with OMP option

Hi All,

I am trying to accelerate my computation. Here is what I have: a workstation with an 8-core CPU, 16 GB of memory, and a GPU card (Quadro FX 4800).

Here is my question. If I have a loop such as:


Code:

do iy = 1, NY
   do ix = 1, NX
      do iz = 1, NZ
         pxx = p(iz, ix-1, iy) + p(iz, ix+1, iy)
         pyy = p(iz, ix, iy-1) + p(iz, ix, iy+1)
         pzz = p(iz-1, ix, iy) + p(iz+1, ix, iy)
         der = pxx + pyy + pzz
      enddo
   enddo
enddo



how can I use !$acc region / !$acc end region

and !$OMP PARALLEL DO PRIVATE(iy, ix, iz, pxx, pyy, pzz) SCHEDULE(dynamic) &
!$OMP FIRSTPRIVATE(NX, NY, NZ)
/ !$OMP END PARALLEL DO

together? I am trying to combine these two techniques to get a better computation time.

Is it possible to use the PGI compiler to do that?

Thanks.
fishwater00



Joined: 14 Jun 2010
Posts: 3

Posted: Mon Jun 14, 2010 11:51 am    Post subject:

Is my question clear?

I just want to know how to use the PGI compiler to combine the GPU and OpenMP, so that no resources are wasted.

Thanks.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Mon Jun 14, 2010 3:58 pm    Post subject:

Hi fishwater00,

Quote:
Is my question clear?

Not quite, so if my answer is unclear please let me know.


Assuming that you're planning on saving the results of "der" (for example into another 3-D array, not 'p'), you should be able to just put the accelerator directives before and after the "iy" loop and the compiler will accelerate it. For performance on a GPU, you really want a lot of threads, tens of thousands of them. So for this loop, I would just use the accelerator model.
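
For the loop from the question, that could look roughly like the sketch below. The module and subroutine names, the separate output array `der`, and the one-cell halo on `p` (bounds 0:n+1, so that `ix-1` and `ix+1` stay in range) are assumptions made for illustration and are not from the original post. A compiler without PGI Accelerator support should simply treat the `!$acc` lines as comments.

```fortran
module stencil_sketch
  implicit none
contains
  ! Sum of the six nearest neighbours of every interior point of p.
  ! The result goes into a separate array der, so each iteration is
  ! independent of the others.
  ! Assumption: p carries a one-cell halo (0:n+1) in each dimension.
  subroutine neighbour_sum(p, der, nz, nx, ny)
    integer, intent(in) :: nz, nx, ny
    real, intent(in)    :: p(0:nz+1, 0:nx+1, 0:ny+1)
    real, intent(out)   :: der(nz, nx, ny)
    integer :: ix, iy, iz
    real :: pxx, pyy, pzz

    !$acc region
    do iy = 1, ny
      do ix = 1, nx
        do iz = 1, nz
          pxx = p(iz, ix-1, iy) + p(iz, ix+1, iy)
          pyy = p(iz, ix, iy-1) + p(iz, ix, iy+1)
          pzz = p(iz-1, ix, iy) + p(iz+1, ix, iy)
          der(iz, ix, iy) = pxx + pyy + pzz
        enddo
      enddo
    enddo
    !$acc end region
  end subroutine neighbour_sum
end module stencil_sketch
```

Writing into `der(iz, ix, iy)` rather than a scalar is what gives the compiler something to keep from each iteration.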

OpenMP can be combined with the PGI Accelerator model. However, at this time, it's not as easy as adding both directives. Instead, you need to first assign each thread to a GPU before entering an accelerator region and manually distribute the work to each thread.

The basic outline would be something like:
Code:

!$omp parallel private(ilo,ihi,i) num_threads(2)
  ! one OpenMP thread per GPU (two GPUs assumed)
  call acc_set_device(omp_get_thread_num())
  ! split 1..N into two contiguous, non-overlapping halves
  ilo = omp_get_thread_num()*(N+1)/2 + 1
  ihi = min(N,ilo+(N+1)/2-1)
  !$acc region do
  do i = ilo,ihi
    a(i) = b(i) + c(i)
  enddo
!$omp end parallel
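
The manual work distribution can be written for any thread count, not just two. The following is only a sketch: the module name `chunk_sketch` and the helper `chunk_bounds` are hypothetical and not part of OpenMP or the PGI runtime.

```fortran
module chunk_sketch
  implicit none
contains
  ! Contiguous, non-overlapping bounds ilo..ihi for thread tid (0-based)
  ! when iterations 1..n are shared among nthreads threads.
  subroutine chunk_bounds(n, nthreads, tid, ilo, ihi)
    integer, intent(in)  :: n, nthreads, tid
    integer, intent(out) :: ilo, ihi
    integer :: chunk

    chunk = (n + nthreads - 1) / nthreads  ! ceiling of n/nthreads
    ilo = tid*chunk + 1
    ! Trailing threads may get fewer iterations; ihi < ilo means none,
    ! and a do loop over ilo..ihi then runs zero times.
    ihi = min(n, ilo + chunk - 1)
  end subroutine chunk_bounds
end module chunk_sketch
```

Inside the parallel region, each thread would then call something like `call chunk_bounds(N, omp_get_num_threads(), omp_get_thread_num(), ilo, ihi)` before its accelerator region.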


Hope this helps,
Mat
fishwater00



Joined: 14 Jun 2010
Posts: 3

Posted: Tue Jun 15, 2010 7:00 am    Post subject:

Thank you. That is what I want. It is very useful.
All times are GMT - 7 Hours