PGI User Forum

N-body problem nested loop with OpenAcc
paokara

Joined: 06 Feb 2011
Posts: 24

Posted: Wed Jan 16, 2013 7:28 am    Post subject: N-body problem nested loop with OpenAcc

Hello,

I have a problem with the calculation of accelerations in my N-body code. Here is my code:

 Code:
!$acc kernels
!$acc loop
do i=2,nbod
   axhli = 0
   ayhli = 0
   azhli = 0
!$acc loop
   do j=2,nbod
      *some calculations
      axhl(j) = axhl(j) + something
      ayhl(j) = ayhl(j) + something
      azhl(j) = azhl(j) + something
      axhli = axhli + something
      ayhli = ayhli + something
      azhli = azhli + something
   enddo
   axhl(i) = axhl(i) + axhli
   ayhl(i) = ayhl(i) + ayhli
   azhl(i) = azhl(i) + azhli
enddo
!$acc end kernels

Of course, I get the "loop carried dependence of 'axhl' prevents parallelization" message. I am trying to understand what I must do to achieve 2D parallelization on my GPU while still getting correct results. If I use the private clause, I get 2D parallelization but wrong results.

Thank you,
Sotiris
mkcolg

Joined: 30 Jun 2004
Posts: 6693
Location: The Portland Group Inc.

Posted: Wed Jan 16, 2013 10:56 am    Post subject:

Hi Sotiris,

This is a tough one since the algorithm isn't parallel: different iterations of the outer loop update the same elements axhl(j), ayhl(j), and azhl(j). You might be able to create a 2-D temp array for each of the arrays, but you'd need to initialize them before the loop and then perform another reduction after. I'm not sure if the extra overhead would offset any gains you achieve by parallelizing the code. You'll need to experiment.

- Mat
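A minimal sketch of the 2-D temp array idea, for one of the three arrays only. The names `nbody_temp_sketch`, `accumulate_axhl`, `tmp`, and the input array `x` are hypothetical, and the expression `x(i) - x(j)` is just a placeholder for the elided "something", not the real acceleration term. Only the `axhl(j)` update that causes the dependence is shown; the `axhli` part is already an ordinary scalar reduction.

```fortran
module nbody_temp_sketch
  implicit none
contains

  ! Adds pairwise contributions to axhl through an nbod x nbod scratch
  ! array, so that no two loop iterations write the same element.
  subroutine accumulate_axhl(nbod, x, axhl)
    integer, intent(in)    :: nbod
    real,    intent(in)    :: x(nbod)     ! hypothetical input data
    real,    intent(inout) :: axhl(nbod)
    real, allocatable :: tmp(:,:)         ! tmp(j,i): what pair (i,j) adds to body j
    real    :: s
    integer :: i, j

    allocate(tmp(nbod, nbod))

    !$acc data copyin(x) copy(axhl) create(tmp)

    ! Step 1: each (i,j) pair writes its own slot, so both loops are
    ! independent. Every slot read in step 2 is assigned here, which is
    ! why this sketch has no separate initialization pass.
    !$acc parallel loop collapse(2)
    do i = 2, nbod
      do j = 2, nbod
        ! Placeholder for the elided "something"; NOT the real force term.
        tmp(j, i) = x(i) - x(j)
      end do
    end do

    ! Step 2: the extra reduction. Body j gathers what every i gave it.
    !$acc parallel loop private(s)
    do j = 2, nbod
      s = 0.0
      !$acc loop reduction(+:s)
      do i = 2, nbod
        s = s + tmp(j, i)
      end do
      axhl(j) = axhl(j) + s
    end do

    !$acc end data
    deallocate(tmp)
  end subroutine accumulate_axhl

end module nbody_temp_sketch
```

The scratch array needs nbod*nbod elements per accumulated array, so memory use and the second pass may cancel out the gain for large nbod. As noted above, this needs to be measured.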
paokara

Joined: 06 Feb 2011
Posts: 24

Posted: Thu Jan 17, 2013 1:49 am    Post subject:

Thank you very much, Mat, for your quick reply. I want to ask you one more thing. My program is written in Fortran. Can I implement this double loop in CUDA Fortran and then call this CUDA function inside an OpenACC DATA REGION?

Thank you,
Sotiris
mkcolg

Joined: 30 Jun 2004
Posts: 6693
Location: The Portland Group Inc.

Posted: Thu Jan 17, 2013 10:16 am    Post subject:

Hi Sotiris,

 Quote: Can I implement this double loop in CUDA Fortran and then call this CUDA function inside an OpenACC DATA REGION?
In the more recent versions of the compiler, the compiler will favor the device copy of a variable contained within an OpenACC data region. So, yes, you can pass an OpenACC device data variable to a CUDA Fortran kernel from within an OpenACC data region.

The caveat is that this behaviour is non-standard. We're looking at using a call to "deviceptr" as a standard way to make it more explicit which copy of the variable to use, though this has not been implemented yet.

- Mat
paokara

Joined: 06 Feb 2011
Posts: 24

Posted: Mon Jan 28, 2013 1:52 am    Post subject:

Hi Mat,

I want to ask you about the best way to program the following loop with OpenACC:

 Code:
do i=1,N
   do j=1,M
      A(i) = A(i) + B(i,j)
   enddo
enddo

I get the correct results if I serialize the inner loop (with the kernels construct):

 Code:
!$acc kernels loop independent
do i=1,N
!$acc loop seq
   do j=1,M
      A(i) = A(i) + B(i,j)
   enddo
enddo

However, I don't know how to achieve 2D parallelization. Is that possible? Or is it better to use 1D vectorization with the parallel construct?
Also, I would like you to tell me whether it is possible to use a scalar variable instead of A(i) together with the reduction clause, and then, after the inner loop ends, use this variable to give the correct value to A(i).

Thank you,
Sotiris
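For reference, the scalar-plus-reduction idea described in this post could look roughly like the sketch below. The names `row_sum_sketch`, `add_row_sums`, and `s` are hypothetical, and mapping the outer loop to gangs and the inner loop to vector lanes is one possible schedule, not necessarily the fastest one.

```fortran
module row_sum_sketch
  implicit none
contains

  ! Adds the sum of row i of b to a(i), using a private scalar and a
  ! reduction on the inner loop instead of updating a(i) directly.
  subroutine add_row_sums(n, m, a, b)
    integer, intent(in)    :: n, m
    real,    intent(inout) :: a(n)
    real,    intent(in)    :: b(n, m)
    real    :: s
    integer :: i, j

    ! Outer loop: iterations are independent, one private s per i.
    !$acc parallel loop gang private(s) copy(a) copyin(b)
    do i = 1, n
      s = 0.0
      ! Inner loop: partial sums are combined by the reduction clause.
      !$acc loop vector reduction(+:s)
      do j = 1, m
        s = s + b(i, j)
      end do
      ! Single write to a(i) after the inner loop has finished.
      a(i) = a(i) + s
    end do
  end subroutine add_row_sums

end module row_sum_sketch
```

Whether this is faster than parallelizing only over i with a sequential inner loop depends on N, M, and the memory access pattern of B, so both variants are worth timing.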
All times are GMT - 7 Hours