paokara
Joined: 06 Feb 2011 Posts: 19
Posted: Wed Jan 16, 2013 7:28 am Post subject: N-body problem nested loop with OpenACC
Hello,
I have a problem with my N-body code, in the calculation of the accelerations.
Here is my code:
!$acc kernels
!$acc loop
do i=2,nbod
   axhli = 0
   ayhli = 0
   azhli = 0
   !$acc loop
   do j=2,nbod
      ! ... some calculations ...
      axhl(j) = axhl(j) + something
      ayhl(j) = ayhl(j) + something
      azhl(j) = azhl(j) + something
      axhli = axhli + something
      ayhli = ayhli + something
      azhli = azhli + something
   enddo
   axhl(i) = axhl(i) + axhli
   ayhl(i) = ayhl(i) + ayhli
   azhl(i) = azhl(i) + azhli
enddo
!$acc end kernels
Of course, I get the "loop carried dependence of 'axhl' prevents parallelization" error. I am trying to understand what I must do to achieve 2D parallelization on my GPU while still getting correct results. If I use the private clause, I get 2D parallelization but with wrong results.
Thank you,
Sotiris
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Jan 16, 2013 10:56 am
Hi Sotiris,
This is a tough one since the algorithm isn't parallel: the inner loop updates axhl(j), so different iterations of the i loop write to the same elements. You might be able to create a 2-D temp array for each of the arrays, but you'd need to initialize them before the loop and then perform another reduction after. I'm not sure if the extra overhead would offset any gains you achieve by parallelizing the code. You'll need to experiment.
- Mat
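A minimal sketch of the temp-array idea, shown for the x component only. The pair term `d/(1+d*d)`, the position array `x`, the size `nbod = 64`, and the name `tmp` are all placeholders invented for illustration; the real "something" from the question is not known. In this variant every element of `tmp` that is read is also written in the first loop nest, so no separate initialization pass is needed. The result is compared against the serial loop from the question.

```fortran
program temp_array_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nbod = 64            ! hypothetical problem size
  real(dp) :: x(nbod), axhl(nbod), ref(nbod)
  real(dp), allocatable :: tmp(:,:)
  real(dp) :: axhli, d, s
  integer :: i, j

  do i = 1, nbod
     x(i) = real(i, dp)                      ! placeholder positions
  end do

  ! Serial reference with the same update pattern as the question's loop.
  ref = 0.0_dp
  do i = 2, nbod
     axhli = 0.0_dp
     do j = 2, nbod
        d = x(i) - x(j)
        ref(j) = ref(j) + d / (1.0_dp + d*d) ! placeholder pair term
        axhli  = axhli  - d / (1.0_dp + d*d)
     end do
     ref(i) = ref(i) + axhli
  end do

  allocate(tmp(nbod, nbod))
  axhl = 0.0_dp

  !$acc kernels
  ! Step 1: each (i,j) pair writes its own element of tmp,
  ! so no two iterations touch the same memory location.
  !$acc loop independent
  do i = 2, nbod
     !$acc loop independent
     do j = 2, nbod
        d = x(i) - x(j)
        tmp(j, i) = d / (1.0_dp + d*d)
     end do
  end do

  ! Step 2: for each body j, sum the stored contributions over i.
  ! tmp(j,i) is what iteration i added to axhl(j);
  ! -tmp(i,j) is what the scalar axhli accumulated when the outer index was j.
  !$acc loop independent
  do j = 2, nbod
     s = 0.0_dp
     !$acc loop reduction(+:s)
     do i = 2, nbod
        s = s + tmp(j, i) - tmp(i, j)
     end do
     axhl(j) = axhl(j) + s
  end do
  !$acc end kernels

  ! Summation order differs from the serial loop, so compare with a tolerance.
  print *, 'max abs difference vs serial loop:', maxval(abs(axhl - ref))
  if (maxval(abs(axhl - ref)) > 1.0e-9_dp) stop 1
end program temp_array_sketch
```

The directives are comments to a compiler without OpenACC support, so the same file can be built serially to check the logic. The cost is an nbod-by-nbod temporary per component plus a second pass over it, which is the overhead the reply above warns about.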
paokara
Joined: 06 Feb 2011 Posts: 19
Posted: Thu Jan 17, 2013 1:49 am
Thank you very much, Mat, for your quick reply. I want to ask you one more thing. My program is written in Fortran. Can I implement this double loop in CUDA Fortran and then call this CUDA function inside an OpenACC data region?
Thank you,
Sotiris
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Jan 17, 2013 10:16 am
Hi Sotiris,
Quote:
Can I implement this double loop in CUDA Fortran and then call this CUDA function inside an OpenACC data region?

In the more recent versions of the compiler, the compiler will favor the device copy of a variable contained within an OpenACC data region. So, yes, you can pass an OpenACC device data variable to a CUDA Fortran kernel from within an OpenACC data region.
The caveat is that this behaviour is non-standard. We're looking at using a call to "deviceptr" as a standard way to make it more explicit which copy of the variable to use, though this has not been implemented yet.
- Mat
paokara
Joined: 06 Feb 2011 Posts: 19
Posted: Mon Jan 28, 2013 1:52 am
Hi Mat,
I want to ask you about the best way to program the following loop with OpenACC:
Code:
do i=1,N
   do j=1,M
      A(i) = A(i) + B(i,j)
   enddo
enddo
I get correct results if I serialize the inner loop (with the kernels construct):
Code:
!$acc kernels loop independent
do i=1,N
   !$acc loop seq
   do j=1,M
      A(i) = A(i) + B(i,j)
   enddo
enddo
But I don't know how to achieve 2D parallelization. Is that possible? Or is it better to use 1D vectorization with the parallel construct?
I would also like to know whether it is possible to use a scalar variable instead of A(i) together with the reduction clause, and then, after the end of the inner loop, use that variable to assign the correct value to A(i).
Thank you,
Sotiris
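A minimal sketch of the scalar-reduction variant asked about above. The sizes `n` and `m`, the fill values of `b`, and the scalar name `tmp` are assumptions made for illustration. The outer loop is marked independent because each iteration writes only its own `a(i)`, and the inner loop accumulates into the scalar under a `reduction(+:tmp)` clause. Whether this beats the `seq` inner loop is not something the thread settles; it likely depends on how large M is.

```fortran
program row_sum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8, m = 16        ! hypothetical sizes
  real(dp) :: a(n), b(n, m), expected(n), tmp
  integer :: i, j

  a = 1.0_dp
  do j = 1, m
     do i = 1, n
        b(i, j) = real(i + j, dp)            ! placeholder data
     end do
  end do

  ! Reference result computed on the host with the sum intrinsic.
  expected = a + sum(b, dim=2)

  !$acc kernels
  !$acc loop independent
  do i = 1, n
     tmp = 0.0_dp                            ! scalar accumulator, one per i iteration
     !$acc loop reduction(+:tmp)
     do j = 1, m
        tmp = tmp + b(i, j)
     end do
     a(i) = a(i) + tmp                       ! single write to a(i) after the inner loop
  end do
  !$acc end kernels

  print *, 'max abs difference:', maxval(abs(a - expected))
  if (maxval(abs(a - expected)) > 1.0e-12_dp) stop 1
end program row_sum_sketch
```

Because Fortran stores `b` column by column, consecutive `j` values in the inner loop are `n` elements apart in memory, which may limit how well the inner loop performs when parallelized.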