PGI User Forum

N-body problem nested loop with OpenAcc
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Jan 28, 2013 12:54 pm

Hi Sotiris,

To achieve 2-D parallelism here, you either need to make "A" 2-D or add a reduction.

Code:
!$acc kernels loop independent
do i = 1, N
   sum = 0.0
   ! inner loop is a sum reduction into the scalar "sum"
   !$acc loop reduction(+:sum)
   do j = 1, M
      sum = sum + B(i,j)
   enddo
   A(i) = A(i) + sum
enddo

The caveat is that a reduction carries overhead, and an inner-loop reduction limits the schedules that can be used (the inner loop can only be a "vector" and the outer loop must be a "gang"). So unless your inner loop has enough computation to offset this overhead, you may be better off just accelerating the outer loop. You'll need to experiment to see which method works best for your particular code.
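As a rough way to run that experiment, here is a minimal sketch that puts the two variants side by side and checks that they agree. The array sizes, the test data, and the names a1, a2 and total are made up for illustration (total is used instead of "sum" only to avoid shadowing the intrinsic), and the data clauses are an assumption about how the arrays would be moved. Without OpenACC enabled, the directives are plain comments and the program runs serially.

Code:
```fortran
program reduction_vs_outer
  implicit none
  ! Hypothetical sizes, chosen only for illustration.
  integer, parameter :: n = 64, m = 128
  real(8) :: a1(n), a2(n), b(n,m), total
  integer :: i, j

  ! Hypothetical test data: b(i,j) = i + j, so each row sum is known exactly.
  do j = 1, m
     do i = 1, n
        b(i,j) = real(i + j, 8)
     end do
  end do
  a1 = 0.0d0
  a2 = 0.0d0

  ! Variant 1: outer loop parallel, inner loop as a reduction.
  !$acc kernels loop independent copyin(b) copy(a1)
  do i = 1, n
     total = 0.0d0
     !$acc loop reduction(+:total)
     do j = 1, m
        total = total + b(i,j)
     end do
     a1(i) = a1(i) + total
  end do

  ! Variant 2: only the outer loop is parallel; the inner loop is sequential.
  !$acc kernels loop independent copyin(b) copy(a2)
  do i = 1, n
     total = 0.0d0
     !$acc loop seq
     do j = 1, m
        total = total + b(i,j)
     end do
     a2(i) = a2(i) + total
  end do

  ! Self-check: row i sums to m*i + m*(m+1)/2, and both variants must match it.
  do i = 1, n
     if (a1(i) /= real(m*i + m*(m+1)/2, 8)) error stop "variant 1 is wrong"
     if (a2(i) /= a1(i)) error stop "variants disagree"
  end do
  print *, "both variants agree"
end program reduction_vs_outer
```

Timing each variant on the real problem size, and comparing the -Minfo output for the two loops, should show which schedule suits the code better.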

- Mat
paokara



Joined: 06 Feb 2011
Posts: 24

Posted: Mon Mar 18, 2013 12:26 pm

Hello Mat,

We have a problem with our code and we can't figure out the reason. We are working on the N-body problem and have changed some of our Fortran routines to use OpenACC directives. We have only one "heavy" part, the part where we calculate the accelerations (a 2-D loop), which we handle the way we discussed in this topic. We are using a Tesla C1060.
Our implementation works fine with a small number of bodies, but when we use a large number for our system (for example 10,000 x 10,000 for the 2-D loop in the acceleration part) we get NaN in our results. We believe that this is a memory issue. Is that possible?

Another important issue is that we are using double precision variables for our code. Tesla C1060 has only 1/10 of the cores for double precision calculations. Is it possible that the device can't execute all the double precision calculations and give us NaN?

Thanks,
Sotiris
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Mar 18, 2013 12:54 pm

Quote:
We believe that this is a memory issue. Is that possible?
If you were running out of memory, the binary would abort execution. The C1060's don't have ECC memory so it could still be memory related, but I doubt it.

Try adding "-Mlarge_arrays" to your compilation. It's possible that the index calculations need to be adjusted. Granted 10k x 10k isn't that large so this may not be the issue.

What do the -Minfo messages say about how the loop is being scheduled?

Quote:
Another important issue is that we are using double precision variables for our code. Tesla C1060 has only 1/10 of the cores for double precision calculations. Is it possible that the device can't execute all the double precision calculations and give us NaN?
Doubtful. This will just slow you down, but it shouldn't give you NaNs.

The other thing I'd look for is overflows. You might have an integer*4 variable that needs to be integer*8 or real*4 needs to be real*8. Try adding the flags "-i8 -r8" to change the default kind to the larger data types to see if that helps.
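To make the integer overflow concern concrete, here is a small sketch that checks whether an N x N element count still fits in a 4-byte integer. The program name and the two sample sizes are just illustrative assumptions; the arithmetic is done in 8-byte integers so that the check itself cannot overflow.

Code:
```fortran
program index_range_check
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int64) :: n, total
  logical :: fits

  ! 10k x 10k: 100,000,000 elements, well below huge(1_int32) = 2,147,483,647.
  n = 10000_int64
  total = n * n
  fits = total <= huge(1_int32)
  print *, "n =", n, " n*n =", total, " fits in integer*4:", fits
  if (.not. fits) error stop "unexpected: 10k x 10k should fit"

  ! 50k x 50k (hypothetical larger case): 2,500,000,000 elements, too big for integer*4.
  n = 50000_int64
  total = n * n
  fits = total <= huge(1_int32)
  print *, "n =", n, " n*n =", total, " fits in integer*4:", fits
  if (fits) error stop "unexpected: 50k x 50k should not fit"
end program index_range_check
```

So a 10k x 10k element count is safe on its own, but any integer*4 variable that accumulates a larger product (or a running counter over many steps) is a candidate for integer*8.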

If not, then I'll need to see a reproducing example to determine the issue.

- Mat
paokara



Joined: 06 Feb 2011
Posts: 24

Posted: Tue Mar 19, 2013 3:23 pm

Hi Mat and thanks for the quick reply,

Today we finally figured out what the problem was.
To avoid useless calculations we use two "if" statements inside our OpenACC regions. We have one DATA region, and every N steps we copy our data back to the host. On the first of those N steps we must do the calculations inside the two IF statements I mentioned above, but in the remaining N-1 steps we don't want these calculations. The first IF statement is inside a PARALLEL region and the second IF statement is inside a KERNELS region. Here is the structure of our code:

Code:

!$acc data copy(ARRAYS)

do while (for N steps)

!$acc parallel
    if (flag1) then
        ! CALCULATIONS that I want only for the first step
    endif

    ! MORE CALCULATIONS
!$acc end parallel

!$acc kernels
    if (flag2) then
        ! CALCULATIONS that I want only for the first step
    endif
!$acc end kernels

!$acc parallel
    ! MORE CALCULATIONS
!$acc end parallel

enddo
!$acc end data




If we use those IF statements with large arrays, we get NaN as results. (If we don't use them, we get the correct results but execution is 2x slower.)


Can you guess why that is happening? Are flag1 and flag2 the problem? I mean, is it possible that not all the threads of the grid see the correct values of these scalar variables?

Thanks,
Sotiris
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Mar 20, 2013 9:39 am

Hi Sotiris,

By using the "parallel" construct, you are stating that everything in this region will be moved over to the device. Work sharing is also explicit: unless you use the "loop" directive to tell the compiler how you want the work divided, all threads in the region will execute the same code. With the "kernels" construct, the compiler figures out the best way to divide up the work. (FYI, this article may be useful in understanding the differences between the two: http://www.pgroup.com/lit/articles/insider/v4n2a1.htm).
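As an illustration of the explicit "loop" directive, here is a minimal sketch of a first-step-only calculation guarded by a flag inside a "parallel loop". The arrays x and y, the sizes, the flag name and the arithmetic are all made up for the example and are not taken from the real N-body code. The flag is passed with firstprivate so that every thread gets its own copy of the host value.

Code:
```fortran
program flag_in_parallel_loop
  implicit none
  ! Hypothetical sizes and arrays, for illustration only.
  integer, parameter :: n = 1000, nsteps = 4
  real(8) :: x(n), y(n)
  logical :: first
  integer :: i, step

  x = 1.0d0
  y = 0.0d0

  !$acc data copy(x, y)
  do step = 1, nsteps
     ! Host scalar, set before each region.
     first = (step == 1)

     ! The loop directive tells the compiler to divide the iterations
     ! among the threads instead of having each one run the whole body.
     !$acc parallel loop firstprivate(first)
     do i = 1, n
        if (first) then
           y(i) = 2.0d0 * x(i)   ! needed only on the first step
        end if
        x(i) = x(i) + y(i)       ! needed on every step
     end do
  end do
  !$acc end data

  ! Self-check: y = 2 after step 1, then x grows by 2 per step: 1 + 4*2 = 9.
  if (any(y /= 2.0d0)) error stop "unexpected y"
  if (any(x /= 9.0d0)) error stop "unexpected x"
  print *, "x(1) =", x(1), " y(1) =", y(1)
end program flag_in_parallel_loop
```

If the guarded block sat in a bare "parallel" region with no "loop" directive, each thread would run the whole block redundantly, which is worth ruling out when hunting for wrong results.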

Without seeing the code, I can't be sure if this is the source of the NaNs, but it is possible. Do you get wrong answers if you switch to using just "kernels"?

- Mat
All times are GMT - 7 Hours
Page 2 of 3

 