PGI User Forum


Compiles but crashes at run time
 
Forum: Accelerator Programming
mkcolg
Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Jul 27, 2010 2:58 pm

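Sorry about that. I was using our soon-to-be-released 10.8 version, not 10.6. The crash you're seeing is caused by a bug in 10.6. The workaround, shown below, is to move the accelerator region inside the body of the outer j loop (do j = 1, 10), so that each pass through that loop enters its own !$acc region. The surrounding data region still keeps particles and garbage on the GPU across those passes.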

- Mat

Code:
% cat particle.f90
program main
   real :: particles(20,3), garbage(20,3)
   real, parameter :: PI = 4 * atan(1.0)

!$acc data region local(garbage), copyout(particles)

!$acc region
      do j = 1, 20
         particles(j,1) = 4000
         particles(j,2) = 0
         particles(j,3) = 0
      end do
!$acc end region

      do j = 1, 10
!$acc region
         do jj = 1, 20
            garbage(jj,1) = particles(jj,1) + 0.002
            garbage(jj,2) = particles(jj,2) + 2 * PI
            garbage(jj,3) = (particles(jj,1) + 1) * sin(particles(jj,2) * 1000)
         end do
         do jj = 1, 20
            particles(jj,1) = garbage(jj,1)
            particles(jj,2) = garbage(jj,2)
            particles(jj,3) = garbage(jj,3)
         end do
!$acc end region
      enddo
!$acc end data region

print *, particles(1,1)

end program
% pgf90 -ta=nvidia,keepgpu -V10.6 -fast -Minfo=accel -o particle.out particle.f90 -Mkeepasm -Manno
main:
      6, Generating local(garbage(:,:))
         Generating copyout(particles(:,:))
      8, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
      9, Loop is parallelizable
         Accelerator kernel generated
          9, !$acc do parallel, vector(20)
             CC 1.0 : 4 registers; 20 shared, 16 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 4 registers; 20 shared, 16 constant, 0 local memory bytes; 25 occupancy
     18, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     19, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc do parallel, vector(20)
             Cached references to size [20x2] block of 'particles'
             CC 1.0 : 12 registers; 180 shared, 172 constant, 28 local memory bytes; 33 occupancy
             CC 1.3 : 12 registers; 180 shared, 172 constant, 28 local memory bytes; 25 occupancy
     24, Loop is parallelizable
         Accelerator kernel generated
         24, !$acc do parallel, vector(20)
             CC 1.0 : 5 registers; 20 shared, 72 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 5 registers; 20 shared, 72 constant, 0 local memory bytes; 25 occupancy
% particle.out
    4000.020

All times are GMT - 7 Hours