PGI User Forum

To use atomic add
 
tomko



Joined: 23 Sep 2011
Posts: 10

Posted: Tue Nov 01, 2011 3:17 pm

Do you know in which compiler version that support was added? I get the following error message when I try to use atomicadd() with reals.

PGF90-S-0155-Could not resolve generic procedure atomicadd

My code looks something like
tmpz = A(k,n,j)*C(l,m)
tmpf = atomicadd(sB_real(i,j),REAL(tmpz))
tmpf = atomicadd(sB_img(i,j),AIMAG(tmpz))
where tmpz, A, and C are complex(kind=8) and sB_real, sB_img, and tmpf are real(kind=8). sB_real and sB_img are in shared memory; tmpz and tmpf are thread-local.

You might notice that what I'd really like is atomicadd for complex, but handling the real and imaginary part separately should produce a correct result for addition. Multiple threads will contribute to the same sB array location, hence the need for atomic.

Karen
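
The reasoning in the post above (accumulating the real and imaginary parts separately gives the same sum as accumulating the complex value) can be checked on the host with a small standard Fortran program. This is only a sketch: the program name, the array `z`, and its sample values are hypothetical and not taken from the original code.

```fortran
program split_complex_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  complex(dp) :: z(n), total
  real(dp) :: sum_re, sum_im
  integer :: i

  ! Hypothetical sample contributions (not from the original post).
  z = [ (cmplx(real(i, dp), -0.5_dp * real(i, dp), kind=dp), i = 1, n) ]

  total  = (0.0_dp, 0.0_dp)
  sum_re = 0.0_dp
  sum_im = 0.0_dp
  do i = 1, n
    total  = total + z(i)              ! single complex accumulator
    sum_re = sum_re + real(z(i), dp)   ! two real accumulators, as in the post
    sum_im = sum_im + aimag(z(i))
  end do

  print *, 'complex accumulator :', total
  print *, 'split accumulators  :', sum_re, sum_im

  ! Complex addition is componentwise, so both approaches must agree
  ! when the contributions are added in the same order.
  if (real(total, dp) /= sum_re .or. aimag(total) /= sum_im) error stop 'mismatch'
end program split_complex_sum
```

On the device, the order in which atomic updates from different threads arrive is not fixed, so floating-point results may differ in the last bits from run to run. That applies equally to a single complex accumulator and to the split form.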
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Nov 01, 2011 4:05 pm

Hi Karen,

I've forgotten the exact release, but it was an early 11.x. The problem here is that, for reals, atomicadd only supports single precision. Unfortunately, I don't know if or when NVIDIA will add hardware support for double precision atomics.

- Mat
tomko



Joined: 23 Sep 2011
Posts: 10

Posted: Tue Nov 01, 2011 4:27 pm

Thanks, I'll try another approach.
-Karen
Mike_Texnik



Joined: 10 Oct 2010
Posts: 16

Posted: Sun Jun 24, 2012 11:44 am

I have a simple code that should compute a distribution function, but unfortunately it doesn't work. I suppose the problem is in the implementation of atomicadd.
Code:

    attributes(global) subroutine stat_kernel(x,dist,Na,Nbin,dx,nTr)
      integer(4),device :: dist(Nbin)
      real(4),device :: x(Na)
      real(4),value :: dx
      integer,value :: Na, nTr, Nbin
      integer :: i, j, tx, ij
      call syncthreads()
      if (blockidx%x.eq.1) then
        tx=threadidx%x+1
        do i=Na*(tx-1)/nTr+1,Na*tx/nTr
          ij=int(x(i)/dx)+1
          ic=atomicadd(dist(ij),1)
        enddo
      endif
    end subroutine stat_kernel


The compilation proceeds fine, but when I execute it, the following error appears:
copyout MemCpy (host=0x4011a1e0, dev=0x8c00000, size=40) Failed :30(unknown error)

What am I doing wrong?
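
One thing that can be checked without the main program is the index range each thread walks in the kernel above. In CUDA Fortran, `threadidx%x` starts at 1, so `tx=threadidx%x+1` shifts every thread's range up by one slice. The host-side sketch below replays the loop bounds; the values of `Na` and `nTr` are hypothetical, and it assumes the kernel is launched with `nTr` threads per block, which the post does not show.

```fortran
program check_thread_ranges
  implicit none
  ! Hypothetical sizes; the real launch configuration is not shown in the post.
  integer, parameter :: Na = 1000, nTr = 8
  integer :: t

  print '(a)', 'tx = threadidx%x + 1 (as posted):'
  do t = 1, nTr                 ! threadidx%x runs from 1 to nTr
    call show(t, t + 1)
  end do

  print '(a)', 'tx = threadidx%x (possible fix):'
  do t = 1, nTr
    call show(t, t)
  end do

contains

  ! Print the slice of x that a thread would read for a given tx,
  ! using the same bounds as the loop in stat_kernel.
  subroutine show(t, tx)
    integer, intent(in) :: t, tx
    integer :: lo, hi
    lo = Na*(tx-1)/nTr + 1
    hi = Na*tx/nTr
    if (hi > Na) then
      print '(a,i0,a,i0,a,i0,a)', '  thread ', t, ': x(', lo, ':', hi, ')  <-- beyond Na'
    else
      print '(a,i0,a,i0,a,i0,a)', '  thread ', t, ': x(', lo, ':', hi, ')'
    end if
  end subroutine show

end program check_thread_ranges
```

With these sample sizes, the last thread in the posted form covers x(1001:1125), which is past the end of `x`, while x(1:125) is never visited. An out-of-bounds access inside a kernel often surfaces only later as a failed copy back to the host, so this may be related to the reported error, although that cannot be confirmed without the calling code. Separately, `ij` is not checked against `Nbin`: a negative `x(i)` or a value of `x(i)/dx` at or above `Nbin` would index outside `dist`.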
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Mon Jun 25, 2012 3:50 pm

Can you send or post the main program that calls this CUDA Fortran kernel and causes the crash you showed? Without being able to reproduce the issue, it's hard to say what the problem is.

Thanks.
All times are GMT - 7 Hours