PGI User Forum

Fortran compilation problem.

 
PGI User Forum Index -> Accelerator Programming
tiomiya
Joined: 03 Dec 2009
Posts: 6

Posted: Thu Mar 18, 2010 3:45 am    Post subject: Fortran compilation problem.

I'm trying to accelerate our Fortran program.
When I compile it, I get the following compiler messages and no executable file.
I tried searching for this, but couldn't find any clue.
Sorry for such a vague question, but I don't know how to fix this.

> > pgfortran memb.iot.para.f -ta=nvidia,time -Minfo
accelar:
1168, Generating copyin(zj(:))
Generating copy(fzvm(:,1:iatom-1))
Generating copy(fyvm(:,1:iatom-1))
Generating copy(fxvm(:,1:iatom-1))
Generating copy(vvdwj(:,1:iatom-1))
Generating copyin(rvdwj(:))
Generating copyin(evdwj(:))
Generating copyin(nclosej(1:iatom-1))
Generating copyin(linkj(1:iatom-1,:))
Generating copyin(n13j(1:iatom-1))
Generating copyin(xj(:))
Generating copyin(yj(:))
Generating copyin(index3j(1:iatom-1,:))
1169, Accelerator kernel generated
1169, !$acc do parallel
Non-stride-1 accesses for array 'evdwj'
Non-stride-1 accesses for array 'rvdwj'
Non-stride-1 accesses for array 'zj'
Non-stride-1 accesses for array 'yj'
Non-stride-1 accesses for array 'xj'
Non-stride-1 accesses for array 'n13j'
Non-stride-1 accesses for array 'nclosej'
1170, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1173, Accelerator restriction: induction variable live-out from loop: i
Inner sequential loop scheduled on accelerator
1174, Accelerator restriction: induction variable live-out from loop: i
1175, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: scalar variable live-out from loop: joke
Loop carried scalar dependence for 'joke' at line 1175
Scalar last value needed after loop for 'joke' at line 1180
Scalar last value needed after loop for 'joke' at line 1183
1178, Accelerator restriction: induction variable live-out from loop: i
Inner sequential loop scheduled on accelerator
1179, Accelerator restriction: induction variable live-out from loop: i
1180, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: scalar variable live-out from loop: joke
Loop carried scalar dependence for 'joke' at line 1180
Scalar last value needed after loop for 'joke' at line 1183
1185, Accelerator restriction: scalar variable live-out from loop: cute
1186, Accelerator restriction: scalar variable live-out from loop: cut1
1189, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1191, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1193, Accelerator restriction: induction variable live-out from loop: j
Accelerator restriction: induction variable live-out from loop: i
1196, Accelerator restriction: scalar variable live-out from loop: rr
1200, Accelerator restriction: induction variable live-out from loop: i
1201, Accelerator restriction: induction variable live-out from loop: j
1203, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1205, Loop carried scalar dependence for 'sw0' at line 1228
Loop carried scalar dependence for 'sw0' at line 1230
Loop carried scalar dependence for 'sw0' at line 1231
Loop carried scalar dependence for 'sw0' at line 1232
1206, Loop carried scalar dependence for 'sw0' at line 1228
Loop carried scalar dependence for 'sw0' at line 1230
Loop carried scalar dependence for 'sw0' at line 1231
Loop carried scalar dependence for 'sw0' at line 1232
1208, Loop carried scalar dependence for 'sw0' at line 1228
Loop carried scalar dependence for 'sw0' at line 1230
Loop carried scalar dependence for 'sw0' at line 1231
Loop carried scalar dependence for 'sw0' at line 1232
1212, Loop carried scalar dependence for 'dsw' at line 1230
Loop carried scalar dependence for 'dsw' at line 1231
Loop carried scalar dependence for 'dsw' at line 1232
1213, Loop carried scalar dependence for 'dsw' at line 1230
Loop carried scalar dependence for 'dsw' at line 1231
Loop carried scalar dependence for 'dsw' at line 1232
1215, Loop carried scalar dependence for 'dsw' at line 1230
Loop carried scalar dependence for 'dsw' at line 1231
Loop carried scalar dependence for 'dsw' at line 1232
1228, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1230, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1231, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1232, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: j
1281, Accelerator restriction: induction variable live-out from loop: j
1282, Accelerator restriction: induction variable live-out from loop: i
1286, Invariant assignments hoisted out of loop
1356, Accelerator restriction: induction variable live-out from loop: i
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_myprocnum':
initpar.c:(.text+0x2): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_ncpus':
initpar.c:(.text+0x12): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_tcpus' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_getioproc':
initpar.c:(.text+0x22): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_ioproc' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_is_ioproc':
initpar.c:(.text+0x32): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_ioproc' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
initpar.c:(.text+0x38): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_abort':
initpar.c:(.text+0x5f): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_abortp':
initpar.c:(.text+0xeb): relocation truncated to fit: R_X86_64_PC32 against symbol `__hpf_lcpu' defined in COMMON section in /opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o)
/opt/pgi/linux86-64/10.0/lib/libpgf90.a(initpar.o): In function `__hpf_initarg':
initpar.c:(.text+0x127): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x151): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x17b): relocation truncated to fit: R_X86_64_PC32 against `.bss'
initpar.c:(.text+0x18b): additional relocation overflows omitted from the output

The following is the code in the accelerator region.
Code:
      call acc_init( acc_device_nvidia )
!$acc region do parallel, vector(256)
      do i = 1, iatom - 1
        do j = i + 1, iatom

          joke = 0
          do k = 1, nclosej(i)
            k1 = linkj(i,k)
            if( j.eq.k1 ) joke = 1
          enddo

          do k = 1, n13j(i)
            k1 = index3j(i,k)
            if( j.eq.k1 ) joke = 1
          enddo

          if( joke.eq.0 ) then

            cute  = 20.0/sunit
            cut1  = 0.98*cute
            econs = 332.0/(eunit*sunit)

            xx = (xj(i) - xj(j)) -
     s           float(int( (xj(i)-xj(j))*2.0/boxa) ) *boxa
            yy = (yj(i) - yj(j)) -
     s           float(int( (yj(i)-yj(j))*2.0/boxb) ) *boxb
            zz = (zj(i) - zj(j)) -
     s           float(int( (zj(i)-zj(j))*2.0/boxc) ) *boxc

            rr = sqrt( xx**2 + yy**2 + zz**2 )
            if( rr.le.cutoff*1.01 ) then

              rii = rvdwj(i)
              rjj = rvdwj(j)
              rij = (rii + rjj)/2.0
              eij = sqrt( evdwj(i)*evdwj(j) )

              if(rr.lt.cutoff) sw0=1.0
              if(rr.gt.cutoff*1.01) sw0=0.0
              if(rr.ge.cutoff.and.rr.le.cutoff*1.01) then
                sw0=(cutoff*1.01+2.0*rr-3.0*cutoff)
     s              *(cutoff*1.01-rr)**2/((0.01*cutoff)**3)
              endif

              if(rr.lt.cutoff) dsw=0.0
              if(rr.gt.cutoff*1.01) dsw=0.0
              if(rr.ge.cutoff.and.rr.le.cutoff*1.01) then
                dsw=6.0*(cutoff-rr)*(cutoff*1.01-rr)/((0.01*cutoff)**3)
              endif

              aij = rij**6
              bij = rij**3
              rr6 = rr**6
              rr12 = rr6 **2
              rr13 = rr12 * rr
              rr7 = rr6 * rr
              vpart = 4.0*eij*(aij/(rr12) - bij/(rr6))
              dpart = 4.0*eij*( -12.0*aij/(rr13) + 6.0*bij/(rr7) )

              vdwij = vpart*sw0
              vvdwj(j,i)  = vpart *sw0
C
              fxvm(j,i) = -( dpart*sw0 + vpart*dsw )*(xx/rr)
              fyvm(j,i) = -( dpart*sw0 + vpart*dsw )*(yy/rr)
              fzvm(j,i) = -( dpart*sw0 + vpart*dsw )*(zz/rr)

              ! ax(i1) = ax(i1) + fxv
              ! ay(i1) = ay(i1) + fyv
              ! az(i1) = az(i1) + fzv
              ! ax(j1) = ax(j1) - fxv
              ! ay(j1) = ay(j1) - fyv
              ! az(j1) = az(j1) - fzv

            endif
          endif

        enddo
      enddo
!$acc end region
mkcolg
Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Thu Mar 18, 2010 11:37 am

Hi tiomiya,

I see a few issues.

First, I would recommend using version 10.1 or higher. Version 10.0 unfortunately has a bug where if statements were being ignored. It's not the cause of these errors, but it will affect your code once it compiles.

Secondly, you're using a triangular loop: the lower bound of the "j" loop depends on "i". GPUs only support rectangular loops, so you will need to either make the "j" loop sequential, or make the "j" loop rectangular and then use an if statement to skip the part of the square outside the triangle (j <= i).

For example:
Code:

!$acc region
!$acc do kernel
      do i = 1, iatom - 1
        do j = i + 1, iatom
          ... body of loop
        enddo
      enddo
!$acc end region

or
Code:
!$acc region
!$acc do parallel, vector(256)
      do i = 1, iatom - 1
!$acc do kernel
        do j = 2, iatom
          if (j.gt.i) then
            ... body of loop
          endif
        enddo
      enddo
!$acc end region
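To see why the masked rectangular form can stand in for the triangular one, here is a minimal, self-contained sketch. It assumes the same storage layout as the original code (each pair result written to its own element of an array indexed (j,i), so no two iterations write the same element); the module and routine names and the toy pair function abs(x(i) - x(j)) are hypothetical. The `!$acc` lines mirror the PGI Accelerator directives in the second example above; a compiler without accelerator support treats them as ordinary comments.

```fortran
module pair_loops
  implicit none
contains

  ! Triangular nest: the lower bound of j depends on i.
  subroutine pairs_triangular(x, n, v)
    integer, intent(in) :: n
    real, intent(in)    :: x(n)
    real, intent(out)   :: v(n, n)   ! v(j,i) holds the value for pair (i,j)
    integer :: i, j
    v = 0.0
    do i = 1, n - 1
      do j = i + 1, n
        v(j, i) = abs(x(i) - x(j))
      end do
    end do
  end subroutine pairs_triangular

  ! Rectangular nest: fixed j bounds, with an IF that skips pairs with j <= i.
  subroutine pairs_masked(x, n, v)
    integer, intent(in) :: n
    real, intent(in)    :: x(n)
    real, intent(out)   :: v(n, n)
    integer :: i, j
    v = 0.0
!$acc region
!$acc do parallel, vector(256)
    do i = 1, n - 1
!$acc do kernel
      do j = 2, n
        if (j .gt. i) then
          v(j, i) = abs(x(i) - x(j))
        end if
      end do
    end do
!$acc end region
  end subroutine pairs_masked

end module pair_loops
```

Both routines fill exactly the elements with j > i and leave the rest zero, so comparing their outputs on the host is a quick sanity check before turning on -ta=nvidia.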


For the "Loop carried scalar dependence" and "induction variable live-out" errors, I believe these should go away once the "j" loop is made parallelizable or sequential. Granted, I can't be sure, since there isn't enough of the code for me to compile it.
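One thing that may contribute to the sw0/dsw messages (an assumption, since the full source isn't available): each of sw0 and dsw is set by three separate IF statements, so the compiler cannot easily prove that every iteration assigns them before they are used, and a value left over from a previous iteration could in principle be read. The sketch below keeps the same formulas but uses a single IF / ELSE IF / ELSE chain, so both variables get a value on every path. It is packaged as a subroutine with the hypothetical name `switch_fn` only so it can be tested on its own; inside the accelerator region the same block would be written inline in the loop body.

```fortran
module switching
  implicit none
contains
  ! Hypothetical helper: smooth cutoff switch sw0 and its derivative dsw,
  ! using the same formulas as the original loop body. Every branch
  ! assigns both outputs, so neither can carry a value between iterations.
  pure subroutine switch_fn(rr, cutoff, sw0, dsw)
    real, intent(in)  :: rr, cutoff
    real, intent(out) :: sw0, dsw
    real :: rc1, width3
    rc1    = cutoff*1.01            ! outer edge of the switching region
    width3 = (0.01*cutoff)**3       ! cube of the switching width
    if (rr .lt. cutoff) then
      sw0 = 1.0
      dsw = 0.0
    else if (rr .gt. rc1) then
      sw0 = 0.0
      dsw = 0.0
    else
      sw0 = (rc1 + 2.0*rr - 3.0*cutoff)*(rc1 - rr)**2/width3
      dsw = 6.0*(cutoff - rr)*(rc1 - rr)/width3
    end if
  end subroutine switch_fn
end module switching
```

At rr = cutoff the polynomial evaluates to 1 and at rr = cutoff*1.01 to 0, so the single chain reproduces the original piecewise definition.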

For the "relocation truncated" errors, does this code compile as is without the "-ta=nvidia" flag? These errors typically occur when the code uses more than 2GB of static data (common blocks or static arrays) and therefore needs to be compiled with the medium memory model (-mcmodel=medium). If this is the case, you will need to reduce the size of your static variables, since the GPU can't yet be used with the medium memory model.
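A minimal sketch of one common way to shrink static data, assuming the large arrays are currently fixed-size arrays in COMMON blocks: declare them ALLOCATABLE (here in a module) and allocate them at run time. Allocatable arrays are placed on the heap rather than in the static .data/.bss sections, so they don't count toward the 2GB limit of the default (small) memory model. The module and routine names and the array sizes are hypothetical; the array names are borrowed from the code above purely for illustration.

```fortran
module big_arrays
  implicit none
  ! Before (hypothetical): real fxvm(20000,20000) in a COMMON block is
  ! about 1.6 GB of static data per array, so three of them exceed 2GB.
  ! After: allocatable arrays live on the heap, not in .bss.
  real, allocatable :: fxvm(:,:), fyvm(:,:), fzvm(:,:)
contains
  ! Allocate the pair-force arrays once the atom count is known.
  subroutine alloc_forces(natom)
    integer, intent(in) :: natom
    allocate(fxvm(natom, natom), fyvm(natom, natom), fzvm(natom, natom))
    fxvm = 0.0
    fyvm = 0.0
    fzvm = 0.0
  end subroutine alloc_forces

  ! Release the arrays at the end of the run.
  subroutine free_forces()
    if (allocated(fxvm)) deallocate(fxvm)
    if (allocated(fyvm)) deallocate(fyvm)
    if (allocated(fzvm)) deallocate(fzvm)
  end subroutine free_forces
end module big_arrays
```

Calling alloc_forces(iatom) once at start-up, before the accelerator region, takes the place of the static declaration; whether the compiler version in question handles module allocatables inside a region is not covered by this sketch.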

If the code does compile without "-mcmodel=medium", please send the full source to PGI Customer Service (trs@pgroup.com), since we'll need to investigate this further.

Hope this helps,
Mat
All times are GMT - 7 Hours