PGI User Forum :: Accelerator Programming

understanding problems with acc directives.
Alastairadb10
Joined: 23 Jan 2010 | Posts: 6 | Location: UK

Posted: Thu Apr 29, 2010 3:25 am    Post subject: understanding problems with acc directives.

To anyone who can help:

I'm trying to use acc directives on an old serial Fortran code. They have worked well on the simplest of loops, and I can see the GPU load increasing. Much of my code, however, is not 'simple' loops, and the build diagnostics/Fortran language information is beyond my understanding.
1. Do you have documentation that would explain it?
2. Could I send code for an opinion?
Below is the top portion of the code I'm trying to port to the GPU. For info, the OMP directives are not being used, but I have left them in, as this is what I am trying to improve upon.
Top of code:
Code:
C$OMP PARALLEL DO
C$OMP& SHARED(NLAY,NROW,NCOL,NRC,IBOUND,CC,CR,NODES,CV,HCOF,RHS,HNEW,
C$OMP& NORM)
C$OMP& PRIVATE(I,J,K,N,E,NRN,NRL,NCN,NCL,B,H,D,F,Z,S,RRHS,HHCOF,BHNEW,
C$OMP& HHNEW,DHNEW,FHNEW,ZHNEW,SHNEW,NCF,NCD,NRB,NRH,NLS,NLZ,NLN,NLL)

!$acc region copy(NLAY,NROW,NCOL,NRC,IBOUND,CC,CR,NODES,CV,
!$acc1 HCOF,RHS,HNEW,NORM,I,J,K,N,E,NRN,NRL,NCN,NCL,B,H,D,F,Z,
!$acc2 S,RRHS,HHCOF,BHNEW,HHNEW,DHNEW,FHNEW,ZHNEW,SHNEW,NCF,
!$acc3 NCD,NRB,NRH,NLS,NLZ,NLN,NLL)
      DO 115 K=1,NLAY
      DO 115 I=1,NROW
      DO 115 J=1,NCOL
C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPT OF CURRENT CELL AND
C-------SKIP CALCULATIONS IF CELL IS INACTIVE
      N=J+(I-1)*NCOL+(K-1)*NRC
      IF(IBOUND(N).EQ.0) THEN
        CC(N)=0.
        CR(N)=0.


First few lines of relevant diagnostics are:
pcg2ap:
315, No parallel kernels found, accelerator region ignored
319, Complex loop carried dependence of 'ibound' prevents parallelization
Complex loop carried dependence of 'cc' prevents parallelization
Complex loop carried dependence of 'cr' prevents parallelization
Complex loop carried dependence of 'cv' prevents parallelization
Complex loop carried dependence of 'hcof' prevents parallelization
Complex loop carried dependence of 'rhs' prevents parallelization
Scalar last value needed after loop for 'hhcof' at line 640
Loop carried scalar dependence for 'sn' at line 433
Loop carried scalar dependence for 'sn' at line 435


Thanks!
Alastairadb10
Joined: 23 Jan 2010 | Posts: 6 | Location: UK

Posted: Thu Apr 29, 2010 7:49 am

I've been scratching my head all day and am not much further on, but I think that array summations, i.e. a()=a()+x, are generally what give the "Complex loop carried dependence" messages, as a() would have a different value for every thread... Does this sound right? If so, can I get round it? Summation is very common in my program. I've commented those lines out temporarily and the message is no longer generated.
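Schematically (not my actual code, just the kind of pattern I mean):

Code:
C     EVERY ITERATION READS AND WRITES THE SAME RUNNING SUM, SO
C     THE RESULT DEPENDS ON THREAD ORDER UNLESS THE COMPILER CAN
C     GENERATE A REDUCTION.
      SR=0.
      DO 10 N=1,NODES
      SR=SR+RHS(N)
   10 CONTINUE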

Attached is a (complete) representative acc region from one of the more computational routines. The same problems as in my last post still apply.
I'm new to this, so ANY help is much appreciated. If I can crack this region, I will be able to apply the same approach to other areas of my code. Thanks.

Diagnostics are:
pcg2ap:
315, No parallel kernels found, accelerator region ignored
321, Complex loop carried dependence of 'ibound' prevents parallelization
Complex loop carried dependence of 'cc' prevents parallelization
Complex loop carried dependence of 'cr' prevents parallelization
Complex loop carried dependence of 'cv' prevents parallelization
Complex loop carried dependence of 'hcof' prevents parallelization
Complex loop carried dependence of 'rhs' prevents parallelization
Scalar last value needed after loop for 'hhcof' at line 642
Accelerator restriction: scalar variable live-out from loop: sr
Accelerator restriction: scalar variable live-out from loop: sp
Accelerator restriction: scalar variable live-out from loop: sn
Accelerator restriction: scalar variable live-out from loop: hhcof
322, Complex loop carried dependence of 'ibound' prevents parallelization
Loop carried dependence due to exposed use of 'ibound(:)' prevents parallelization
Complex loop carried dependence of 'cc' prevents parallelization
Loop carried dependence due to exposed use of 'cc(nodes)' prevents parallelization
Complex loop carried dependence of 'cr' prevents parallelization
Loop carried dependence due to exposed use of 'cr(nodes)' prevents parallelization
Complex loop carried dependence of 'cv' prevents parallelization
Loop carried dependence due to exposed use of 'cv(nodes)' prevents parallelization
Complex loop carried dependence of 'hcof' prevents parallelization
Loop carried dependence due to exposed use of 'hcof(nodes)' prevents parallelization
Complex loop carried dependence of 'rhs' prevents parallelization
Loop carried dependence due to exposed use of 'rhs(nodes)' prevents parallelization
Scalar last value needed after loop for 'hhcof' at line 642
Accelerator restriction: scalar variable live-out from loop: sr
Accelerator restriction: scalar variable live-out from loop: sp
Accelerator restriction: scalar variable live-out from loop: sn
Accelerator restriction: scalar variable live-out from loop: hhcof
323, Complex loop carried dependence of 'ibound' prevents parallelization
Complex loop carried dependence of 'cc' prevents parallelization
Loop carried dependence due to exposed use of 'cc(nodes)' prevents parallelization
Complex loop carried dependence of 'cr' prevents parallelization
Loop carried dependence due to exposed use of 'cr(nodes)' prevents parallelization
Complex loop carried dependence of 'cv' prevents parallelization
Loop carried dependence due to exposed use of 'cv(nodes)' prevents parallelization
Complex loop carried dependence of 'hcof' prevents parallelization
Loop carried dependence due to exposed use of 'hcof(nodes)' prevents parallelization
Complex loop carried dependence of 'rhs' prevents parallelization
Loop carried dependence due to exposed use of 'rhs(nodes)' prevents parallelization
Loop carried dependence due to exposed use of 'ibound(:)' prevents parallelization
Scalar last value needed after loop for 'hhcof' at line 642
Accelerator restriction: scalar variable live-out from loop: sr
Accelerator restriction: scalar variable live-out from loop: sp
Accelerator restriction: scalar variable live-out from loop: sn
Accelerator restriction: scalar variable live-out from loop: hhcof



Code:
C$OMP PARALLEL DO
C$OMP& SHARED(NLAY,NROW,NCOL,NRC,IBOUND,CC,CR,NODES,CV,HCOF,RHS,HNEW,
C$OMP& NORM)
C$OMP& PRIVATE(I,J,K,N,E,NRN,NRL,NCN,NCL,B,H,D,F,Z,S,RRHS,HHCOF,BHNEW,
C$OMP& HHNEW,DHNEW,FHNEW,ZHNEW,SHNEW,NCF,NCD,NRB,NRH,NLS,NLZ,NLN,NLL)

!$acc region copy(NLAY,NROW,NCOL,NRC,IBOUND(NODES),CC(NODES),
!$acc1 CR(NODES),NODES,CV(NODES),HCOF(NODES),RHS(NODES),
!$acc2 HNEW(NODES),NORM,I,J,K,N,E,NRN,NRL,NCN,NCL,B,H,D,F,Z,
!$acc3 S,RRHS,HHCOF,BHNEW,HHNEW,DHNEW,FHNEW,ZHNEW,SHNEW,NCF,
!$acc4 NCD,NRB,NRH,NLS,NLZ,NLN,NLL)

      DO 115 K=1,NLAY
      DO 115 I=1,NROW
      DO 115 J=1,NCOL
C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPT OF CURRENT CELL AND
C-------SKIP CALCULATIONS IF CELL IS INACTIVE
      N=J+(I-1)*NCOL+(K-1)*NRC
      IF(IBOUND(N).EQ.0) THEN
        CC(N)=0.
        CR(N)=0.
        IF(N.LE.(NODES-NRC)) CV(N)=0.
        IF(N.GE.2) CR(N-1)=0.
        IF(N.GE.NCOL+1) CC(N-NCOL)=0.
        IF(N.LE.(NODES-NRC).AND.N.GE.NRC+1) CV(N-NRC)=0.
        HCOF(N)=0.
        RHS(N)=0.
        GO TO 115
      ENDIF

C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPTS FOR LOCATING THE 6
C-------SURROUNDING CELLS
      NRN=N+NCOL
      NRL=N-NCOL
      NCN=N+1
      NCL=N-1
      NLN=N+NRC
      NLL=N-NRC
C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPTS FOR CONDUCTANCE TO THE 6
C-------SURROUNDING CELLS.
      NCF=N
      NCD=N-1
      NRB=N-NCOL
      NRH=N
      NLS=N
      NLZ=N-NRC
C
C-----GET CONDUCTANCES TO NEIGHBORING CELLS
C-------NEIGHBOR IS 1 ROW BACK
      B=DZERO
      BHNEW=DZERO
      IF(I.NE.1) THEN
        B=CC(NRB)
        BHNEW=B*(HNEW(NRL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 ROW AHEAD
      H=DZERO
      HHNEW=DZERO
      IF(I.NE.NROW) THEN
        H=CC(NRH)
        HHNEW=H*(HNEW(NRN)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 COLUMN BACK
      D=DZERO
      DHNEW=DZERO
      IF(J.NE.1) THEN
        D=CR(NCD)
        DHNEW=D*(HNEW(NCL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 COLUMN AHEAD
      F=DZERO
      FHNEW=DZERO
      IF(J.NE.NCOL) THEN
        F=CR(NCF)
        FHNEW=F*(HNEW(NCN)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 LAYER BEHIND
      Z=DZERO
      ZHNEW=DZERO
      IF(K.NE.1) THEN
        Z=CV(NLZ)
        ZHNEW=Z*(HNEW(NLL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 LAYER AHEAD
      S=DZERO
      SHNEW=DZERO
      IF(K.NE.NLAY) THEN
        S=CV(NLS)
        SHNEW=S*(HNEW(NLN)-HNEW(N))
      ENDIF
C
      IF(I.EQ.NROW) CC(N)=0.
      IF(J.EQ.NCOL) CR(N)=0.
C-------15JUN1993 SKIP CALCULATIONS AND MAKE CELL INACTIVE IF ALL
C                 SURROUNDING CELLS ARE INACTIVE
      IF(B+H+D+F+Z+S.EQ.0.) THEN
        IBOUND(N)=0
        HCOF(N)=0.
        RHS(N)=0.
        GO TO 115
      ENDIF
C
C-------CALCULATE THE RESIDUAL AND STORE IT IN RHS.  TO SCALE A,
C-------CALCULATE THE DIAGONAL OF THE A MATRIX, AND STORE IT IN HCOF.
      E=-Z-B-D-F-H-S
      RRHS=RHS(N)
      HHCOF=HNEW(N)*HCOF(N)
      RHS(N)=RRHS-ZHNEW-BHNEW-DHNEW-HHCOF-FHNEW-HHNEW-SHNEW
      IF(NORM.EQ.1) HCOF(N)=HCOF(N)+E
      IF(IBOUND(N).LT.0) RHS(N)=0.
C-------ADDED FOR SENSITIVITY CALCULATIONS 9/1/91
      IF(IU.NE.0.AND.IP.GT.0) THEN
        IF(I.EQ.1.AND.J.EQ.1.AND.K.EQ.1) THEN
          SN=0.
          SP=0.
          SR=0.
        ENDIF
        !SR=SR+RHS(N)
        !IF(RRHS.LT.0.) SN=SN+RRHS
        !IF(RRHS.GT.0.) SP=SP+RRHS
        !IF(-ZHNEW.LT.0.) SN=SN-ZHNEW
        !IF(-ZHNEW.GT.0.) SP=SP-ZHNEW
        !IF(-BHNEW.LT.0.) SN=SN-BHNEW
        !IF(-BHNEW.GT.0.) SP=SP-BHNEW
        !IF(-DHNEW.LT.0.) SN=SN-DHNEW
        !IF(-DHNEW.GT.0.) SP=SP-DHNEW
        !IF(-HHCOF.LT.0.) SN=SN-HHCOF
        !IF(-HHCOF.GT.0.) SP=SP-HHCOF
        !IF(-FHNEW.LT.0.) SN=SN-FHNEW
        !IF(-FHNEW.GT.0.) SP=SP-FHNEW
        !IF(-HHNEW.LT.0.) SN=SN-HHNEW
        !IF(-HHNEW.GT.0.) SP=SP-HHNEW
        !IF(-SHNEW.LT.0.) SN=SN-SHNEW
        !IF(-SHNEW.GT.0.) SP=SP-SHNEW
      ENDIF
  115 CONTINUE
!$acc end region
mkcolg
Joined: 30 Jun 2004 | Posts: 5815 | Location: The Portland Group Inc.

Posted: Thu Apr 29, 2010 11:18 am

Hi Alastairadb10,

Indeed, the problem is due to the use of the calculated index "N". In the general case, a calculated index has the potential to overlap (or at least the compiler is unable to tell whether all values are independent). The compiler must be conservative, assume the calculated index values are not unique, and refuse to parallelize the code; otherwise the results would be non-deterministic, depending on the order in which the threads ran.
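Schematically, the pattern the compiler sees is:

Code:
C     N IS COMPUTED INSIDE THE LOOP NEST, SO THE COMPILER CANNOT
C     PROVE THAT TWO ITERATIONS NEVER WRITE THE SAME ELEMENT CC(N).
      DO 115 K=1,NLAY
      DO 115 I=1,NROW
      DO 115 J=1,NCOL
      N=J+(I-1)*NCOL+(K-1)*NRC
      CC(N)=0.
  115 CONTINUE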

However, in this case I believe the compiler does have enough information to determine that all values of N are unique. I have asked our engineers to investigate whether the compiler can add this analysis, since linearization is fairly common. Until then, there are a couple of potential workarounds:

    1. Make CC, CR, HCOF, RHS, etc. into three-dimensional arrays and remove the use of N.
    2. Collapse the K, I, and J loops into a single 'N' loop.
    3. Force parallelization by adding the "!$acc do parallel, vector" directive just before the K loop (see the sketch below).
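For option 3, a minimal sketch of the directive placement (this asserts to the compiler that the loop is safe to run in parallel, so only use it if you are certain the N values never collide):

Code:
!$acc region
!$acc do parallel, vector
      DO 115 K=1,NLAY
      DO 115 I=1,NROW
      DO 115 J=1,NCOL
C     ... LOOP BODY AS BEFORE ...
  115 CONTINUE
!$acc end region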

Note that we are adding an "independent" clause, which will allow the user to assert that all loop iterations are independent so that the compiler can skip its dependency analysis. Once it is available in a future version of the compiler, you can try this clause as well.
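Once it is available, the usage would look something like:

Code:
!$acc do independent
      DO 115 K=1,NLAY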

Other issues:
Quote:
Accelerator restriction: scalar variable live-out from loop: sr
Accelerator restriction: scalar variable live-out from loop: sp
Accelerator restriction: scalar variable live-out from loop: sn
Accelerator restriction: scalar variable live-out from loop: hhcof
Most likely you're using the values of these scalar variables outside the parallel region.

For "HHCOF", all the threads will assign it a value. Once the parallel region ends, which of the thread's value for HHCOF should be used? The compiler could pick one, but it could lead to non-deterministic results. To fix, either don't use HHCOF on the right hand side of an expression after the parallel loop or privatize it using the "!$acc do private(HHCOF)" clause (add just before the K loop). Privatizing a variable will create a unique temporary variable for each thread and the host's copy of the variable will be unchanged.

For the "SR", "SP", and "SN" variables, the problem is the same as HHCOF. However, once you uncomment out the reduction code (reductions are supported within accelerator regions) instead of privatizing these values, you'll want to move their initialization before the accelerator region. As it is now, your code assumes that index 1 for I, J, and K will get executed first, thus creating a loop dependency.

Next, I would remove all of your "copy" clauses and just use "!$acc region". The copy clause is only needed when you want to override the compiler's defaults (see the output of "-Minfo=accel" for how the compiler is copying in the data). Scalars are copied in by default and are not copied out, due to the "live-out" problem. For your arrays, "CC(NODES)" tells the compiler to copy a single element of CC. To copy the entire array, either just use the array name, "CC", use a colon, "CC(:)", or give the full extent, "CC(1:NODES)".
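That is, any of the following forms copies the full array (shown as alternatives):

Code:
!$acc region copy(CC)
!$acc region copy(CC(:))
!$acc region copy(CC(1:NODES))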



Hope this helps,
Mat
Alastairadb10
Joined: 23 Jan 2010 | Posts: 6 | Location: UK

Posted: Fri Apr 30, 2010 9:08 am

Mat,

This has helped greatly, thanks.

Following your advice, I've made the changes below and, after a few tweaks, am now not getting any reported reasons why the code cannot be parallelised. I will be able to make comparable changes to the other intensive parts of the code. I am, however, getting compile errors which I've not seen before. Would you be able to shed light again?

The errors are:
1. State space incorrect for instruction 'st', File: Ptxas C
2. Internal compiler error. pgnvd job exited with nonzero status code 0, File: PPCG-VKDSingle.for (the one I'm editing), Line 1016
For info, line 1016 is below the '!$acc end region' and is the last line of the current subroutine (END).

My current thinking is that it's to do with HCOF(N), possibly a live-out issue?
I'll keep head-scratching.

The code changes are as follows (!PG006 at the start of a line marks removed code; !PG006 at the end of a line marks added code):
Code:
     SN=0.                                !PG006
      SP=0.                                !PG006 
      SR=0.                                !PG006
!$acc region copy(CR(:),CV(:),CC(:),IBOUND(:),HCOF(:),RHS(:),SR,
!$acc1 SPa(:),SNa(:))

!$acc do independent,private(HHCOF,ZHNEW,BHNEW,DHNEW,FHNEW,HHNEW,SHNEW,
!$acc1 I,J,K)

!PG006DO 115 K=1,NLAY
!PG006DO 115 I=1,NROW
!PG006DO 115 J=1,NCOL
      DO 115 N=1,NCOL*NROW*NLAY
C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPT OF CURRENT CELL AND
C-------SKIP CALCULATIONS IF CELL IS INACTIVE
!PG006N=J+(I-1)*NCOL+(K-1)*NRC
     
      I=INT((N-(INT((N/NRC)))*NRC)/NCOL)+1
      J=(((N-(INT((N/NRC)))*NRC)/NCOL)-INT((N-(INT((N/NRC)))*NRC)/NCOL))*NCOL
      K=INT((N/NRC))+1
     
      IF(IBOUND(N).EQ.0) THEN
        CC(N)=0.
        CR(N)=0.
        IF(N.LE.(NODES-NRC)) CV(N)=0.
        IF(N.GE.2) CR(N-1)=0.
        IF(N.GE.NCOL+1) CC(N-NCOL)=0.
        IF(N.LE.(NODES-NRC).AND.N.GE.NRC+1) CV(N-NRC)=0.
        HCOF(N)=0.
        RHS(N)=0.
        GO TO 115
      ENDIF

C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPTS FOR LOCATING THE 6
C-------SURROUNDING CELLS
      NRN=N+NCOL
      NRL=N-NCOL
      NCN=N+1
      NCL=N-1
      NLN=N+NRC
      NLL=N-NRC
C
C-------CALCULATE 1 DIMENSIONAL SUBSCRIPTS FOR CONDUCTANCE TO THE 6
C-------SURROUNDING CELLS.
      NCF=N
      NCD=N-1
      NRB=N-NCOL
      NRH=N
      NLS=N
      NLZ=N-NRC
C
C-----GET CONDUCTANCES TO NEIGHBORING CELLS
C-------NEIGHBOR IS 1 ROW BACK
      B=DZERO
      BHNEW=DZERO
     IF(I.NE.1) THEN
        B=CC(NRB)
        BHNEW=B*(HNEW(NRL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 ROW AHEAD
      H=DZERO
      HHNEW=DZERO
      IF(I.NE.NROW) THEN 
        H=CC(NRH)
        HHNEW=H*(HNEW(NRN)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 COLUMN BACK
      D=DZERO
      DHNEW=DZERO
      IF(J.NE.1) THEN 
        D=CR(NCD)
        DHNEW=D*(HNEW(NCL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 COLUMN AHEAD
      F=DZERO
      FHNEW=DZERO 
      IF(J.NE.NCOL) THEN 
        F=CR(NCF)
        FHNEW=F*(HNEW(NCN)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 LAYER BEHIND
      Z=DZERO
      ZHNEW=DZERO
      IF(K.NE.1) THEN 
        Z=CV(NLZ)
        ZHNEW=Z*(HNEW(NLL)-HNEW(N))
      ENDIF
C
C-------NEIGHBOR IS 1 LAYER AHEAD
      S=DZERO
      SHNEW=DZERO
      IF(K.NE.NLAY) THEN     
        S=CV(NLS)
        SHNEW=S*(HNEW(NLN)-HNEW(N))
      ENDIF
C
      IF(I.EQ.NROW) CC(N)=0.
      IF(J.EQ.NCOL) CR(N)=0.

C-------15JUN1993 SKIP CALCULATIONS AND MAKE CELL INACTIVE IF ALL
C                 SURROUNDING CELLS ARE INACTIVE
      IF(B+H+D+F+Z+S.EQ.0.) THEN
        IBOUND(N)=0
        HCOF(N)=0.
        RHS(N)=0.
        GO TO 115
      ENDIF
C
C-------CALCULATE THE RESIDUAL AND STORE IT IN RHS.  TO SCALE A,
C-------CALCULATE THE DIAGONAL OF THE A MATRIX, AND STORE IT IN HCOF.
      E=-Z-B-D-F-H-S
      RRHS=RHS(N)
      HHCOF=HNEW(N)*HCOF(N)
      RHS(N)=RRHS-ZHNEW-BHNEW-DHNEW-HHCOF-FHNEW-HHNEW-SHNEW
      IF(NORM.EQ.1) HCOF(N)=HCOF(N)+E
      IF(IBOUND(N).LT.0) RHS(N)=0.
C-------ADDED FOR SENSITIVITY CALCULATIONS 9/1/91
      IF(IU.NE.0.AND.IP.GT.0) THEN
!PG006  IF(I.EQ.1.AND.J.EQ.1.AND.K.EQ.1) THEN !moved to above region
!PG006      SN=0.                               
!PG006      SP=0.                             
!PG006      SR=0.                             
!PG006  ENDIF                               
        SR=SR+RHS(N)
!PG006        IF(RRHS.LT.0.) SN=SN+RRHS!ln447
!PG006        IF(RRHS.GT.0.) SP=SP+RRHS
!PG006        IF(-ZHNEW.LT.0.) SN=SN-ZHNEW
!PG006        IF(-ZHNEW.GT.0.) SP=SP-ZHNEW
!PG006        IF(-BHNEW.LT.0.) SN=SN-BHNEW
!PG006        IF(-BHNEW.GT.0.) SP=SP-BHNEW
!PG006        IF(-DHNEW.LT.0.) SN=SN-DHNEW
!PG006        IF(-DHNEW.GT.0.) SP=SP-DHNEW
!PG006        IF(-HHCOF.LT.0.) SN=SN-HHCOF
!PG006        IF(-HHCOF.GT.0.) SP=SP-HHCOF
!PG006        IF(-FHNEW.LT.0.) SN=SN-FHNEW
!PG006        IF(-FHNEW.GT.0.) SP=SP-FHNEW
!PG006        IF(-HHNEW.LT.0.) SN=SN-HHNEW
!PG006        IF(-HHNEW.GT.0.) SP=SP-HHNEW
!PG006        IF(-SHNEW.LT.0.) SN=SN-SHNEW
!PG006        IF(-SHNEW.GT.0.) SP=SP-SHNEW
        IF(RRHS.LT.0.) SNa(N)=SN+RRHS!ln447
        IF(RRHS.GT.0.) SPa(N)=SP+RRHS
        IF(-ZHNEW.LT.0.) SNa(N)=SN-ZHNEW
        IF(-ZHNEW.GT.0.) SPa(N)=SP-ZHNEW
        IF(-BHNEW.LT.0.) SNa(N)=SN-BHNEW
        IF(-BHNEW.GT.0.) SPa(N)=SP-BHNEW
        IF(-DHNEW.LT.0.) SNa(N)=SN-DHNEW
        IF(-DHNEW.GT.0.) SPa(N)=SP-DHNEW
        IF(-HHCOF.LT.0.) SNa(N)=SN-HHCOF
        IF(-HHCOF.GT.0.) SPa(N)=SP-HHCOF
        IF(-FHNEW.LT.0.) SNa(N)=SN-FHNEW
        IF(-FHNEW.GT.0.) SPa(N)=SP-FHNEW
        IF(-HHNEW.LT.0.) SNa(N)=SN-HHNEW
        IF(-HHNEW.GT.0.) SPa(N)=SP-HHNEW
        IF(-SHNEW.LT.0.) SNa(N)=SN-SHNEW
        IF(-SHNEW.GT.0.) SPa(N)=SP-SHNEW
      ENDIF
  115 CONTINUE
!$acc end region
     SN=0.0                           !PG006
     SP=0.0                           !PG006
     DO 116 N=1,NCOL*NROW*NLAY        !PG006
      SN=SN+SNa(N)                   !PG006
      SP=SP+SPa(N)                   !PG006
  116 CONTINUE                         !PG006
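As an aside, the index-recovery expressions above use integer operands throughout, so the divisions truncate. A more conventional recovery of I, J, and K from N (a sketch, untested here, assuming NRC = NROW*NCOL) would be:

Code:
      K=(N-1)/NRC+1
      I=MOD(N-1,NRC)/NCOL+1
      J=MOD(N-1,NCOL)+1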


Thanks
Al
mkcolg
Joined: 30 Jun 2004 | Posts: 5815 | Location: The Portland Group Inc.

Posted: Fri Apr 30, 2010 12:22 pm

Hi Al,

Internal compiler errors are always problems in the compiler itself. In this case, the compiler is doing something wrong with a ptxas (the GPU assembler) instruction.

However, when I tried your code, it compiled fine (compile log below). Can you please send the full source to PGI Customer Support (trs@pgroup.com) and ask them to forward it to me? I'll have a better understanding of the problem once I can recreate the error.

Thanks,
Mat

Code:
% pgf90 -c hh.f90 -Mfixed -ta=nvidia -Minfo=accel -V10.4
MAIN:
     14, Generating copy(sna(:))
         Generating copy(spa(:))
         Generating copy(cc(:))
         Generating copy(cv(:))
         Generating copy(cr(:))
         Generating copyin(hnew(:))
         Generating copy(rhs(:))
         Generating copy(hcof(:))
         Generating copy(ibound(:))
         Generating compute capability 1.0 kernel
         Generating compute capability 1.3 kernel
     23, Loop is parallelizable
         Accelerator kernel generated
         23, !$acc do parallel, vector(32)
             Using register for 'ibound'
             Non-stride-1 accesses for array 'cr'
             Non-stride-1 accesses for array 'cc'
             Using register for 'hcof'
             Using register for 'rhs'
             Using register for 'hnew'
             Using register for 'sna'
             Using register for 'spa'
        140, Sum reduction generated for sr