PGI User Forum

How to compile existing C/C++ project w/ NVIDIA GPU?
vacaloca



Joined: 26 Jul 2012
Posts: 5

Posted: Thu Aug 02, 2012 8:59 pm

mkcolg wrote:
FYI, we have a whole chapter in our user's guide devoted to inter-language calling, including C to C++ and C++ to C, that you might find useful. (See Chapter 13 of http://www.pgroup.com/doc/pgiug.pdf)

ehehe... I should RTFM more often. :)

Anyway, I managed to port the code to C so that it compiles under pgcc, and put it all in one file to make things easier for myself in the meantime. The issue is that when I tried the suggested pragmas I did not get the right output: either I got no output at all (the results were still initialized to zero) or the values were orders of magnitude off. Without the pragmas, I get exactly the same output as when compiling with MSVC 2008/2010, which is good, because I know the code is working.

I do want to mention, however, that even without pragmas the code runs in ~17 seconds on the same data set with a single CPU thread, versus 58 seconds for the MSVC version. Regardless, OpenMP runs it in ~9 seconds.

I have sent the ported code to the support e-mail in case anyone can suggest why the pragma additions are not producing the correct output. Perhaps the post implied some changes that I did not implement?

I compiled as:
Code:
pgcc -Minline -ta=nvidia,cc20 -acc -Minfo file.c


Here is the -Minfo output I got after adding the pragmas as suggested; this is the build that produces the bogus results:
Code:
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Mismatched loop levels when adding syncs (ifmm.c: 466)
main:
    368, time inlined, size=2, file ifmm.c (132)
    462, Generating present_or_copy(aEph[0:Nobs])
         Generating present_or_copy(aEth[0:Nobs])
         Generating present_or_copy(aCobsZ[0:NgO])
         Generating present_or_copy(aCobsY[0:NgO])
         Generating present_or_copy(aCobsX[0:NgO])
         Generating present_or_copy(acs32[0:Nobs])
         Generating present_or_copy(acs23[0:Nobs])
         Generating present_or_copy(acs22[0:Nobs])
         Generating present_or_copy(acs13[0:Nobs])
         Generating present_or_copy(acs12[0:Nobs])
         Generating present_or_copy(akuZ[0:NgONgS])
         Generating present_or_copy(akuY[0:NgONgS])
         Generating present_or_copy(akuX[0:NgONgS])
         Generating present_or_copy(arz[0:Nobs])
         Generating present_or_copy(ary[0:Nobs])
         Generating present_or_copy(arx[0:Nobs])
         Generating present_or_copy(aRadio[0:NgONgS])
         Generating present_or_copy(aAggregation_Jz[0:NgONgS])
         Generating present_or_copy(aAggregation_Jy[0:NgONgS])
         Generating present_or_copy(aAggregation_Jx[0:NgONgS])
         Generating present_or_copy(negObs[0:NgO])
         Generating present_or_copy(aNgS)
         Generating present_or_copy(aNgO)
         Generating present_or_copy(afactorJ)
         Generating present_or_copy(k0)
         Generating present_or_copy(coefEcartEphZ)
         Generating present_or_copy(coefEcartEthZ)
         Generating present_or_copy(coefEcartEphY)
         Generating present_or_copy(coefEcartEthY)
         Generating present_or_copy(coefEcartEphX)
         Generating present_or_copy(prodEscal)
         Generating present_or_copy(vvar4)
         Generating present_or_copy(vvar3)
         Generating present_or_copy(vvar2)
         Generating present_or_copy(vvar1)
         Generating present_or_copy(aux6)
         Generating present_or_copy(aux5)
         Generating present_or_copy(aux4)
         Generating present_or_copy(aux3)
         Generating present_or_copy(aux2)
         Generating present_or_copy(aux1)
         Generating present_or_copy(ap)
         Generating present_or_copy(an)
         Generating present_or_copy(am)
         Generating present_or_copy(cont)
         Generating present_or_copy(accumZ)
         Generating present_or_copy(accumY)
         Generating present_or_copy(accumX)
    466, Accelerator kernel generated
        468, #pragma acc loop gang /* blockIdx.x */
        476, #pragma acc loop vector(256) /* threadIdx.x */
        481, #pragma acc loop vector(256) /* threadIdx.x */
    468, Scalar last value needed after loop for 'prodEscal' at line 619
         Scalar last value needed after loop for 'prodEscal' at line 620
         Scalar last value needed after loop for 'prodEscal' at line 621
         Scalar last value needed after loop for 'aux3' at line 625
         Scalar last value needed after loop for 'aux3' at line 631
         Scalar last value needed after loop for 'aux3' at line 681
         Scalar last value needed after loop for 'aux6' at line 625
         Scalar last value needed after loop for 'aux6' at line 577
         Scalar last value needed after loop for 'aux6' at line 578
         Scalar last value needed after loop for 'aux2' at line 624
         Scalar last value needed after loop for 'aux2' at line 630
         Scalar last value needed after loop for 'aux2' at line 681
         Scalar last value needed after loop for 'aux5' at line 624
         Scalar last value needed after loop for 'aux5' at line 574
         Scalar last value needed after loop for 'aux5' at line 575
         Scalar last value needed after loop for 'aux1' at line 623
         Scalar last value needed after loop for 'aux1' at line 629
         Scalar last value needed after loop for 'aux1' at line 567
         Scalar last value needed after loop for 'aux1' at line 568
         Scalar last value needed after loop for 'aux1' at line 569
         Scalar last value needed after loop for 'aux1' at line 681
         Scalar last value needed after loop for 'aux4' at line 623
         Scalar last value needed after loop for 'aux4' at line 571
         Scalar last value needed after loop for 'aux4' at line 572
         Accelerator restriction: scalar variable live-out from loop: accumX
         Accelerator restriction: scalar variable live-out from loop: aux4
         Accelerator restriction: scalar variable live-out from loop: aux1
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
         Accelerator restriction: scalar variable live-out from loop: accumY
         Accelerator restriction: scalar variable live-out from loop: aux5
         Accelerator restriction: scalar variable live-out from loop: aux2
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
         Accelerator restriction: scalar variable live-out from loop: accumZ
         Accelerator restriction: scalar variable live-out from loop: aux6
         Accelerator restriction: scalar variable live-out from loop: aux3
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
         Accelerator restriction: scalar variable live-out from loop: prodEscal
         Accelerator restriction: scalar variable live-out from loop: vvar4
         Accelerator restriction: scalar variable live-out from loop: vvar3
         Accelerator restriction: scalar variable live-out from loop: vvar2
         Accelerator restriction: scalar variable live-out from loop: vvar1
         Accelerator restriction: scalar variable live-out from loop: am
         Accelerator restriction: scalar variable live-out from loop: cont
         Accelerator restriction: scalar variable live-out from loop: an
         Conditional loop will be executed in scalar mode
    476, Accelerator restriction: induction variable live-out from loop: ap
    477, Accelerator restriction: induction variable live-out from loop: am
         Accelerator restriction: induction variable live-out from loop: ap
    481, Complex loop carried dependence of '*(aAggregation_Jz).real' prevents parallelization
         Loop carried dependence of '*(aAggregation_Jz).real' prevents parallelization
         Complex loop carried dependence of '*(aAggregation_Jz).imag' prevents parallelization
         Complex loop carried dependence of '*(aAggregation_Jy).real' prevents parallelization
         Loop carried dependence of '*(aAggregation_Jy).real' prevents parallelization
         Complex loop carried dependence of '*(aAggregation_Jy).imag' prevents parallelization
         Complex loop carried dependence of '*(aAggregation_Jx).real' prevents parallelization
         Loop carried dependence of '*(aAggregation_Jx).real' prevents parallelization
         Complex loop carried dependence of '*(aAggregation_Jx).imag' prevents parallelization
         Complex loop carried dependence of '*(arx)' prevents parallelization
         Complex loop carried dependence of '*(aCobsX)' prevents parallelization
         Complex loop carried dependence of '*(ary)' prevents parallelization
         Complex loop carried dependence of '*(aCobsY)' prevents parallelization
         Complex loop carried dependence of '*(arz)' prevents parallelization
         Complex loop carried dependence of '*(aCobsZ)' prevents parallelization
         Complex loop carried dependence of '*(akuZ)' prevents parallelization
         Complex loop carried dependence of '*(akuY)' prevents parallelization
         Complex loop carried dependence of '*(akuX)' prevents parallelization
         Scalar last value needed after loop for 'prodEscal' at line 619
         Scalar last value needed after loop for 'prodEscal' at line 620
         Scalar last value needed after loop for 'prodEscal' at line 621
         Complex loop carried dependence of '*(acs32)' prevents parallelization
         Complex loop carried dependence of '*(acs22)' prevents parallelization
         Complex loop carried dependence of '*(acs12)' prevents parallelization
         Complex loop carried dependence of '*(acs23)' prevents parallelization
         Complex loop carried dependence of '*(acs13)' prevents parallelization
         Complex loop carried dependence of '*(aEph).real' prevents parallelization
         Complex loop carried dependence of '*(aEth).real' prevents parallelization
         Scalar last value needed after loop for 'aux3' at line 625
         Scalar last value needed after loop for 'aux3' at line 631
         Scalar last value needed after loop for 'aux3' at line 681
         Complex loop carried dependence of '*(aEph).imag' prevents parallelization
         Complex loop carried dependence of '*(aEth).imag' prevents parallelization
         Scalar last value needed after loop for 'aux6' at line 625
         Scalar last value needed after loop for 'aux6' at line 577
         Scalar last value needed after loop for 'aux6' at line 578
         Scalar last value needed after loop for 'aux2' at line 624
         Scalar last value needed after loop for 'aux2' at line 630
         Scalar last value needed after loop for 'aux2' at line 681
         Scalar last value needed after loop for 'aux5' at line 624
         Scalar last value needed after loop for 'aux5' at line 574
         Scalar last value needed after loop for 'aux5' at line 575
         Scalar last value needed after loop for 'aux1' at line 623
         Scalar last value needed after loop for 'aux1' at line 629
         Scalar last value needed after loop for 'aux1' at line 567
         Scalar last value needed after loop for 'aux1' at line 568
         Scalar last value needed after loop for 'aux1' at line 569
         Scalar last value needed after loop for 'aux1' at line 681
         Scalar last value needed after loop for 'aux4' at line 623
         Scalar last value needed after loop for 'aux4' at line 571
         Scalar last value needed after loop for 'aux4' at line 572
         Accelerator restriction: scalar variable live-out from loop: accumX
         Accelerator restriction: scalar variable live-out from loop: aux4
         Accelerator restriction: scalar variable live-out from loop: aux1
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
         Accelerator restriction: scalar variable live-out from loop: accumY
         Accelerator restriction: scalar variable live-out from loop: aux5
         Accelerator restriction: scalar variable live-out from loop: aux2
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
         Accelerator restriction: scalar variable live-out from loop: accumZ
         Accelerator restriction: scalar variable live-out from loop: aux6
         Accelerator restriction: scalar variable live-out from loop: aux3
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
         Accelerator restriction: scalar variable live-out from loop: prodEscal
         Accelerator restriction: scalar variable live-out from loop: vvar4
         Accelerator restriction: scalar variable live-out from loop: vvar3
         Accelerator restriction: scalar variable live-out from loop: vvar2
         Accelerator restriction: scalar variable live-out from loop: vvar1
         Accelerator restriction: scalar variable live-out from loop: am
    488, Accelerator restriction: induction variable live-out from loop: ap
         Scalar last value needed after loop for 'prodEscal' at line 619
         Scalar last value needed after loop for 'prodEscal' at line 620
         Scalar last value needed after loop for 'prodEscal' at line 621
         Scalar last value needed after loop for 'aux3' at line 625
         Scalar last value needed after loop for 'aux3' at line 631
         Scalar last value needed after loop for 'aux3' at line 681
         Scalar last value needed after loop for 'aux6' at line 625
         Scalar last value needed after loop for 'aux6' at line 577
         Scalar last value needed after loop for 'aux6' at line 578
         Scalar last value needed after loop for 'accumZ' at line 534
         Scalar last value needed after loop for 'accumZ' at line 535
         Scalar last value needed after loop for 'aux2' at line 624
         Scalar last value needed after loop for 'aux2' at line 630
         Scalar last value needed after loop for 'aux2' at line 681
         Scalar last value needed after loop for 'aux5' at line 624
         Scalar last value needed after loop for 'aux5' at line 574
         Scalar last value needed after loop for 'aux5' at line 575
         Scalar last value needed after loop for 'accumY' at line 537
         Scalar last value needed after loop for 'accumY' at line 538
         Scalar last value needed after loop for 'aux1' at line 623
         Scalar last value needed after loop for 'aux1' at line 629
         Scalar last value needed after loop for 'aux1' at line 567
         Scalar last value needed after loop for 'aux1' at line 568
         Scalar last value needed after loop for 'aux1' at line 569
         Scalar last value needed after loop for 'aux1' at line 681
         Scalar last value needed after loop for 'aux4' at line 623
         Scalar last value needed after loop for 'aux4' at line 571
         Scalar last value needed after loop for 'aux4' at line 572
         Scalar last value needed after loop for 'accumX' at line 540
         Scalar last value needed after loop for 'accumX' at line 541
         Accelerator restriction: scalar variable live-out from loop: accumX
         Accelerator restriction: scalar variable live-out from loop: aux4
         Accelerator restriction: scalar variable live-out from loop: aux1
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
         Accelerator restriction: scalar variable live-out from loop: accumY
         Accelerator restriction: scalar variable live-out from loop: aux5
         Accelerator restriction: scalar variable live-out from loop: aux2
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
         Accelerator restriction: scalar variable live-out from loop: accumZ
         Accelerator restriction: scalar variable live-out from loop: aux6
         Accelerator restriction: scalar variable live-out from loop: aux3
         Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
         Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
         Accelerator restriction: scalar variable live-out from loop: prodEscal
         Accelerator restriction: scalar variable live-out from loop: vvar4
         Accelerator restriction: scalar variable live-out from loop: vvar3
         Accelerator restriction: scalar variable live-out from loop: vvar2
         Accelerator restriction: scalar variable live-out from loop: vvar1
    490, Accelerator restriction: induction variable live-out from loop: ap
         Accelerator restriction: induction variable live-out from loop: am
    491, Accelerator restriction: induction variable live-out from loop: ap
         Accelerator restriction: induction variable live-out from loop: am
    492, Accelerator restriction: induction variable live-out from loop: ap
         Accelerator restriction: induction variable live-out from loop: am
    494, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    500, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
         Accelerator restriction: induction variable live-out from loop: am
    502, Accelerator restriction: induction variable live-out from loop: am
         Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    504, Accelerator restriction: induction variable live-out from loop: am
    505, Accelerator restriction: induction variable live-out from loop: am
    506, c_prod inlined, size=6, file ifmm.c (694)
         511, Accelerator restriction: induction variable live-out from loop: an
              Accelerator restriction: induction variable live-out from loop: ap
              Accelerator restriction: induction variable live-out from loop: am
         513, Accelerator restriction: induction variable live-out from loop: an
              Accelerator restriction: induction variable live-out from loop: ap
              Accelerator restriction: induction variable live-out from loop: am
         515, Accelerator restriction: induction variable live-out from loop: am
         516, Accelerator restriction: induction variable live-out from loop: am
    517, c_prod inlined, size=6, file ifmm.c (694)
         522, Accelerator restriction: induction variable live-out from loop: an
              Accelerator restriction: induction variable live-out from loop: ap
              Accelerator restriction: induction variable live-out from loop: am
         524, Accelerator restriction: induction variable live-out from loop: an
              Accelerator restriction: induction variable live-out from loop: ap
              Accelerator restriction: induction variable live-out from loop: am
         526, Accelerator restriction: induction variable live-out from loop: am
         527, Accelerator restriction: induction variable live-out from loop: am
    528, c_prod inlined, size=6, file ifmm.c (694)
         532, Accelerator restriction: induction variable live-out from loop: am
              Accelerator restriction: induction variable live-out from loop: ap
    534, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    535, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    537, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    538, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    540, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    541, Accelerator restriction: induction variable live-out from loop: an
         Accelerator restriction: induction variable live-out from loop: ap
    543, Accelerator restriction: induction variable live-out from loop: an
    544, Accelerator restriction: induction variable live-out from loop: ap
    567, c_prod inlined, size=6, file ifmm.c (694)
    568, c_prod inlined, size=6, file ifmm.c (694)
    569, c_prod inlined, size=6, file ifmm.c (694)
    619, c_prod inlined, size=6, file ifmm.c (694)
    620, c_prod inlined, size=6, file ifmm.c (694)
    621, c_prod inlined, size=6, file ifmm.c (694)
    637, time inlined, size=2, file ifmm.c (132)
    653, difftime inlined, size=2, file ifmm.c (83)
    691, Accelerator restriction: induction variable live-out from loop: ap
PGC/x86-64 Windows 12.6-0: compilation completed with warnings
mkcolg



Joined: 30 Jun 2004
Posts: 6136
Location: The Portland Group Inc.

Posted: Fri Aug 03, 2012 2:03 pm

Hi Luis,

I'm looking at the code right now. The "scalar last value needed" messages appear because the addresses of these variables are passed into the "c_prod" routine. The compiler must assume that the value could be stored in a variable that is needed after the end of the compute region. The workaround is to use the "private" clause to force the compiler to use a private copy.
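As a rough illustration of the "private" workaround, here is a minimal sketch. It is not taken from ifmm.c: the `cplx` type, `c_prod_sketch`, and `scale_all` are hypothetical stand-ins, and the real "c_prod" signature is assumed to be similar (it takes the address of a scalar temporary). The thread's build relied on -Minline to make the call usable inside the compute region; other compilers may need something different.

```c
#include <assert.h>

/* Hypothetical complex type; the real code's type is not shown in the thread. */
typedef struct { float real; float imag; } cplx;

/* Hypothetical stand-in for c_prod: writes the product through a pointer,
   which is what makes the compiler treat the temporary as possibly live-out. */
static void c_prod_sketch(const cplx *a, const cplx *b, cplx *out)
{
    out->real = a->real * b->real - a->imag * b->imag;
    out->imag = a->real * b->imag + a->imag * b->real;
}

/* Multiply every element of in[] by factor, storing the result in out[]. */
void scale_all(int n, const cplx *in, cplx factor, cplx *out)
{
    cplx tmp;  /* scratch value whose address is passed to the routine */

    assert(n >= 0);

    /* private(tmp) gives each iteration its own copy, so no "last value"
       of tmp has to be carried out of the loop. */
    #pragma acc parallel loop private(tmp) copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        c_prod_sketch(&in[i], &factor, &tmp);
        out[i] = tmp;
    }
}
```

Built without OpenACC support, the pragma is ignored and the loop runs serially on the host, which makes it easy to compare results with and without the directive.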

The "Complex loop carried dependence" messages appear because the compiler can't tell whether your array accesses are unique, since you use a lookup table to get the index. The compiler must assume that duplicate elements are used, and hence that the loop is not parallel. When using the kernels method, you can add the "independent" clause to the loop directive ("loop independent") to assert that the iterations are independent, so the compiler ignores this dependency. With the parallel method, "independent" is implied and these warnings don't affect kernel generation. However, if the lookup table does contain duplicate indices, you may get wrong answers.
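A small sketch of the "loop independent" case, again with hypothetical names (`scatter_add`, `indices_are_unique`) rather than anything from ifmm.c. It assumes a lookup table `idx` that maps each iteration to an element of `dst`, and adds a host-side check that the table has no duplicates, since the assertion made by "independent" is only safe in that case.

```c
#include <assert.h>
#include <stdlib.h>

/* Host-side check: returns 1 if every idx[i] lies in [0, m) and no value
   repeats; returns 0 otherwise (including on allocation failure). */
int indices_are_unique(int n, const int *idx, int m)
{
    if (n <= 0) return 1;
    if (m <= 0) return 0;

    unsigned char *seen = calloc((size_t)m, 1);
    if (!seen) return 0;

    int ok = 1;
    for (int i = 0; i < n && ok; ++i) {
        if (idx[i] < 0 || idx[i] >= m || seen[idx[i]]) ok = 0;
        else seen[idx[i]] = 1;
    }
    free(seen);
    return ok;
}

/* Adds src[i] into dst[idx[i]] for each i; dst has m elements. */
void scatter_add(int n, const int *idx, const float *src, float *dst, int m)
{
    /* "independent" is a promise to the compiler, so verify it first. */
    assert(indices_are_unique(n, idx, m));

    #pragma acc kernels copyin(idx[0:n], src[0:n]) copy(dst[0:m])
    {
        /* Without "independent" the compiler must assume idx may repeat
           and reports a loop-carried dependence on dst. */
        #pragma acc loop independent
        for (int i = 0; i < n; ++i) {
            dst[idx[i]] += src[i];
        }
    }
}
```

If the table can contain duplicates, the "independent" assertion should be dropped (or the update restructured), because concurrent updates to the same element would race.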

- Mat