PGI User Forum
NaNs
szczelba



Joined: 29 Jun 2010
Posts: 26

Posted: Wed Oct 27, 2010 2:05 am    Post subject: NaNs

Hello again,

I'm working on a fairly complicated piece of code and trying to make it GPU-enabled. I have already asked some questions about it. Right now I have a problem with Not-a-Number (NaN) results.
I have a loop that I want to compile and execute on the GPU:

Code:
!$acc region do local(ijk,i,j,k), copy(vvect(:,igfy:igfyp1))
      do ijk=imoj4,imoj5

            if(iffs.eq.0 .and. nf(ijk).ne.0) cycle

            i=i_str(ijk)
            j=j_str(ijk)
            k=k_str(ijk)
c
            include '../comdeck/mijk.f'
            include '../comdeck/pijk.f'

            if(wl.eq.4 .and. i.eq.iprr .and. imax.gt.4) then
              i2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+2+ii5)
              uhalfp=-dudp(ijk)*(vvect(i2jk,igfy)-vvect(ijk,igfy))
            else
              uhalfp=-dudp(ijk)*(vvect(ipjk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wl.eq.4 .and. i.eq.iprl .and. imax.gt.4) then
              im2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+im2+ii5)
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(im2jk,igfy))
            else
              uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(imjk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprbk .and. jmax.gt.4) then
              ij2k=ijk_str2unstr(ii2*(k-1)+ii1+i+ii5)
              vhalfp=-dvdp(ijk)*(vvect(ij2k,igfy)-vvect(ijk,igfy))
            else
              vhalfp=-dvdp(ijk)*(vvect(ijpk,igfy)-vvect(ijk,igfy))
            endif
c
            if(wf.eq.4 .and. j.eq.jprf .and. jmax.gt.4) then
              ijm2k=ijk_str2unstr(ii2*(k-1)+ii1*(jm2-1)+i+ii5)
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijm2k,igfy))
            else
              vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijmk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprt .and. kmax.gt.4) then
              ijk2=ijk_str2unstr(ii2+ii1*(j-1)+i+ii5)
              whalfp=-dwdp(ijk)*(vvect(ijk2,igfy)-vvect(ijk,igfy))
            else
              whalfp=-dwdp(ijk)*(vvect(ijkp,igfy)-vvect(ijk,igfy))
            endif
c
            if(wb.eq.4 .and. k.eq.kprb .and. kmax.gt.4) then
              ijkm2=ijk_str2unstr(ii2*(km2-1)+ii1*(j-1)+i+ii5)
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm2,igfy))
            else
              whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm,igfy))
            endif

            vvect(ijk,igfyp1)=rri(i)*(rdx(i)*(afr(ijk)*uhalfp/rr(i)-
     1        afr(imjk)*uhalfm/rr(i-1))+
     2        rdy(j)*(afb(ijk)*vhalfp-afb(ijmk)*vhalfm))+
     3        rdz(k)*(aft(ijk)*whalfp-aft(ijkm)*whalfm)
     4        +vf(ijk)*rcsqf(ijk)*rdelt*vvect(ijk,igfy)
              vvect(ijk,igfyp1)=vvect(ijk,igfyp1)*beta(ijk)

      enddo ! (ijk)

!$acc end region


Code:
$ pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -Mcuda -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F
(...)
             Generating copy(vvect(:,igfy:igfyp1))
(...)


After executing it on the GPU, some elements of the vvect array are NaN. They are not NaN when the code is executed on the CPU.
The funny thing is that when I remove the copy() clause from the code and leave only:

Code:
!$acc region do local(ijk,i,j,k)


The resulting array contains only zeros. This is weird because the compiler adds the directive
Code:
             Generating copy(vvect(:,igfy:igfyp1))

on its own, so there should not be any difference.

So, any ideas where the NaNs are coming from, and why those two versions of the directive give different results?

I thought about emulating the GPU and writing out all the variables in each iteration, but I understand that I cannot emulate the GPU using the PGI Accelerator model, right? If I could, I would check all the variables that are used to compute the vvect elements. So, are there ways to check this other than moving from the PGI Accelerator model to CUDA Fortran?
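
One check that needs neither emulation nor CUDA Fortran is to scan each input array for NaNs on the host just before the region, and the output array just after it. The following is only a minimal sketch under assumptions: `work` is a hypothetical stand-in for one of the real inputs (for example rcsqf or dudp), `n` is an arbitrary size, and a NaN is injected on purpose so that the check has something to report. It uses the standard `ieee_arithmetic` intrinsic module.

```fortran
program check_nan
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, &
                                           ieee_quiet_nan
  implicit none
  integer, parameter :: n = 8     ! hypothetical size, for illustration only
  real(kind=8) :: work(n)         ! hypothetical stand-in for an input array
  integer :: i, nbad

  work = 1.0d0
  ! Inject one NaN so the scan below has something to find.
  work(3) = ieee_value(work(3), ieee_quiet_nan)

  ! Scan the array and report the index of every NaN element.
  nbad = 0
  do i = 1, n
     if (ieee_is_nan(work(i))) then
        nbad = nbad + 1
        print *, 'NaN at index ', i
     end if
  end do
  print *, 'NaN count = ', nbad
end program check_nan
```

Running the same scan on the host copy before the region and on the array copied back afterwards would show whether the bad values already exist on the host or only appear after the transfer.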
szczelba



Joined: 29 Jun 2010
Posts: 26

Posted: Wed Oct 27, 2010 4:58 am

We have a saying in Poland: "He who asks does not go astray." So, I asked you and then partially solved my problem on my own. ;)
OK, so the NaNs are caused by the rcsqf array, which is used in the calculation of vvect. This array is declared as follows:

Code:
      real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf


The others are declared similarly but without the "target" attribute. I assume there are some problems with pointers. How can I correctly copy the values of the rcsqf array to the GPU?
mkcolg



Joined: 30 Jun 2004
Posts: 6211
Location: The Portland Group Inc.

Posted: Thu Oct 28, 2010 12:44 pm

Hi szczelba,

While I doubt it's the problem, mixing CUDA Fortran and the PGI Accelerator Model isn't supported on Windows. So the first thing to try is to remove the "-Mcuda" flag.

Code:
$ pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F


Quote:
Others are declared similar but without the "target" directive. I assume there are some problems with pointers. How can I correctly copy the values of rcsqf array on the GPU?
I don't see how the target attribute could affect this, but then again it could be a compiler bug.

What is "rcsqf"'s Minfo copy message? What happens if you add "rcsqf" to the region's copy directive?

Can you send the code to PGI Customer Service (trs@pgroup.com) and ask them to send it to me? If it is a compiler bug, I'd like to send a report to our engineers.

Thanks,
Mat
szczelba



Joined: 29 Jun 2010
Posts: 26

Posted: Fri Oct 29, 2010 7:43 am

The copy message is:

Code:
   Generating copyin(rcsqf$p(imoj4:imoj5))


I see this "$p" suffix only for this array, which is the only one declared as "target".
Adding rcsqf to the region's copy clause does not change anything, not even the copy message above (it doesn't change from copyin to copy).

When I copy the rcsqf values to another array on the GPU and then write out this temporary array, I get something like:

Code:
 'ijk='         1225 ' '    0.000000000000000     
 'ijk='         1226 ' '    0.000000000000000     
 'ijk='         1227 ' '    0.000000000000000     
 'ijk='         1228 ' '    0.000000000000000     
 'ijk='         1229 ' '    0.000000000000000     
 'ijk='         1230 ' '    0.000000000000000     
 'ijk='         1231 ' ' ********************     
 'ijk='         1232 ' '    0.000000000000000     
 'ijk='         1233 ' ' ********************     
 'ijk='         1234 ' ' ********************     
 'ijk='         1235 ' ' ********************     
 'ijk='         1236 ' '                       NaN
 'ijk='         1237 ' '    0.000000000000000     
 'ijk='         1238 ' '    0.000000000000000     
 'ijk='         1239 ' ' ********************     
 'ijk='         1240 ' '    0.000000000000000     
 'ijk='         1241 ' ' ********************     
 'ijk='         1242 ' ' ********************     
 'ijk='         1243 ' '    0.000000000000000     
 'ijk='         1244 ' ' ********************     
 'ijk='         1245 ' ' ********************     
 'ijk='         1246 ' '    0.000000000000000     
 'ijk='         1247 ' ' ********************     
 'ijk='         1248 ' ' ********************     
 'ijk='         1249 ' '    0.000000000000000     
 'ijk='         1250 ' ' ********************     
 'ijk='         1251 ' ' ********************     
 'ijk='         1252 ' ' ********************     
 'ijk='         1253 ' '                       NaN
 'ijk='         1254 ' '    0.000000000000000     
 'ijk='         1255 ' '    0.000000000000000     
 'ijk='         1256 ' ' ********************     
 'ijk='         1257 ' '    0.000000000000000     
 'ijk='         1258 ' ' ********************     
 'ijk='         1259 ' ' ********************     
 'ijk='         1260 ' '    0.000000000000000     
 'ijk='         1261 ' ' ********************     
 'ijk='         1262 ' ' ********************     
 'ijk='         1263 ' '    0.000000000000000     
 'ijk='         1264 ' ' ********************     
 'ijk='         1265 ' '    0.000000000000000     
 'ijk='         1266 ' '    0.000000000000000     
 'ijk='         1267 ' ' ********************     
 'ijk='         1268 ' ' ********************     
 'ijk='         1269 ' ' ********************     
 'ijk='         1270 ' ' ********************     
 'ijk='         1271 ' '    0.000000000000000     
 'ijk='         1272 ' '    0.000000000000000     
 'ijk='         1273 ' ' ********************     
 'ijk='         1274 ' '    0.000000000000000     
 'ijk='         1275 ' ' ********************     
 'ijk='         1276 ' '    0.000000000000000     
 'ijk='         1277 ' '    0.000000000000000     
 'ijk='         1278 ' ' ********************     
 'ijk='         1279 ' ' ********************     
 'ijk='         1280 ' ' ********************     
 'ijk='         1281 ' ' ********************     
 'ijk='         1282 ' '    0.000000000000000     
 'ijk='         1283 ' '    0.000000000000000     
 'ijk='         1284 ' ' ********************     
 'ijk='         1285 ' '    0.000000000000000     
 'ijk='         1286 ' ' ********************     
 'ijk='         1287 ' '    0.000000000000000     
 'ijk='         1288 ' '    0.000000000000000     
 'ijk='         1289 ' ' ********************     
 'ijk='         1290 ' ' ********************     
 'ijk='         1291 ' ' ********************     
 'ijk='         1292 ' ' ********************     
 'ijk='         1293 ' '    0.000000000000000     
 'ijk='         1294 ' '    0.000000000000000     
 'ijk='         1295 ' ' ********************     
 'ijk='         1296 ' '    0.000000000000000     
 'ijk='         1297 ' ' ********************     
 'ijk='         1298 ' ' ********************     
 'ijk='         1299 ' '    0.000000000000000     
 'ijk='         1300 ' ' ********************     
 'ijk='         1301 ' ' ********************     
 'ijk='         1302 ' ' ********************     
 'ijk='         1303 ' '                       NaN
 'ijk='         1304 ' '    0.000000000000000     
 'ijk='         1305 ' '    0.000000000000000     
 'ijk='         1306 ' ' ********************     
 'ijk='         1307 ' '    0.000000000000000     
 'ijk='         1308 ' ' ********************     
 'ijk='         1309 ' ' ********************     
 'ijk='         1310 ' '    0.000000000000000     
 'ijk='         1311 ' ' ********************     
 'ijk='         1312 ' ' ********************     
 'ijk='         1313 ' '    0.000000000000000     
 'ijk='         1314 ' ' ********************     
 'ijk='         1315 ' ' ********************     
 'ijk='         1316 ' '    0.000000000000000     
 'ijk='         1317 ' ' ********************     
 'ijk='         1318 ' ' ********************     
 'ijk='         1319 ' ' ********************     
 'ijk='         1320 ' ' ********************     
 'ijk='         1321 ' '    0.000000000000000     
 'ijk='         1322 ' '    0.000000000000000     
 'ijk='         1323 ' ' ********************     
 'ijk='         1324 ' '    0.000000000000000     
 'ijk='         1325 ' ' ********************     
 'ijk='         1326 ' ' ********************     
 'ijk='         1327 ' '    0.000000000000000     
 'ijk='         1328 ' ' ********************     
 'ijk='         1329 ' ' ********************     
 'ijk='         1330 ' ' ********************     
 'ijk='         1331 ' ' ********************     
 'ijk='         1332 ' '    0.000000000000000     
 'ijk='         1333 ' '    0.000000000000000     
 'ijk='         1334 ' ' ********************     
 'ijk='         1335 ' '    0.000000000000000     
 'ijk='         1336 ' ' ********************     
 'ijk='         1337 ' ' ********************     
 'ijk='         1338 ' '    0.000000000000000     
 'ijk='         1339 ' ' ********************     
 'ijk='         1340 ' ' ********************     
 'ijk='         1341 ' ' ********************     
 'ijk='         1342 ' ' ********************     
 'ijk='         1343 ' '    0.000000000000000     
 'ijk='         1344 ' '    0.000000000000000     
 'ijk='         1345 ' ' ********************     
 'ijk='         1346 ' '    0.000000000000000     
 'ijk='         1347 ' ' ********************     
 'ijk='         1348 ' ' ********************     
 'ijk='         1349 ' '    0.000000000000000     
 'ijk='         1350 ' ' ********************     
 'ijk='         1351 ' ' ********************     
 'ijk='         1352 ' ' ********************     
 'ijk='         1353 ' ' ********************     
 'ijk='         1354 ' '    0.000000000000000     
 'ijk='         1355 ' '    0.000000000000000     
 'ijk='         1356 ' ' ********************     
 'ijk='         1357 ' '    0.000000000000000     
 'ijk='         1358 ' ' ********************     
 'ijk='         1359 ' ' ********************     
 'ijk='         1360 ' '    0.000000000000000     
 'ijk='         1361 ' ' ********************     


Besides some NaNs there are also some stars instead of values.
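
In Fortran formatted output, a field is filled with asterisks when the value does not fit in the field width, so the stars most likely stand for values of very large magnitude rather than for a separate kind of error. The sketch below assumes the values were written with a fixed-width descriptor such as `f20.15`, which is a guess based on the layout of the listing above; `huge_val` is a hypothetical garbage value chosen only for illustration.

```fortran
program show_stars
  implicit none
  real(kind=8) :: small, huge_val

  small    = 0.0d0
  huge_val = 1.0d30   ! hypothetical garbage value, too wide for f20.15

  ! Fits in 20 characters, so the value is printed normally.
  write(*,'(a,f20.15)')    ' f20.15  small: ', small
  ! Needs more than 20 characters, so the field is filled with asterisks.
  write(*,'(a,f20.15)')    ' f20.15  large: ', huge_val
  ! Exponent form fits any magnitude and shows the actual value.
  write(*,'(a,es24.15e3)') ' es24.15 large: ', huge_val
end program show_stars
```

Reprinting the temporary array with an `es` descriptor would show the actual magnitudes hidden behind the stars.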

Sending all the code would be difficult because I'm working on a program that belongs to someone else. I have the source code of only one procedure, and I execute it by starting the main program with special parameters. I don't think I'm allowed to send this code to anybody.
Michael Wolfe



Joined: 19 Jan 2010
Posts: 42

Posted: Fri Oct 29, 2010 6:39 pm

Fortran arrays declared with the target attribute are usually the target of pointer assignments. Look for a pointer assignment, something like
Code:
    ptr => rcsqf

where ptr is any Fortran pointer array. If there is a pointer assignment, and the pointer is also used in the accelerator region, there will be a problem. Consider a program like
Code:
   real, dimension(:), allocatable, target :: a1
   real, dimension(:), pointer :: p1
   p1 => a1
   !$acc region do
    do i = 1, n
     a1(i) = 0.0
     b(i) = p1(i)
    enddo

In the original program, a1 and p1 refer to the same memory locations. However, the accelerator compiler can't preserve the pointer/target relationship of the data that it copies to the GPU, so it will allocate and copy data for a1 and for p1 separately. On the host, p1(i) would get the same value that was just stored by a1(i)=0.0; on the GPU, p1(i) would read uninitialized memory, because the GPU copy of p1 would be at a different place in memory.
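
The host-side half of this behaviour can be reproduced without a GPU. The following is a minimal host-only sketch, not the accelerator code itself: `n`, `b`, and the program name are hypothetical, and the second loop shows one possible workaround (an assumption, not something confirmed for this compiler), namely referring to the data by a single name inside the loop so that only one device copy would be needed.

```fortran
program alias_demo
  implicit none
  integer, parameter :: n = 4     ! hypothetical size, for illustration only
  real, dimension(:), allocatable, target :: a1
  real, dimension(:), pointer :: p1
  real :: b(n)
  integer :: i

  allocate(a1(n))
  a1 = 1.0
  p1 => a1                        ! p1 and a1 now name the same host memory

  ! Host behaviour: the store through a1 is visible through p1.
  do i = 1, n
     a1(i) = 0.0
     b(i)  = p1(i)
  end do
  print *, 'via pointer:', b      ! every element is zero on the host

  ! Possible workaround (assumption): use only the target's name in the
  ! loop, so the pointer/target relationship is never needed there.
  a1 = 1.0
  do i = 1, n
     a1(i) = 0.0
     b(i)  = a1(i)
  end do
  print *, 'via target: ', b      ! every element is zero again

  deallocate(a1)
end program alias_demo
```

On the host both loops give the same result; the point of the second form is that it would still be correct if a1 and p1 ended up in separate device allocations.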