PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

error 702

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
limtaejun



Joined: 07 May 2014
Posts: 9

PostPosted: Tue May 20, 2014 8:22 am    Post subject: error 702 Reply with quote

I am in the transition period migrating from Intel Fortran Compiler to PVF 14.2 to try openacc. My system has GTX 760 and i7-4770k; and currently, my monitor is connected to GTX 760. After spending lots of time, I succeeded in compiling the code, but couldn't run it because of the error 702. Please help me out.

I wanted to start from something simple. What the code below does is followings: there's a 3-tuple indexed by (iz,ia,ih) which is interpreted as an individual with some preference that is measured by "vf"; each individual maximize its preference by choosing two variables indexed by (iia,iih); "temp_vf" is a big array that stores all possible measured preference for each choice of (iia,iih); the first loop executes to calculate "temp_vf"; and the second loop is to obtain "vf" which is just obtained by maxval of "temp_vf"

At the moment, I wouldn't be concerned by data movement and just hope this code to run well.

Code:

124  subroutine dynamic_decision()                             
125
126  real(8), dimension(zn,an,hn,an,hn) :: temp_vf
127  real(8), dimension(2)              :: policy_temp
128  real(8) :: c
129  integer :: iz,ia,ih,iia,iih
130
131  !$acc parallel loop
132  do iz = 1, zn
133  do ia = 1, an
134  do ih = 1, hn
135
136      do iia = 1, an
137      do iih = 1, hn
138
139          c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)
140
141          if (c <= 0.0d0) then
142              temp_vf(iz,ia,ih,iia,iih) = -1.0d10
143          else
144              temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    )
145                                      + beta * dot_product(zT(iz,:),old_vf(:,iia,iih)) 
146          end if
147
148      end do
149      end do
150  end do
151  end do
152  end do
153  !$acc end parallel loop
154  !$acc parallel loop
155  do iz = 1, zn
156  do ia = 1, an
157  do ih = 1, hn
158      vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
159  end do
160  end do
161  end do
162  !$acc end parallel loop
163
164  end subroutine


To give an extra information on the accelerating region:

Code:

    131, Accelerator kernel generated
        132, !$acc loop gang ! blockidx%x
        144, !$acc loop vector(256) ! threadidx%x
               Sum reduction generated for zt$r
    131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd-         1,1:an,1:hn))
         Generating present_or_copyin(zt(1:zn,:))
         Generating present_or_copyin(ag(1:an))
         Generating present_or_copyin(pol_inc(1:zn,1:an,1:hn))
         Generating present_or_copyin(hg(1:hn))
         Generating present_or_copyout(temp_vf(:zn,:an,:hn,:an,:hn))
         Generating Tesla code
    133, Loop is parallelizable
    134, Loop is parallelizable
    136, Loop is parallelizable
    137, Loop is parallelizable
    144, Loop is parallelizable
    154, Accelerator kernel generated
        155, !$acc loop gang ! blockidx%x
        158, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    154, Generating present_or_copyin(temp_vf(:zn,:an,:hn,:an,:hn))
           Generating present_or_copyout(vf(1:zn,1:an,1:hn))
           Generating Tesla code
    156, Loop is parallelizable
    157, Loop is parallelizable
    158, Loop is parallelizable


The error I see when running the code:
[img]
https://dl-web.dropbox.com/get/openacc/error1.PNG?_subject_uid=39482831&w=AABpU_BXmhCrKDqdmebv54HEdB1cSi3Gye3f8qRu_zpJ5g
[/img]

and also
[img]
https://dl-web.dropbox.com/get/openacc/error2.PNG?_subject_uid=39482831&w=AADxFJIO1eM0zyUgYUdmHTpaUcYPH6anBsMFmhVhzBiC3A
[/img]

Would anyone please kindly let me know how to fix this?

Best,
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

PostPosted: Tue May 20, 2014 9:21 am    Post subject: Reply with quote

Hi limtaejun,

I would try copying the whole "old_vf" array over to the device. By default, the compiler tries to move the least amount of data over but might be having a difficult time determining how much to bring over given it's in a dot product.
Quote:
131, Generating present_or_copyin(old_vf(:zt$sd+old_vf$sd- 1,1:an,1:hn))


I'd also put "temp_vf" in a data region "create" clause so it doesn't copied and explicitly copy in the remaining arrays.

The one caveat of the "copyout" of "vf" is if the entire array is not updated on the device, this will overwrite some of the host values with garbage values. In this case, either change to using the "copy" clause or copy out only the updated array section. However, sub-arrays take longer to copy since they can't be transferred in a contiguous block.

Code:
124  subroutine dynamic_decision()                             
125
126  real(8), dimension(zn,an,hn,an,hn) :: temp_vf
127  real(8), dimension(2)              :: policy_temp
128  real(8) :: c
129  integer :: iz,ia,ih,iia,iih
130
        !$acc data create(temp_vf) copyin(old_vf,zT,hG,aG,pol_inc) copyout(vf)
131  !$acc parallel loop
132  do iz = 1, zn
133  do ia = 1, an
134  do ih = 1, hn
135
136      do iia = 1, an
137      do iih = 1, hn
138
139          c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)
140
141          if (c <= 0.0d0) then
142              temp_vf(iz,ia,ih,iia,iih) = -1.0d10
143          else
144              temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    )
145                                      + beta * dot_product(zT(iz,:),old_vf(:,iia,iih)) 
146          end if
147
148      end do
149      end do
150  end do
151  end do
152  end do
153  !$acc end parallel loop
154  !$acc parallel loop
155  do iz = 1, zn
156  do ia = 1, an
157  do ih = 1, hn
158      vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
159  end do
160  end do
161  end do
162  !$acc end parallel loop
        !$acc end data
163
164  end subroutine



- Mat
Back to top
View user's profile
limtaejun



Joined: 07 May 2014
Posts: 9

PostPosted: Tue May 20, 2014 8:04 pm    Post subject: Reply with quote

Hi Mat,

Following your suggestions, I modified my code:
Code:

subroutine dynamic_decision()                             

real(8), dimension(zn,an,hn,an,hn) :: temp_vf
real(8) :: c
integer :: iz,ia,ih,iia,iih

!$acc data create(temp_vf) &
!$acc&     copyin(old_vf,zT,aG,hG,pol_inc) &
!$acc&     copyout(vf)     
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
   
    do iia = 1, an
    do iih = 1, hn

        c = pol_inc(iz,ia,ih) + unitP*( hG(ih)-hG(iih) )   - aG(iia)

        if (c <= 0.0d0) then
            temp_vf(iz,ia,ih,iia,iih) = N_A
        else
            temp_vf(iz,ia,ih,iia,iih) = (   (  c* (hG(ih))  ) ** (1.0d0-sig)    )
                                     + beta * dot_product(zT(iz,:),old_vf(:,iia,iih)) 
        end if

    end do
    end do
end do
end do
end do
!$acc end parallel loop
!$acc parallel loop
do iz = 1, zn
do ia = 1, an
do ih = 1, hn
    vf(iz,ia,ih) = maxval(temp_vf(iz,ia,ih,:,:))
end do
end do
end do
!$acc end parallel loop
!$acc end data

end subroutine


And, as before, the build was successful:
Code:

    130, Generating create(temp_vf(:,:,:,:,:))
         Generating copyin(old_vf(:,:,:))
         Generating copyin(zt(:,:))
         Generating copyin(ag(:))
         Generating copyin(hg(:))
         Generating copyin(pol_inc(:,:,:))
         Generating copyout(vf(:,:,:))
    133, Accelerator kernel generated
        134, !$acc loop gang ! blockidx%x
        146, !$acc loop vector(256) ! threadidx%x
             Sum reduction generated for zt$r
    133, Generating Tesla code
    135, Loop is parallelizable
    136, Loop is parallelizable
    138, Loop is parallelizable
    139, Loop is parallelizable
    146, Loop is parallelizable
    156, Accelerator kernel generated
        157, !$acc loop gang ! blockidx%x
        160, !$acc loop vector(256) ! threadidx%x
             Max reduction generated for temp_vf$r
    156, Generating Tesla code
    158, Loop is parallelizable
    159, Loop is parallelizable
    160, Loop is parallelizable


But when being run, it resulted in the same error message as before. After doing some test of trials and errors, I thought that all came from the intrinsic function of maxval and dotproduct. Indeed, when I rebuild and run the code not using these two functions, the errors are gone.

Any advise how to deal with this?
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

PostPosted: Wed May 21, 2014 4:09 pm    Post subject: Reply with quote

Error 702 is a timeout. My guess that the problem is not with maxval or dot_product, just that these take longer to compute.

Since you're on Windows using a GTX, this means you're using the Windows Display Device Monitor (WDDM). WDDM will timeout your job after a few seconds to prevent freezing of your monitor. If you had a Quadro or Tesla you could change to using the Tesla Compute Cluster (TCC) driver, but with GTX you're stuck with WDDM.

If you do a web search you can find ways to increase the timeout, but given it means hacking your registry, I wouldn't recommend it.

- Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group