PGI User Forum


OpenMP and Accelerator directives
RobertsGroup



Joined: 15 Sep 2009
Posts: 9

Posted: Thu Dec 10, 2009 10:06 am    Post subject: OpenMP and Accelerator directives

Hi all,

I have a loop that is parallelizable. When I use OpenMP directives, keeping all the variables private except the result, I get the same results as when I compile without the OpenMP flag. However, when I switch to the !$acc directives, keeping the same variables private, the results are completely different. How is that possible?

Here is the parallelizable loop written with both sets of directives,

!$omp parallel
!$omp do private(Vbond,Vangle,r1x,r2x), &
!$omp private(r1y,r2y,r1z,r2z), &
!$omp private(r,a,r_1,r_2,th,costh,i)

do 101 i=1,resids
Vbond=0.0D0
Vangle=0.0D0
Vdieh=0.0D0

r1x=(r_n(i,1)-r_ca(i,1))
r1y=(r_n(i,2)-r_ca(i,2))
r1z=(r_n(i,3)-r_ca(i,3))
r=(r1x**2+r1y**2+r1z**2)**0.50D0
r_1=r
Vbond=Vbond+0.50D0*kbond*(r-ro_nca)**2

r2x=(r_c(i,1)-r_ca(i,1))
r2y=(r_c(i,2)-r_ca(i,2))
r2z=(r_c(i,3)-r_ca(i,3))
r=(r2x**2+r2y**2+r2z**2)**0.50D0
r_2=r
Vbond=Vbond+0.50D0*kbond*(r-ro_cac)**2

a=r1x*r2x+r1y*r2y+r1z*r2z
costh=a/(r_1*r_2)
th=acos(costh)
Vangle=Vangle+0.50D0*kangle*(th-tho_ncac)**2

!$omp critical
E(i)=Vangle+Vbond
!$omp end critical

101 continue

!$omp end do
!$omp end parallel



!$acc region do copyin(r_n,r_ca,r_c), copy(E), &
!$acc private(Vbond,Vangle,r1x,r2x,r1y,r2y,r1z,r2z), &
!$acc private(r,a,r_1,r_2,th,costh,i)

do 101 i=1,resids
Vbond=0.0D0
Vangle=0.0D0
Vdieh=0.0D0

rx=(r_n(i,1)-r_ca(i,1))
ry=(r_n(i,2)-r_ca(i,2))
rz=(r_n(i,3)-r_ca(i,3))
r1x=rx
r1y=ry
r1z=rz
r=(rx**2+ry**2+rz**2)**0.50D0
r_1=r
Vbond=Vbond+0.50D0*kbond*(r-ro_nca)**2

rx=(r_c(i,1)-r_ca(i,1))
ry=(r_c(i,2)-r_ca(i,2))
rz=(r_c(i,3)-r_ca(i,3))
r2x=rx
r2y=ry
r2z=rz
r=(rx**2+ry**2+rz**2)**0.50D0
r_2=r
Vbond=Vbond+0.50D0*kbond*(r-ro_cac)**2

a=r1x*r2x+r1y*r2y+r1z*r2z
costh=a/(r_1*r_2)
th=acos(costh)
Vangle=Vangle+0.50D0*kangle*(th-tho_ncac)**2

E(i)=Vangle+Vbond

101 continue
mkcolg



Joined: 30 Jun 2004
Posts: 5943
Location: The Portland Group Inc.

Posted: Thu Dec 10, 2009 11:14 am

Hi Marco,

Quote:
How is that possible?


The only thing that jumps out is that you're using acos, square root, and exponentiation operations, which can be relatively imprecise on a GPU. Is your code precision sensitive?
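One rough way to judge whether the angle term is precision sensitive is to evaluate acos at reduced precision on the host. The sketch below is only an illustration: the program name, the kind parameters sp and dp, and the test value are made up, and it says nothing about what a particular GPU does.

```fortran
program acos_sensitivity
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
  real(dp) :: c_dp, th_dp
  real(sp) :: c_sp, th_sp

  ! A cosine very close to 1, i.e. two nearly parallel bond vectors.
  ! The value is invented for this illustration.
  c_dp = 1.0d0 - 1.0d-8

  ! Converting to single precision rounds this value to exactly 1.0.
  c_sp = real(c_dp, sp)

  th_dp = acos(c_dp)
  th_sp = acos(c_sp)

  print '(a,es12.4)', 'double precision angle: ', th_dp
  print '(a,es12.4)', 'single precision angle: ', th_sp
end program acos_sensitivity
```

In double precision the angle is about 1.4e-4 radians, while the single precision value collapses to zero, so a tiny change in costh near 1 or -1 can move th, and therefore Vangle, noticeably. If costh in the real loop stays well away from those limits, precision is less likely to explain a large difference.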

One thing to try is to store your intermediary calculations in temporary arrays and compare the CPU and GPU results to determine where the divergence occurs.
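A minimal sketch of that comparison follows, assuming made-up coordinates and hypothetical array names (len1_acc, cos_cpu and so on); only r_n, r_ca, r_c and the directive spelling come from the loop in the first post. It uses sqrt where the original uses **0.50D0. Compilers without PGI Accelerator support treat the !$acc line as a comment, in which case both passes run on the host.

```fortran
program compare_intermediates
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: r_n(n,3), r_ca(n,3), r_c(n,3)
  real(dp) :: len1_acc(n), len2_acc(n), cos_acc(n), th_acc(n)
  real(dp) :: len1_cpu(n), len2_cpu(n), cos_cpu(n), th_cpu(n)
  real(dp) :: r1x, r1y, r1z, r2x, r2y, r2z
  integer :: i, k

  ! Made-up coordinates so the example is self-contained.
  do k = 1, 3
    do i = 1, n
      r_ca(i,k) = 0.1d0*i + 0.01d0*k
      r_n(i,k)  = r_ca(i,k) + 1.0d0/k
      r_c(i,k)  = r_ca(i,k) + 0.5d0*k - 1.0d0 + 0.1d0*i
    end do
  end do

  ! Accelerator pass: every intermediate is written to its own array.
!$acc region do copyin(r_n,r_ca,r_c), copyout(len1_acc,len2_acc,cos_acc,th_acc)
  do i = 1, n
    r1x = r_n(i,1) - r_ca(i,1)
    r1y = r_n(i,2) - r_ca(i,2)
    r1z = r_n(i,3) - r_ca(i,3)
    r2x = r_c(i,1) - r_ca(i,1)
    r2y = r_c(i,2) - r_ca(i,2)
    r2z = r_c(i,3) - r_ca(i,3)
    len1_acc(i) = sqrt(r1x**2 + r1y**2 + r1z**2)
    len2_acc(i) = sqrt(r2x**2 + r2y**2 + r2z**2)
    cos_acc(i)  = (r1x*r2x + r1y*r2y + r1z*r2z) / (len1_acc(i)*len2_acc(i))
    th_acc(i)   = acos(cos_acc(i))
  end do

  ! Host pass: identical arithmetic, no directive.
  do i = 1, n
    r1x = r_n(i,1) - r_ca(i,1)
    r1y = r_n(i,2) - r_ca(i,2)
    r1z = r_n(i,3) - r_ca(i,3)
    r2x = r_c(i,1) - r_ca(i,1)
    r2y = r_c(i,2) - r_ca(i,2)
    r2z = r_c(i,3) - r_ca(i,3)
    len1_cpu(i) = sqrt(r1x**2 + r1y**2 + r1z**2)
    len2_cpu(i) = sqrt(r2x**2 + r2y**2 + r2z**2)
    cos_cpu(i)  = (r1x*r2x + r1y*r2y + r1z*r2z) / (len1_cpu(i)*len2_cpu(i))
    th_cpu(i)   = acos(cos_cpu(i))
  end do

  ! The first quantity with a large difference shows where the results diverge.
  print '(a,es10.2)', 'max |r_1 diff|   = ', maxval(abs(len1_acc - len1_cpu))
  print '(a,es10.2)', 'max |r_2 diff|   = ', maxval(abs(len2_acc - len2_cpu))
  print '(a,es10.2)', 'max |costh diff| = ', maxval(abs(cos_acc - cos_cpu))
  print '(a,es10.2)', 'max |th diff|    = ', maxval(abs(th_acc - th_cpu))
end program compare_intermediates
```

Differences at the level of rounding error point to precision; a difference of order one in a single quantity points to a logic or code generation problem at that step.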

On a side note, scalar variables are implicitly private in the Accelerator model. While it doesn't hurt to declare them private, it isn't necessary. Also, since E is only written inside the loop, you can use the "copyout" clause for E and save some data movement costs.
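As a rough sketch, the directive for the loop in the first post then shrinks to a single line. The force-field constants and coordinates below are placeholders invented so the program is self-contained, and sqrt stands in for **0.50D0; only the variable names and the directive spelling are taken from the thread.

```fortran
program energy_copyout
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: resids = 8
  ! Placeholder constants, not values from the original code.
  real(dp), parameter :: kbond = 100.0d0, kangle = 50.0d0
  real(dp), parameter :: ro_nca = 1.45d0, ro_cac = 1.52d0, tho_ncac = 1.94d0
  real(dp) :: r_n(resids,3), r_ca(resids,3), r_c(resids,3), E(resids)
  real(dp) :: r1x, r1y, r1z, r2x, r2y, r2z, r_1, r_2, th
  integer :: i, k

  ! Made-up coordinates.
  do k = 1, 3
    do i = 1, resids
      r_ca(i,k) = 0.1d0*i + 0.01d0*k
      r_n(i,k)  = r_ca(i,k) + 1.0d0/k
      r_c(i,k)  = r_ca(i,k) + 0.5d0*k - 1.0d0 + 0.1d0*i
    end do
  end do

  ! No private list: the scalars are implicitly private.
  ! E is only written, so copyout avoids sending it to the device first.
!$acc region do copyin(r_n,r_ca,r_c), copyout(E)
  do i = 1, resids
    r1x = r_n(i,1) - r_ca(i,1)
    r1y = r_n(i,2) - r_ca(i,2)
    r1z = r_n(i,3) - r_ca(i,3)
    r2x = r_c(i,1) - r_ca(i,1)
    r2y = r_c(i,2) - r_ca(i,2)
    r2z = r_c(i,3) - r_ca(i,3)
    r_1 = sqrt(r1x**2 + r1y**2 + r1z**2)
    r_2 = sqrt(r2x**2 + r2y**2 + r2z**2)
    th  = acos((r1x*r2x + r1y*r2y + r1z*r2z) / (r_1*r_2))
    E(i) = 0.5d0*kbond*((r_1 - ro_nca)**2 + (r_2 - ro_cac)**2) &
         + 0.5d0*kangle*(th - tho_ncac)**2
  end do

  print '(i3,f14.6)', (i, E(i), i = 1, resids)
end program energy_copyout
```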

Hope this helps,
Mat
RobertsGroup



Joined: 15 Sep 2009
Posts: 9

Posted: Fri Dec 11, 2009 11:04 am

Thanks for the help. I did that and found where the problem is. I have some if statements inside the loop that perform extra operations for some of the values, something like this:

!$acc region do copyin(r,y), copyout(E)
do i=1,n
V=r(i)**2
if (y(i).eq.1) then
V=V+r(i)**3
endif
E(i)=V
enddo

The problem is that the code inside the if statement is never executed. I don't know why this is happening or how to fix it. Could you give me some suggestions?
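A self-contained version of the snippet above could look like the following. The input values, the reference array E_ref, and the tolerance are assumptions added so the program can check itself; the loop and directive are as posted.

```fortran
program if_in_region
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 6
  real(dp) :: r(n), E(n), E_ref(n), V
  integer :: y(n), i

  ! Made-up inputs: y is 1 for odd i and 0 for even i.
  do i = 1, n
    r(i) = real(i, dp)
    y(i) = mod(i, 2)
  end do

  ! The loop from the post, unchanged apart from indentation.
!$acc region do copyin(r,y), copyout(E)
  do i = 1, n
    V = r(i)**2
    if (y(i) .eq. 1) then
      V = V + r(i)**3
    end if
    E(i) = V
  end do

  ! Host reference written without a branch inside the loop body.
  do i = 1, n
    E_ref(i) = r(i)**2 + y(i)*r(i)**3
  end do

  do i = 1, n
    print '(i3,i3,2f10.2)', i, y(i), E(i), E_ref(i)
  end do

  if (any(abs(E - E_ref) > 1.0d-9)) then
    print *, 'MISMATCH between the accelerator loop and the host reference'
  else
    print *, 'OK'
  end if
end program if_in_region
```

Building this once for the host and once for the accelerator shows whether the rows with y = 1 pick up the extra r**3 term.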


Thanks,
Marco
mkcolg



Joined: 30 Jun 2004
Posts: 5943
Location: The Portland Group Inc.

Posted: Fri Dec 11, 2009 11:57 am

Hi Marco,

Quote:
The problem is that the code inside the if statement is never executed.

This is a compiler bug. I just found it myself yesterday and reported it to our engineers as TPR#16426. I consider this a critical bug that must be fixed soon.

In the meantime, you might be able to work around the bug by using an undocumented flag "-ta=nvidia,oldcg". In 10.0 we implemented a new code generator which does give better performance, but obviously still has a few problems. "oldcg" will use our previous code generator.

I apologize that our internal testing missed this error, and hopefully we can have it fixed by early next year.

- Mat
RobertsGroup



Joined: 15 Sep 2009
Posts: 9

Posted: Fri Dec 11, 2009 12:45 pm

Mat,

Thanks for the hint, but PVF doesn't recognize that flag. I included it in the command line, and it gave me this message

Compiling Project ...
Energy_4bead_GPU.f90
-ta=nvidia:{analysis|nofma|keepbin|keepptx|keepgpu|maxregcount:<n>|cc10|cc11|cc13|fastmath|mul24|time}|host
Choose target accelerator
nvidia Select NVIDIA accelerator target
analysis Analysis only, no code generation
nofma Don't generate fused mul-add instructions
keepbin Keep kernel .bin files
keepptx Keep kernel .ptx files
keepgpu Keep kernel source files
maxregcount:<n>
Set maximum number of registers to use on the GPU
cc10 Compile for compute capability 1.0
cc11 Compile for compute capability 1.1
cc13 Compile for compute capability 1.3
fastmath Use fast math library
mul24 Use 24-bit multiplication for subscripting
time Collect simple timing information
host Compile for the host, i.e., no accelerator target
pgf95-Error-Switch -ta with unknown keyword oldcg
pgf95-Error-The -ta switch must specify an accelerator target

Energy_GPU build failed.


Probably I will have to wait until that bug is solved.

Marco
All times are GMT - 7 Hours
Page 1 of 2

 