paokara
Joined: 06 Feb 2011 Posts: 19
Posted: Mon Mar 25, 2013 11:58 am Post subject: |
Hi Mat, and thank you.
I ran a couple of experiments last week.
1) First, we solved the NaN problem by moving the IF statement outside of our parallel region.
| Code: |
if(flag) then
!$acc parallel
...
!$acc end parallel
endif
|
Why was this happening? Is it because of my "FLAG"? Does every thread have its own copy of the FLAG variable?
2) I have two adjacent loops inside this parallel region. The first loop uses the reduction clause, and I need its results in the second loop.
| Code: |
msys = 0
tmpx = 0
tmpy = 0
tmpz = 0
!$acc loop vector reduction(+:msys,tmpx,tmpy,tmpz)
do i=1,N
msys = msys + m(i)
tmpx = tempx +vx(i)
...
enddo
!$acc loop gang vector
do i=1,N
vxb(i) = vx(i) + tmpx/msys
vyb(i) = vy(i) + tmpy/msys
...
enddo
|
First of all, why do I need the VECTOR clause in the first loop? (I get wrong results with GANG VECTOR.) Is it because of the reduction?
Second, is it possible to get different results from two different runs of my program? In your article you say that there is a barrier at the end of the parallel region, not at the end of the first loop. So I believe that an arbitrary thread may not have the correct values for the calculations in the second loop (for example, an incorrect value of the msys variable).
3) When I change my parallel region into a kernels region, I have another problem. In the NVIDIA Visual Profiler I can see communication between host and device when my program reaches that region, and I can't figure out the reason (is it because of the reduction?). There is a copy from host to device and then, after the loop, a copy back to the host. With the parallel construct I don't see that communication, and I get better timings. Why is that happening?
Thank you for your help,
Sotiris
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Mar 26, 2013 1:58 pm Post subject: |
| Quote: | | Why was this happening? Is it because of my "FLAG"? Does every thread have its own copy of the FLAG variable? |
I would need a reproducing example to tell why. Each thread would get its own copy of flag1, but they should all be initialized to the same value. Most likely the cause is something else, unrelated to the if statement itself, but I can't tell what that is from what you have posted.
| Quote: | | First of all, why do I need the VECTOR clause in the first loop? (I get wrong results with GANG VECTOR.) Is it because of the reduction? |
Is this a typo in your post or a typo in your program?
| Quote: | | tmpx = tempx +vx(i) |
If this is taken directly from your program, then it could be the source of your issues. "tempx" may not be initialized. Also, "tmpx" would then need its last value, which makes the loop non-parallelizable, and that is probably why you need the "vector" clause to force parallelization.
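As a rough sketch only, here is what the loops might look like with "tmpx" on both sides of the assignment. It assumes "tempx" really is a typo, and it splits the work into two parallel regions so that the second loop only starts after the first region (and its reduction) has finished. The program name, the array size, the test data, and the data clauses are made up for illustration and are not from your code; only msys, tmpx, m, vx and vxb come from your post.

```fortran
program reduction_sketch
  implicit none
  integer, parameter :: n = 1000          ! hypothetical problem size
  real(8) :: m(n), vx(n), vxb(n)
  real(8) :: msys, tmpx
  integer :: i

  ! Hypothetical test data, only here to make the sketch runnable.
  do i = 1, n
     m(i)  = 1.0d0
     vx(i) = real(i, 8)
  end do

  msys = 0.0d0
  tmpx = 0.0d0

  ! First region: the sums. Each variable accumulates into itself,
  ! so the loop is a plain sum reduction.
  !$acc parallel loop reduction(+:msys,tmpx) copyin(m, vx)
  do i = 1, n
     msys = msys + m(i)
     tmpx = tmpx + vx(i)                  ! tmpx, not tempx
  end do

  ! Second region: starts only after the first region has ended,
  ! so every thread sees the final msys and tmpx.
  !$acc parallel loop copyin(vx) copyout(vxb)
  do i = 1, n
     vxb(i) = vx(i) + tmpx/msys
  end do

  print *, 'msys =', msys, ' tmpx =', tmpx, ' vxb(1) =', vxb(1)
end program reduction_sketch
```

Without OpenACC enabled the directives are treated as comments, so the same file can be built serially to compare results against the accelerator build.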
| Quote: | | 3) When I change my parallel region into a kernels region, I have another problem. In the NVIDIA Visual Profiler I can see communication between host and device when my program reaches that region, and I can't figure out the reason (is it because of the reduction?). There is a copy from host to device and then, after the loop, a copy back to the host. With the parallel construct I don't see that communication, and I get better timings. Why is that happening? |
It could be a result of the reduction, since the reduction value needs to be passed between the kernels.
- Mat