PGI User Forum


Common blocks in OpenMP
 
PGI User Forum Forum Index -> Accelerator Programming
dcwarren



Joined: 18 Jun 2012
Posts: 29

PostPosted: Mon Mar 11, 2013 12:24 pm

dcwarren wrote:
I am using the debugger, but I've turned off optimization. One of the greybeards in my department suggested it might be an array overstepping its bounds somehow, so I'll try that and print/flush statements to figure it out. Thanks for the speedy response.


Well, this is weird. Running with -C turns up nothing new ("ACCESS VIOLATION"), but I can't even print these problem variables using a "print *" without getting that same segfault error. I've even tried inlining the subroutine and have the same problem.
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

PostPosted: Mon Mar 11, 2013 12:50 pm

Sounds like it might be a stack overflow. What happens if you increase your stack size?

- Mat
dcwarren



Joined: 18 Jun 2012
Posts: 29

PostPosted: Tue Mar 12, 2013 7:47 am

After some testing of the "-stack" flag, I've conclusively determined I don't understand what the stack is and how it works. I thought the stack was the memory allocated in RAM for the program to use as scratch space for its calculations. As such, if I increased the stack size I should see a corresponding increase in memory used according to Windows Task Manager. Needless to say, this isn't what I'm seeing.

The code will crash if I compile with anything less than "-stack=(no)check,6e8,6e8" (the first option makes no difference, and I'm saving you the trouble of counting zeroes). Regardless of (1) the amount of memory reserved for the stack and (2) whether I compile with "check" or "nocheck", Task Manager tells me my code takes roughly 21MB of memory.

So now the code can run to completion, but there's another issue: I don't get the same results with exactly one OpenMP thread as I do compiling without "-mp". The code does use a random number generator throughout, but why should it generate different states with/without OpenMP when there's only ever one thread using it? (Testing with more than one thread will follow, but I want to make sure I understand the basics first.)

----
Edit: Hooray, more oddness! If I use more than one thread, I eventually get an out-of-bounds error at an array. Here's the structure of the code throwing the error:
Code:
do i = 1, n_dummy
  x_local = x_array(i)
  ...
end do

The error message says that I'm trying to access a value of x_array greater than its maximum, which is 41; however, n_dummy is a runtime constant with value 4. Using the debugger -- again without any optimization -- I have checked that each thread has the correct value of n_dummy just prior to the loop. Within the loop, the iteration variable i has a value of 601028592 or so (and while it is greater than 41, this isn't the value the error message reports...). This is most certainly not between 1 and 4. Have you seen anything like this before?
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

PostPosted: Tue Mar 12, 2013 9:53 am

Quote:
Regardless of (1) the amount of memory reserved for the stack and (2) whether I compile with "check" or "nocheck", Task Manager tells me my code takes roughly 21MB of memory.
The "check"/"nocheck" sub-option enables or disables the runtime code that dynamically commits more stack as it is needed. With "check", once the commit size is reached, more stack is committed until the reserve size is reached. With "nocheck", this initialization code is removed and it is assumed that you have specified a large enough commit size. The sub-option does not affect the size of the stack itself.

Quote:
I don't get the same results with exactly one OpenMP thread as I do compiling without "-mp". The code does use a random number generator throughout, but why should it generate different states with/without OpenMP when there's only ever one thread using it?
The main difference between compiling with and without "-mp" (excluding the OpenMP directives themselves) is that automatic arrays are allocated on the stack. I'd look for uninitialized memory with one of these arrays.
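To illustrate what Mat means (a minimal sketch with made-up names, not code from this thread): an automatic array is one whose extent comes from a dummy argument, so under "-mp" the whole array lands on each thread's stack, and its contents are garbage until you initialize them.

```fortran
! Hypothetical sketch: "work" is an automatic array. Compiled with
! -mp it is allocated on the (per-thread) stack, so a large "n" can
! overflow the default stack size.
subroutine do_work(n)
  implicit none
  integer, intent(in) :: n
  real :: work(n)        ! automatic array -> per-thread stack with -mp
  work = 0.0             ! initialize before use; uninitialized stack
                         ! memory holds whatever was there before
  ! ... computation using work ...
end subroutine do_work
```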

Another possibility is that different optimizations are being applied. How different are the results and do they continue to be different when compiled without optimization (i.e. "-O0")?

Quote:
Within the loop, the iteration variable i has a value of 601028592 or so (and while it is greater than 41, this isn't the value the error message reports...). This is most certainly not between 1 and 4. Have you seen anything like this before?
Is "i" private?
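A minimal sketch of the hazard behind that question (hypothetical names, and fitting this thread's title): if the loop counter lives in a COMMON block, every thread updates the same storage even though the loop sits inside a called subroutine, which produces exactly the kind of wild counter values described above.

```fortran
! Hypothetical sketch: "i" is in a common block, so it is SHARED
! across threads even though the loop is inside a called subroutine.
subroutine walk_gates()
  implicit none
  integer :: i
  common /counters/ i          ! shared storage: all threads race on i
  integer, parameter :: n_dummy = 4
  do i = 1, n_dummy            ! another thread can overwrite i
    ! ... x_local = x_array(i) ...   mid-loop, yielding out-of-range
  end do                       !     subscripts like 601028592
end subroutine walk_gates
```

A plain local variable (no COMMON, no SAVE) would give each thread its own copy of the counter.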

- Mat
dcwarren



Joined: 18 Jun 2012
Posts: 29

PostPosted: Tue Mar 12, 2013 1:27 pm

mkcolg wrote:
The main difference between compiling with and without "-mp" (excluding the OpenMP directives themselves) is that automatic arrays are allocated on the stack. I'd look for uninitialized memory with one of these arrays.

Another possibility is that different optimizations are being applied. How different are the results and do they continue to be different when compiled without optimization (i.e. "-O0")?

As far as I'm aware, there are no automatic arrays anywhere in the code; all arrays are shaped according to parameters set in a module.

Without any optimization, the results are still different. It's a Monte Carlo code tracking particles interacting with a shock structure, and what I'm seeing is different numbers of particles making it through each "gate" depending on whether I've enabled or disabled OpenMP. Broadly the results seem to be the same; I just can't think of any reason why a single OpenMP thread should behave differently from a serial run. Actually, I take that back: OpenMP leaves certain variables undefined after exiting the parallel region. Is there a short list of conditions for this I can check? I'm already aware of private variables and shared pointers to private variables.
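A sketch of the main case I know of (hedged; the OpenMP specification has the authoritative list): a private variable is undefined after the region unless it appears in a lastprivate clause, in which case it keeps the value from the sequentially last iteration.

```fortran
! Hypothetical sketch: values of private variables after a region.
program after_region
  implicit none
  integer :: i, last_val, tmp
  !$omp parallel do private(tmp) lastprivate(last_val)
  do i = 1, 8
    tmp = i * 2        ! private copy: assigned before being read
    last_val = tmp
  end do
  !$omp end parallel do
  ! Here "tmp" is undefined (private), while "last_val" == 16, the
  ! value written by the sequentially last iteration (lastprivate).
end program after_region
```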

For lack of any better ideas, I'm changing my OpenMP region from "default(shared)" to "default(none)" just to make sure I have every variable assigned correctly.
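For reference, a sketch of that change (variable names invented): with default(none), any variable used in the region that is not listed in a shared(...) or private(...) clause becomes a compile-time error instead of a silent default.

```fortran
! Hypothetical sketch: default(none) forces an explicit sharing
! decision for every variable referenced in the construct.
!$omp parallel do default(none) shared(x_array, n_dummy) private(i, x_local)
do i = 1, n_dummy
  x_local = x_array(i)
  ! ... work on x_local ...
end do
!$omp end parallel do
```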

Quote:
Is "i" private?

It's within a subroutine called from the OpenMP region, so I would assume yes. And OpenMP isn't like OpenACC (where you can explicitly tell the code to have each encountering thread execute a particular loop serially), so I would expect that loops in OpenMP default to serial execution unless explicitly placed inside an OMP region. Am I right in this?
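On that last question, a hedged sketch of the usual semantics (hypothetical subroutine): a plain DO loop inside a subroutine called from a parallel region is not shared out at all; every thread executes the full loop redundantly on its own data. Only a worksharing directive, even an "orphaned" one with no lexically enclosing parallel construct, divides the iterations among the threads.

```fortran
! Hypothetical sketch of loop behavior when this subroutine is
! called from inside a parallel region.
subroutine helper()
  implicit none
  integer :: i
  do i = 1, 4            ! no directive: EACH thread runs all four
    ! ... per-thread work ...   iterations, using its own local i
  end do                 ! (i must not live in a common block)
  !$omp do               ! orphaned worksharing directive: iterations
  do i = 1, 4            ! are divided among the threads of the
    ! ... shared work ...      enclosing parallel region (or run
  end do                 !     serially if called outside one)
  !$omp end do
end subroutine helper
```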
Page 2 of 3



Powered by phpBB © phpBB Group