PGI User Forum

CPU parallel and accelerator regions in the same program
njustn
Joined: 09 Nov 2011
Posts: 22

Posted: Wed Nov 09, 2011 9:15 am    Post subject: CPU parallel and accelerator regions in the same program

Hi, I've been trying to use the PGI Accelerator compiler, version 11.10, to parallelize a code across both the CPU and the GPU. In the best case, I was hoping to end up with something like the following.

Code:

#pragma omp parallel
{
   int tid = omp_get_thread_num();
   printf("id:%d\n", tid);
   if(tid == 0){
      acc_set_device_num(0, acc_device_nvidia);
      #pragma acc region for
      ...
   }else if(tid == 1){
      acc_set_device_num(1, acc_device_nvidia);
      #pragma acc region for
      ...
   }else{
      ...
   }
   
}


where ... is some code to accelerate. I tried that first and ran into segfaults. Now I'm down to trying anything I can think of, but every time I have CPU-parallel and GPU-parallel regions in the same code, I get a segfault in the CPU region. GDB gives me something like the following.

Code:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaab14d6940 (LWP 6785)]
0x00000000004106df in _mp_penter64 ()
(gdb) bt
#0  0x00000000004106df in _mp_penter64 ()


I've tried everything I can think of to get these to work together, up to and including using pthreads to create the outer threads and run the CPU and GPU regions in separate pthreads, and I always get the same result. Once in a very long while the code runs without a segfault, but when that happens it hangs instead. The most basic version of what I've been trying to do is this.

Code:

#include <stdio.h>
#include <omp.h>

#define SIZE 100
int main(int argc, char * argv[])
{
    int stuff[SIZE];
    int limit = omp_get_thread_limit();
    printf("limit:%d\n", limit);
#pragma omp parallel shared(stuff)
    {
        int tid = omp_get_thread_num();
        int i;
        if(tid == 0){
            /* only thread 0 runs the accelerator region */
#pragma acc region for copy(stuff)
            for(i = 0; i<SIZE; i++)
            {
                stuff[i] = 1;
            }
        }
        printf("thread_id:%d\n", tid);
    }
    return 0;
}


That said, even this version, where the accelerator region doesn't start until the OpenMP parallel region has finished, fails.

Code:

#include <stdio.h>
#include <omp.h>

#define SIZE 100
int main(int argc, char * argv[])
{
    int stuff[SIZE];
    int limit = omp_get_thread_limit();
    printf("limit:%d\n", limit);
#pragma omp parallel shared(stuff)
    {
        int tid = omp_get_thread_num();
        printf("thread_id:%d\n", tid);
    }
    int i;
    /* accelerator region runs only after the parallel region has ended */
#pragma acc region for copy(stuff)
    for(i = 0; i<SIZE; i++)
    {
        stuff[i] = 1;
    }
    return 0;
}


This seems to be the same bug as in http://www.pgroup.com/userforum/viewtopic.php?t=2374&highlight=multiple, since the code works fine if I remove either region but fails whenever both are present; unfortunately, that thread has no resolution. Any ideas what might be going wrong?

Platform details:
2 C2050 GPUs
2 six-core Intel CPUs
CHAOS Linux 2.6.18-107
PGI Accelerator 11.10

mkcolg
Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed Nov 09, 2011 3:53 pm

Hi njustn,

This is an odd one; as with the other forum post, both programs work fine for me, so I suspect it's something specific to your system.

The "_mp_penter64" symbol is the entry point for a parallel region and one of the things it does is to set-up the threads stack space. It could be that you're getting a stack overflow. Try setting your shell's stack size limit or set the environment variable "OMP_STACKSIZE" to a large value (or unlimited).

If that doesn't work, please compile with "-Manno -Mkeepasm" to save the generated assembly code to a ".s" file. Open the file and look for the call to "_mp_penter". Just before this call there is an assembly statement moving a data-initialization symbol name, like ".prvt0001", into a register to be passed to "_mp_penter". Find this symbol towards the bottom of the file and let me know what the data-initialization values are.

For example:
Code:

##  lineno: 8
..LN3:

        xorl    %esi, %esi
        movq    .prvt0001(%rip), %rdi
        call    _mp_penter
....
.prvt0001:
        .align  8
        .long   336
        .long   0


Also, please rerun the program in gdb and get the disassembly so we can determine the exact instruction where the segv occurs. The values of the registers would also be helpful.
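
A gdb session along these lines should capture both (the function name comes from your backtrace):
Code:

$ gdb ./a.out
(gdb) run
(gdb) disassemble _mp_penter64
(gdb) info registers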

Thanks,
Mat

njustn
Joined: 09 Nov 2011
Posts: 22

Posted: Thu Nov 10, 2011 1:00 pm    Post subject: Tested, no luck

Hi mkcolg,

I tested the OMP_STACKSIZE option with some absurdly large values, and it still blew up. The values you requested are below, all for the code listed last in my original post.

Code:

.prvt0001:
        .align  8
        .long   256
        .long   0
        .globl  __pgi_cu_alloc
        .globl  __pgi_cu_close
        .globl  __pgi_cu_free
        .globl  __pgi_cu_downloadx
        .globl  __pgi_cu_launch2
        .globl  __pgi_cu_paramset
        .globl  __pgi_cu_datadone
        .globl  __pgi_cu_uploadx
        .globl  __pgi_cu_module_function3
        .globl  __pgi_cu_module3
        .globl  __pgi_cu_init
        .globl  _mp_pexit
        .globl  omp_get_thread_num
        .globl  _mp_penter
        .globl  printf
        .globl  omp_get_thread_limit
        .data
        .align  8
        .globl  __pgdbg_stub
        .quad   __pgdbg_stub
        .text


Also, the disassembled asm is below; it segfaulted at 0x00000000004105df.

Code:

(gdb) disassemble _mp_penter64
Dump of assembler code for function _mp_penter64:
0x00000000004105c8 <_mp_penter64>:    push   %rbp
0x00000000004105c9 <_mp_penter64>:    mov    %rsp,%rbp
0x00000000004105cc <_mp_penter64>:    push   %rdi
0x00000000004105cd <_mp_penter64>:    push   %rsi
0x00000000004105ce <_mp_penter64>:    callq  0x40ed6a <_mp_init>
0x00000000004105d3 <_mp_penter64>:   pop    %rsi
0x00000000004105d4 <_mp_penter64>:   pop    %rdi
0x00000000004105d5 <_mp_penter64>:   sub    %rdi,%rsp
0x00000000004105d8 <_mp_penter64>:   sub    $0x100,%rsp

(dies here)
0x00000000004105df <_mp_penter64>:   mov    %rdi,0xb8(%rsp)

0x00000000004105e7 <_mp_penter64>:   mov    0x8(%rbp),%rdi
0x00000000004105eb <_mp_penter64>:   mov    %rdi,0xb0(%rsp)
0x00000000004105f3 <_mp_penter64>:   mov    0x0(%rbp),%rdi
0x00000000004105f7 <_mp_penter64>:   mov    %rdi,0x98(%rsp)
0x00000000004105ff <_mp_penter64>:   movq   $0x0,0xa8(%rsp)
0x000000000041060b <_mp_penter64>:   movq   $0x1,0x10(%rsp)
0x0000000000410614 <_mp_penter64>:   mov    %rsi,0x48(%rsp)
0x0000000000410619 <_mp_penter64>:   callq  0x40db7d <_mp_get_par>
0x000000000041061e <_mp_penter64>:   cmp    $0x0,%rax
0x0000000000410622 <_mp_penter64>:   jne    0x4106cd <_mp_penter64>
0x0000000000410628 <_mp_penter64>:   cmpq   $0x0,0x48(%rsp)
0x000000000041062e <_mp_penter64>:  jne    0x4106f4 <_mp_penter64>
0x0000000000410634 <_mp_penter64>:  callq  0x40db4e <_mp_get_tcpus>
0x0000000000410639 <_mp_penter64>:  cmp    $0x1,%rax
0x000000000041063d <_mp_penter64>:  je     0x4106dd <_mp_penter64>
0x0000000000410643 <_mp_penter64>:  mov    %rax,0x10(%rsp)
0x0000000000410648 <_mp_penter64>:  mov    %rbx,0x254461(%rip)        # 0x664ab0 <x_rbx>
0x000000000041064f <_mp_penter64>:  mov    %r12,0x254462(%rip)        # 0x664ab8 <x_r12>
0x0000000000410656 <_mp_penter64>:  mov    %r13,0x254463(%rip)        # 0x664ac0 <x_r13>
0x000000000041065d <_mp_penter64>:  mov    %r14,0x254464(%rip)        # 0x664ac8 <x_r14>
0x0000000000410664 <_mp_penter64>:  mov    %r15,0x254465(%rip)        # 0x664ad0 <x_r15>
0x000000000041066b <_mp_penter64>:  mov    0x98(%rsp),%rdi
0x0000000000410673 <_mp_penter64>:  mov    %rdi,0x25442e(%rip)        # 0x664aa8 <x_orbp>
0x000000000041067a <_mp_penter64>:  mov    0xb0(%rsp),%rdi
0x0000000000410682 <_mp_penter64>:  mov    %rdi,0x254417(%rip)        # 0x664aa0 <x_oret>
0x0000000000410689 <_mp_penter64>:  mov    0x10(%rsp),%rdi
0x000000000041068e <_mp_penter64>:  mov    %rdi,0x2543fb(%rip)        # 0x664a90 <x_ncpu>
0x0000000000410695 <_mp_penter64>:  mov    0xb8(%rsp),%rdi
0x000000000041069d <_mp_penter64>:  mov    %rdi,0x2543f4(%rip)        # 0x664a98 <x_priv>
0x00000000004106a4 <_mp_penter64>:  fnstcw 0x2543de(%rip)        # 0x664a88 <x_fpuc>
0x00000000004106aa <_mp_penter64>:  mov    $0x2,%rdi
0x00000000004106b1 <_mp_penter64>:  callq  0x40db5a <_mp_set_par>
0x00000000004106b6 <_mp_penter64>:  mov    $0x0,%rdi
0x00000000004106bd <_mp_penter64>:  callq  0x40e3ef <_mp_barrierr>
0x00000000004106c2 <_mp_penter64>:  movq   $0x0,0x8(%rsp)
0x00000000004106cb <_mp_penter64>:  jmp    0x4106fd <_mp_penter64>
0x00000000004106cd <_mp_penter64>:  callq  0x40dbc0 <_mp_penter_d>
0x00000000004106d2 <_mp_penter64>:  movq   $0x2,0x8(%rsp)
0x00000000004106db <_mp_penter64>:  jmp    0x4106fd <_mp_penter64>
0x00000000004106dd <_mp_penter64>:  mov    $0x1,%rdi
0x00000000004106e4 <_mp_penter64>:  callq  0x40db5a <_mp_set_par>
0x00000000004106e9 <_mp_penter64>:  movq   $0x1,0x8(%rsp)
0x00000000004106f2 <_mp_penter64>:  jmp    0x4106fd <_mp_penter64>
0x00000000004106f4 <_mp_penter64>:  movq   $0x3,0x8(%rsp)
0x00000000004106fd <_mp_penter64>:  mov    0x98(%rsp),%rbp
0x0000000000410705 <_mp_penter64>:  mov    0xb0(%rsp),%r11
0x000000000041070d <_mp_penter64>:  push   %r11
0x000000000041070f <_mp_penter64>:  retq   
End of assembler dump.

njustn
Joined: 09 Nov 2011
Posts: 22

Posted: Thu Nov 10, 2011 2:48 pm    Post subject: register values

I forgot to add the register values; they are below.

Code:

(gdb) info registers
rax            0x1      1
rbx            0x2aaaaacc7bc0   46912498334656
rcx            0x0      0
rdx            0xf4240  1000000
rsi            0x0      0
rdi            0x10000000000    1099511627776
rbp            0x7fffffffded0   0x7fffffffded0
rsp            0x7effffffddd0   0x7effffffddd0
r8             0x2aaaab8c42f0   46912510903024
r9             0x7fffffffc18a   140737488339338
r10            0x0      0
r11            0x2aaaab5a46b0   46912507627184
r12            0x0      0
r13            0x7fffffffe1c0   140737488347584
r14            0x0      0
r15            0x0      0
rip            0x4105df 0x4105df <_mp_penter64>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x37f    895
fstat          0x0      0
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
mxcsr          0x1fc0   [ DAZ IM DM ZM OM UM PM ]

mkcolg
Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Thu Nov 10, 2011 4:31 pm

Code:

(dies here)
0x00000000004105df <_mp_penter64>:   mov    %rdi,0xb8(%rsp)


This is definitely a stack overflow, since the segv occurs on a store relative to the stack pointer (rsp). Can you check your shell's stack size limit?
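
For reference, how to check depends on your login shell (both commands below are standard):
Code:

$ ulimit -s         # bash/sh: prints the stack limit in KB, or "unlimited"
% limit stacksize   # csh/tcsh equivalent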

- Mat