PGI User Forum


Segmentation fault

 
elephant



Joined: 24 Feb 2011
Posts: 22

Posted: Mon Apr 04, 2011 9:03 am    Post subject: Segmentation fault

Dear All

I am facing a problem while porting a large code with many subroutines using the PGI Accelerator directives. When I compile the code with pgf90 (without a target accelerator flag), the program works. I then added two !$acc region / !$acc end region directives around two loops in one subroutine to convert them into two GPU compute regions. When I compile the code with the -ta=nvidia flag, I get the compiler feedback shown below for this particular subroutine.
When I run the resulting program, it terminates with a "Segmentation fault".
I also tried to accelerate several other loops (each one individually) and always faced the same problem.
What do I have to pay attention to? What could cause this fault?
Thank you very much in advance.

Code:

 162, Generating copyout(vort$p(1:3,1:knend))
         Generating compute capability 1.3 binary
    163, Loop carried dependence of 'uf$p' prevents parallelization
         Loop carried backward dependence of 'uf$p' prevents vectorization
         Complex loop carried dependence of 'vort$p' prevents parallelization
         Sequential loop scheduled on host
         Loop not vectorized/parallelized: contains call
    164, Loop is parallelizable
         Accelerator kernel generated
        164, !$acc do parallel, vector(3)
             CC 1.3 : 4 registers; 24 shared, 24 constant, 0 local memory bytes; 25 occupancy
    185, Generating copyin(pcell$p(1:8,1:kcend))
         Generating compute capability 1.3 binary
    195, Complex loop carried dependence of 'q$p' prevents parallelization
         Complex loop carried dependence of 'qt$p' prevents parallelization
         Complex loop carried dependence of 'vc$p' prevents parallelization
         Complex loop carried dependence of 'sixl$p' prevents parallelization
         Complex loop carried dependence of 'sixr$p' prevents parallelization
         Complex loop carried dependence of 'siyl$p' prevents parallelization
         Complex loop carried dependence of 'siyr$p' prevents parallelization
         Complex loop carried dependence of 'sizl$p' prevents parallelization
         Complex loop carried dependence of 'sizr$p' prevents parallelization
         Complex loop carried dependence of 'sjxl$p' prevents parallelization
         Complex loop carried dependence of 'sjxr$p' prevents parallelization
         Complex loop carried dependence of 'sjyl$p' prevents parallelization
         Complex loop carried dependence of 'sjyr$p' prevents parallelization
         Complex loop carried dependence of 'sjzl$p' prevents parallelization
         Complex loop carried dependence of 'sjzr$p' prevents parallelization
         Complex loop carried dependence of 'skxl$p' prevents parallelization
         Complex loop carried dependence of 'skxr$p' prevents parallelization
         Complex loop carried dependence of 'skyl$p' prevents parallelization
         Complex loop carried dependence of 'skyr$p' prevents parallelization
         Complex loop carried dependence of 'skzl$p' prevents parallelization
         Complex loop carried dependence of 'skzr$p' prevents parallelization
         Loop carried dependence of 'dudx$p' prevents parallelization
         Loop carried backward dependence of 'dudx$p' prevents vectorization
         Complex loop carried dependence of 'dudx$p' prevents parallelization
         Loop carried dependence of 'dudy$p' prevents parallelization
         Loop carried backward dependence of 'dudy$p' prevents vectorization
         Complex loop carried dependence of 'dudy$p' prevents parallelization
         Loop carried dependence of 'dudz$p' prevents parallelization
         Loop carried backward dependence of 'dudz$p' prevents vectorization
         Complex loop carried dependence of 'dudz$p' prevents parallelization
         Loop carried dependence of 'dvdx$p' prevents parallelization
         Loop carried backward dependence of 'dvdx$p' prevents vectorization
         Complex loop carried dependence of 'dvdx$p' prevents parallelization
         Loop carried dependence of 'dvdy$p' prevents parallelization
         Loop carried backward dependence of 'dvdy$p' prevents vectorization
         Complex loop carried dependence of 'dvdy$p' prevents parallelization
         Loop carried dependence of 'dvdz$p' prevents parallelization
         Loop carried backward dependence of 'dvdz$p' prevents vectorization
         Complex loop carried dependence of 'dvdz$p' prevents parallelization
         Loop carried dependence of 'dwdx$p' prevents parallelization
         Loop carried backward dependence of 'dwdx$p' prevents vectorization
         Complex loop carried dependence of 'dwdx$p' prevents parallelization
         Loop carried dependence of 'dwdy$p' prevents parallelization
         Loop carried backward dependence of 'dwdy$p' prevents vectorization
         Complex loop carried dependence of 'dwdy$p' prevents parallelization
         Loop carried dependence of 'dwdz$p' prevents parallelization
         Loop carried backward dependence of 'dwdz$p' prevents vectorization
         Complex loop carried dependence of 'dwdz$p' prevents parallelization
         Loop carried dependence of 'dtdx$p' prevents parallelization
         Loop carried backward dependence of 'dtdx$p' prevents vectorization
         Complex loop carried dependence of 'dtdx$p' prevents parallelization
         Loop carried dependence of 'dtdy$p' prevents parallelization
         Loop carried backward dependence of 'dtdy$p' prevents vectorization
         Complex loop carried dependence of 'dtdy$p' prevents parallelization
         Loop carried dependence of 'dtdz$p' prevents parallelization
         Loop carried backward dependence of 'dtdz$p' prevents vectorization
         Complex loop carried dependence of 'dtdz$p' prevents parallelization
         Loop carried dependence of 'dkdx$p' prevents parallelization
         Loop carried backward dependence of 'dkdx$p' prevents vectorization
         Complex loop carried dependence of 'dkdx$p' prevents parallelization
         Loop carried dependence of 'dkdy$p' prevents parallelization
         Loop carried backward dependence of 'dkdy$p' prevents vectorization
         Complex loop carried dependence of 'dkdy$p' prevents parallelization
         Loop carried dependence of 'dkdz$p' prevents parallelization
         Loop carried backward dependence of 'dkdz$p' prevents vectorization
         Complex loop carried dependence of 'pcell$p' prevents parallelization
         Sequential loop scheduled on host
         Generating copyout(k(1:8))
         Loop not vectorized/parallelized: contains call
    196, Loop is parallelizable
         Accelerator kernel generated
        196, !$acc do parallel, vector(8)
             CC 1.3 : 4 registers; 24 shared, 32 constant, 0 local memory bytes; 25 occupancy
    408, Loop not vectorized: may not be beneficial
         Loop unrolled 4 times
    410, Loop not vectorized: data dependency
    473, Loop not vectorized: data dependency
    699, Loop not vectorized: may not be beneficial
         Loop unrolled 4 times
    701, Loop not vectorized: data dependency
    728, Loop not vectorized: data dependency
    921, Loop not vectorized: data dependency
    950, Loop not vectorized: data dependency
    980, Loop not vectorized: data dependency
    995, Loop not vectorized: data dependency
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Apr 04, 2011 9:47 am    Post subject: Re: Segmentation fault

Hi elephant,

The seg fault could be caused by any number of things, and I'm not sure exactly which. My best guess is that the compiler is generating some bad code because the outer loops are not parallelizable. It's trying to run the outer loops sequentially on the host and the inner loops on the GPU.

To test this theory, try putting the ACC REGION directives only around the inner loops at lines 164 and 196. If that works, then most likely the compiler is doing a poor job of managing data between the inner device loop and the outer host loop.
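As a rough illustration of that test, here is a minimal sketch in C rather than Fortran, using `#pragma acc region` as the C counterpart of `!$acc region`. Everything in it is hypothetical: `host_only_work` stands in for whatever routine the real outer loop calls, and the layout of `vort` (rows of 3 values) is only guessed from the `copyout(vort$p(1:3,1:knend))` feedback line. The `row[0:2]` bounds are written in a lower:upper form, which is an assumption to check against your compiler's documentation. A compiler that does not recognize the pragma ignores it and runs the loop on the host.

```c
#include <assert.h>

/* Hypothetical stand-in for the routine that the real outer loop calls.
   A call like this is one reason the outer loop stays on the host. */
static float host_only_work(int kn)
{
    return 0.5f * (float)kn;
}

/* vort is assumed to hold knend rows of 3 floats each. */
void compute_vort(float *vort, int knend)
{
    assert(vort != 0 && knend >= 0);

    for (int kn = 0; kn < knend; ++kn) {   /* sequential, on the host */
        float scale = host_only_work(kn);
        float *row = vort + 3 * kn;

        /* The region wraps ONLY the short inner loop, so the compiler no
           longer has to split a single region between host and device.
           Bounds form row[0:2] (lower:upper) is an assumption. */
        #pragma acc region copyout(row[0:2])
        {
            for (int i = 0; i < 3; ++i)
                row[i] = scale * (float)(i + 1);
        }
    }
}
```

Launching one tiny kernel per outer iteration like this is expected to be slow. It is meant only as a diagnostic: if the seg fault disappears, that points at the host/device split rather than at the loop body itself.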

That said, my strategy here would be to ignore the seg fault for now and work on modifying the code so that the outer loops parallelize. It looks like you have a lot of loop-carried dependencies as well as a function call.
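One common source of loop-carried dependencies is a small work array that every outer iteration reuses. The `Generating copyout(k(1:8))` line in the feedback hints that something like this may be happening, but that is a guess without the source. The sketch below, again in C and with hypothetical names (`cell_sums_shared`, `cell_sums_private`, and the doubling of `pcell` as placeholder arithmetic), shows the pattern and one way to remove it: give each iteration its own scratch array and do the work inline instead of through a call. No directive is shown; once the iterations are independent, the region would go around the outer loop.

```c
#include <assert.h>

/* Before: one scratch array k is shared by every outer iteration, so a
   compiler has to assume that iteration kc+1 depends on iteration kc. */
void cell_sums_shared(const float *pcell, float *out, int kcend)
{
    float k[8];                       /* shared scratch */
    assert(pcell != 0 && out != 0 && kcend >= 0);

    for (int kc = 0; kc < kcend; ++kc) {
        float s = 0.0f;
        for (int i = 0; i < 8; ++i)
            k[i] = 2.0f * pcell[8 * kc + i];
        for (int i = 0; i < 8; ++i)
            s += k[i];
        out[kc] = s;
    }
}

/* After: the scratch array lives inside the loop body, so each iteration
   owns its copy. restrict tells the compiler that pcell and out do not
   overlap, which removes another reason to assume a dependence. */
void cell_sums_private(const float *restrict pcell, float *restrict out,
                       int kcend)
{
    assert(pcell != 0 && out != 0 && kcend >= 0);

    for (int kc = 0; kc < kcend; ++kc) {
        float k[8];                   /* private to this iteration */
        float s = 0.0f;
        for (int i = 0; i < 8; ++i)
            k[i] = 2.0f * pcell[8 * kc + i];
        for (int i = 0; i < 8; ++i)
            s += k[i];
        out[kc] = s;
    }
}
```

Both versions compute the same values; only the second makes it evident that the outer iterations do not interact. In the Fortran code, the closest equivalents would likely be a private work array and non-aliased arguments, though which of these applies depends on code not shown in this thread.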

Also, please feel free to send in the code to PGI Customer Service (trs@pgroup.com) so we can take a look at the seg fault, and if it is indeed a compiler error, then we can fix the problem.

Thanks,
Mat
All times are GMT - 7 Hours