PGI User Forum


Segmentation fault--reason unknown
yus



Joined: 06 Oct 2008
Posts: 18

Posted: Mon Oct 06, 2008 9:37 am    Post subject: Segmentation fault--reason unknown

I am having trouble running a PGI-compiled Fortran program on a Cray XT4. This is an MPI program that I have compiled with PGI 7.1.4, 7.2.2 and 7.2.3. The code hits a segmentation fault in a subroutine. When compiled with -O0 and -g, the core dump points to an ENDIF in the subroutine. Around the ENDIF are just simple assignment statements in which I can't find anything wrong.

The subroutine takes 103 arguments, of which around 70 are double precision arrays with 1400 to 5900 elements each (depending on the number of processes). The total size of the arguments is a few MB per process, so it doesn't look like a memory overflow. It also doesn't look like a memory violation, as all arguments are properly declared.

The code can successfully run if compiled by gcc on the same XT4 machine.

Given these observations, I can't find what causes the segfault. Does anybody have any clue about the possible causes? Are there situations in which PGI-compiled code is known to segfault? Your ideas are appreciated.
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Tue Oct 07, 2008 1:08 pm

Hi yus,

I'm not sure, but I wonder if it's a stack overflow. Try setting your stack size to unlimited and see if that works around the problem.
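
To illustrate the stack overflow hypothesis, here is a minimal sketch that is not taken from the original code. It assumes the subroutine uses automatic arrays, which many compilers place on the stack. The names stack_demo, work and tmp and the size passed to work are hypothetical. Whether a program like this actually faults depends on the array size, the compiler and the stack limit in effect.

```fortran
! Hypothetical illustration only; not the code discussed in this thread.
program stack_demo
  implicit none
  ! The size is an arbitrary assumption; larger values need more stack.
  call work(200000)
contains
  subroutine work(n)
    integer, intent(in) :: n
    ! Automatic array: sized on entry to the subroutine. Many compilers
    ! allocate it on the stack, so a large n can exceed the stack limit.
    double precision :: tmp(n)
    tmp = 1.0d0
    print *, sum(tmp)
  end subroutine work
end program stack_demo
```

If raising the stack limit makes a fault of this kind disappear, stack usage is a likely cause; if it does not, other causes need to be considered.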

- Mat
yus

Posted: Wed Oct 08, 2008 5:06 am

Hi Mat,

I set ulimit -s unlimited -c unlimited. The code still has the same segmentation fault.
mkcolg

Posted: Fri Oct 10, 2008 10:01 am

Hi yus,

Without looking at the code and running it through PGDBG, it's very difficult for me to know for sure what's going on. Some things to look for are writing off the end of an array or, in the case of automatics or pointers, a compiler temporary array being given the wrong array subsection and overwriting protected memory. In either case, different compilers may exhibit different behavior depending upon how memory is laid out. Try adding "-Mbounds" to check for array bounds violations. It doesn't catch all cases, but it won't hurt to try.
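
As a minimal sketch of the kind of error that array bounds checking is meant to catch (not taken from the original code; the names bounds_demo, fill, v and a are hypothetical), the loop below deliberately writes one element past the declared extent of its array argument. Without bounds checking the extra write may silently corrupt neighboring memory or fault somewhere unrelated, which is why the reported location can be misleading.

```fortran
! Hypothetical illustration only; not the code discussed in this thread.
program bounds_demo
  implicit none
  integer, parameter :: n = 5
  double precision :: v(n)

  v = 0.0d0
  call fill(v, n)
  print *, v
contains
  subroutine fill(a, m)
    integer, intent(in) :: m
    double precision, intent(inout) :: a(m)
    integer :: k
    ! Deliberate bug: the loop runs one element past the end of a.
    do k = 1, m + 1
      a(k) = dble(k)
    end do
  end subroutine fill
end program bounds_demo
```

Assuming the pgf90 driver is used directly (on a Cray system the compiler wrapper may differ), compiling with something like "pgf90 -g -O0 -Mbounds bounds_demo.f90" should make the out-of-range subscript show up as a runtime error instead of undefined behavior.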

Also, does the error occur even with a single thread? If the error only occurs with multiple threads, then the "ENDIF" error could be a red herring and the seg fault may be occurring in a different thread. If this is the case, try using the MPICH which accompanies the PGI compilers and use PGDBG to debug the application (please refer to the PGI Tools guide on how to use PGDBG with MPI applications). This may give better insight into where the actual error occurs.

- Mat
yus

Posted: Mon Nov 03, 2008 5:50 am    Post subject: Possible problem on array optimization

Hi Mat and other experts,

I have some new findings from debugging the seg fault. It seems to be a problem with the optimization of array operations. The seg fault doesn't appear when the code is compiled with -g -O0, but it occurs when compiled with -O2. The fault happens in nested loops as follows:

DO I=1, L
   ! a 1D array initialization here like V=0. where V(1,L)
   DO J=1,M
      ! also some scalar and 1D array initializations here
      DO K=1,N
         ! Some scalar, 1D and 3D array computations here. The array subscripts depend on I, J, and K
      END DO
   END DO
END DO

The strange thing is that the seg fault doesn't appear (even when compiled with -O2) when I replace the outer loops over I and J with single values as follows:

I=1
! initialization for index I=1 only
J=1
! initialization for index J=1 only
DO K=1, N
   ! The same scalar, 1D and 3D array computations for I,J=1 and K=1,N
END DO

However, the seg fault comes back if I keep the middle loop over J as a loop with a single iteration:

I=1
! initialization
DO J=1,1
   ! initialization
   DO K=1, N
      ! Scalar and array computations
   END DO
END DO

I wonder if the fault is caused by the optimization of array operations. Does anybody have experience with or advice on a problem like this?

Thanks.


Last edited by yus on Mon Nov 03, 2008 10:23 am; edited 1 time in total
All times are GMT - 7 Hours
Page 1 of 5