PGI User Forum


run time problems with 10.0
ink



Joined: 25 Nov 2008
Posts: 8

Posted: Mon Nov 23, 2009 9:39 am    Post subject: run time problems with 10.0

Hello,
I have a simple code that runs fine when compiled with 9.0-4. When compiled with 10.0, it either runs much slower (as if it were not accelerated) or does not run at all (it just hangs).
Any thoughts?
Thanks
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Nov 23, 2009 10:39 am

Hi ink,

A lot of changes went into 10.0, so it could be any of a number of things. You're welcome to send the code to PGI Customer Support (trs@pgroup.com) and we can take a look.

Otherwise, I would start with the informational messages (-Minfo=accel) to see what's changed. Perhaps the schedule selected by the compiler is no longer optimal and you need to use the "parallel" and "vector" clauses? Maybe the compiler is no longer caching a variable?

- Mat
ink



Joined: 25 Nov 2008
Posts: 8

Posted: Mon Nov 23, 2009 12:05 pm

Here is the code:
10 #pragma acc region for parallel
11 for( i = 0 ; i < m; i++ ){
12 #pragma acc for parallel
13     for( k = 0; k < n; k++ ) {
14 #pragma acc for seq
15         for( j = 0; j < l; j++ ){
16             c[i][k] = c[i][k] + a[i][j]*b[j][k];
17         }
18     }
19 }
20 }

It is compiled with the following command, which produces these messages:
pgcc -ta=nvidia:cc13 -Minfo -fast -Msafeptr=all -c
mxm:
     10, Generating copyin(b[0:l-1][0:n-1])
         Generating copyin(a[0:m-1][0:l-1])
         Generating copy(c[0:m-1][0:n-1])
     11, Loop is parallelizable
         Accelerator kernel generated
         11, #pragma acc for parallel
     13, Loop is parallelizable
     15, Complex loop carried dependence of 'c' prevents parallelization
         Loop carried dependence of 'c' prevents parallelization
         Loop carried backward dependence of 'c' prevents vectorization

I think the kernel is generated. What I don't understand is why it hangs like it can't allocate a device or is checking a license.
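
The dependence messages for line 15 refer to the innermost j loop, where every iteration reads and writes the same c[i][k]. One common way to make that accumulation explicit is to sum into a local scalar and store once per (i, k). The sketch below is plain C with the acc directives left out so that it compiles with any C99 compiler; they would sit on the same loops as in the post. The names mxm_scalar and mxm_scalar_selfcheck, the float element type, and the variable-length array parameters are assumptions and are not taken from the original code. Whether this changes what the 10.0 compiler reports is something only -Minfo output can confirm.

```c
/* Hypothetical helper, not from the original post: same arithmetic as the
   posted loop nest, but the j loop accumulates into a local scalar. */
void mxm_scalar(int m, int n, int l,
                float a[m][l], float b[l][n], float c[m][n])
{
    for (int i = 0; i < m; i++) {
        for (int k = 0; k < n; k++) {
            float sum = c[i][k];              /* start from the existing value */
            for (int j = 0; j < l; j++) {
                sum += a[i][j] * b[j][k];     /* no store to c inside the j loop */
            }
            c[i][k] = sum;                    /* single store per (i, k) */
        }
    }
}

/* Small self-check (also hypothetical): multiplies a 2x3 matrix by a 3x2
   matrix into a zeroed c and returns 1 if the result matches the
   hand-computed product, 0 otherwise. */
int mxm_scalar_selfcheck(void)
{
    float a[2][3] = {{1, 2, 3}, {4, 5, 6}};
    float b[3][2] = {{7, 8}, {9, 10}, {11, 12}};
    float c[2][2] = {{0, 0}, {0, 0}};

    mxm_scalar(2, 2, 3, a, b, c);

    return c[0][0] == 58.0f && c[0][1] == 64.0f &&
           c[1][0] == 139.0f && c[1][1] == 154.0f;
}
```

Running the same self-check on the host and on the accelerated build is a cheap way to tell a wrong result from a slow or hanging one.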
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Nov 23, 2009 1:30 pm

Hi ink,

Nothing jumps out at me that would indicate why you're seeing a hang. You might try using "parallel, vector(16)" for your i loop and removing the second "parallel" clause from the k loop. These changes should only help performance, though; the current schedule should not cause runtime errors.
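
As a rough sketch of the schedule described above, the directives on the posted loop nest might be rearranged as follows. The clause spelling follows the PGI Accelerator model directives already used in this thread. The names mxm_sched and mxm_sched_selfcheck, the float element type, and the C99 variable-length array parameters are assumptions, not part of the original code. A compiler that does not recognize the acc pragmas will normally ignore them, so the function can still be checked on the host.

```c
/* Hypothetical rewrite of the posted loop nest with the suggested schedule. */
void mxm_sched(int m, int n, int l,
               float a[m][l], float b[l][n], float c[m][n])
{
    int i, j, k;

    /* i loop: "parallel" plus "vector(16)", as suggested in the reply */
#pragma acc region for parallel, vector(16)
    for (i = 0; i < m; i++) {
        /* k loop: the second "parallel" clause is dropped, so the
           compiler is left to choose how to schedule this loop */
        for (k = 0; k < n; k++) {
            /* j loop: kept sequential, as in the original post */
#pragma acc for seq
            for (j = 0; j < l; j++) {
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
        }
    }
}

/* Host-side self-check (also hypothetical): returns 1 if a 2x3 by 3x2
   product into a zeroed c matches the hand-computed values. */
int mxm_sched_selfcheck(void)
{
    float a[2][3] = {{1, 2, 3}, {4, 5, 6}};
    float b[3][2] = {{7, 8}, {9, 10}, {11, 12}};
    float c[2][2] = {{0, 0}, {0, 0}};

    mxm_sched(2, 2, 3, a, b, c);

    return c[0][0] == 58.0f && c[0][1] == 64.0f &&
           c[1][0] == 139.0f && c[1][1] == 154.0f;
}
```

Whether vector(16) is a good width for a given GPU is not something this sketch can show; the -Minfo=accel messages and timing runs would have to confirm it.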

Try setting "NVDEBUG=1" in your environment. This will give you *a lot* of information but hopefully help in determining exactly where the hang is. Note that there aren't any runtime license checks.

Hope this helps,
Mat
ink



Joined: 25 Nov 2008
Posts: 8

Posted: Tue Nov 24, 2009 4:53 am

Mat, many thanks for your help.
It turned out that sitenvrc still needs to be set up manually, and I forgot about it. (It is a bit strange that even small incremental updates, e.g. from 9.0-3 to 9.0-4, cannot pick it up automatically.)

Moving on, I'm now getting:
gfec: error: unrecognized option `-TARG:abi=n64'

Without sitenvrc, the code can be compiled but hangs (even if sitenvrc is created after the code was compiled).

With sitenvrc, I get the error above.
All times are GMT - 7 Hours
Page 1 of 2

 