| Author |
Message |
ink
Joined: 25 Nov 2008 Posts: 8
|
Posted: Mon Nov 23, 2009 9:39 am Post subject: run time problems with 10.0 |
|
|
Hello,
I have a simple code that runs fine when compiled with 9.0-4. When compiled with 10.0, it either runs much slower (as if it were not accelerated) or does not run at all (it just hangs).
Any thoughts?
thanks |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Nov 23, 2009 10:39 am Post subject: |
|
|
Hi ink,
A lot of changes went into 10.0, so it could be any of a number of things. You're welcome to send the code into PGI Customer Support (trs@pgroup.com) and we can take a look.
Otherwise, I would start with the informational messages (-Minfo=accel) to see what's changed. Perhaps the schedule selected by the compiler is no longer optimal and you need to use the "parallel" and "vector" clauses? Maybe the compiler is no longer caching a variable?
- Mat |
|
ink
Joined: 25 Nov 2008 Posts: 8
|
Posted: Mon Nov 23, 2009 12:05 pm Post subject: |
|
|
Here is the code:
10 #pragma acc region for parallel
11 for( i = 0 ; i < m; i++ ){
12   #pragma acc for parallel
13   for( k = 0; k < n; k++ ) {
14     #pragma acc for seq
15     for( j = 0; j < l; j++ ){
16       c[i][k] = c[i][k] + a[i][j]*b[j][k];
17     }
18   }
19 }
20 }
which is compiled with
pgcc -ta=nvidia:cc13 -Minfo -fast -Msafeptr=all -c
mxm:
     10, Generating copyin(b[0:l-1][0:n-1])
         Generating copyin(a[0:m-1][0:l-1])
         Generating copy(c[0:m-1][0:n-1])
     11, Loop is parallelizable
         Accelerator kernel generated
         11, #pragma acc for parallel
     13, Loop is parallelizable
     15, Complex loop carried dependence of 'c' prevents parallelization
         Loop carried dependence of 'c' prevents parallelization
         Loop carried backward dependence of 'c' prevents vectorization
I think the kernel is generated. What I don't understand is why it hangs, as if it can't allocate a device or is checking a license. |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Nov 23, 2009 1:30 pm Post subject: |
|
|
Hi ink,
Nothing jumps out at me that would indicate why you're seeing a hang. You might try using "parallel, vector(16)" on your i loop and removing the second "parallel" clause on the k loop. That said, these changes are about performance; the clauses you have now should not cause runtime errors.
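As a rough sketch of that schedule change (not a tested fix for the hang), the loop nest might look like the following. The function signature, the float element type, and the use of C99 variable-length array parameters are assumptions added so the fragment compiles on its own; only the loop body and the directives come from the posted code. The clause spelling follows the PGI Accelerator model and should be checked against the compiler's documentation. A compiler without accelerator support ignores the pragmas and runs the loops on the host.

```c
/* Hypothetical wrapper: the signature and element type are assumptions,
 * not taken from the original post. Computes c = c + a*b. */
void mxm(int m, int n, int l,
         float a[m][l], float b[l][n], float c[m][n])
{
    int i, j, k;

    /* Suggested change: "parallel, vector(16)" on the outer i loop. */
    #pragma acc region for parallel, vector(16)
    for (i = 0; i < m; i++) {
        /* The second "parallel" clause is removed here, so the compiler
         * chooses the schedule for the k loop itself. */
        for (k = 0; k < n; k++) {
            /* The j loop accumulates into c[i][k], so it stays sequential. */
            #pragma acc for seq
            for (j = 0; j < l; j++) {
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
        }
    }
}
```

Rebuilding with -Minfo=accel after a change like this shows which schedule the compiler actually selected.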
Try setting "NVDEBUG=1" in your environment. This will give you *a lot* of information but hopefully help in determining exactly where the hang is. Note that there aren't any runtime license checks.
Hope this helps,
Mat |
|
ink
Joined: 25 Nov 2008 Posts: 8
|
Posted: Tue Nov 24, 2009 4:53 am Post subject: |
|
|
Mat, many thanks for your help.
It turned out that sitenvrc still needs to be set up manually, and I forgot about it. (It is a bit strange that even small incremental updates, e.g. from 9.0-3 to 9.0-4, could not pick it up automatically.)
Moving on, I'm now getting
gfec: error: unrecognized option `-TARG:abi=n64'
Without sitenvrc, the code compiles but hangs (even if sitenvrc is created after the code was compiled).
With sitenvrc, I get the error above. |
|