njustn
Joined: 09 Nov 2011 Posts: 18
Posted: Mon Feb 04, 2013 11:42 am    Post subject: PGI 13.1 breaks on acc regions inside parallel regions
Hi, this is a new variant of an issue that has been around for a while. Previously, applications would fail at runtime unless every acc region inside an omp parallel region was enclosed in an acc data region. Now, with 13.1, I can't get any acc regions to work inside omp parallel regions at all. Take the following code as an example (apologies for the length, but this is the simplest way to demonstrate the problem):
Code:
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>
#include <omp.h>
#include <openacc.h>
/* #include <cuda.h> */
#include <sys/time.h>
#include <sys/types.h>

#define SIZE 1000

/* Serial version: a single host thread selects each device in turn. */
void works(){
    int data[15][SIZE] = {0};
    for(int j = 0; j < 4; j++)
    {
        int * stuff = data[j];
        if(j < acc_get_num_devices(acc_device_nvidia)){
            acc_set_device_num(j, acc_device_nvidia);
            fprintf(stderr, "prelaunch: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
            int i;
            #pragma acc data copyout(stuff[0:SIZE])
            {
                fprintf(stderr, "data environment initialized: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
                #pragma acc kernels
                for(i = 0; i < SIZE; i++)
                {
                    stuff[i] = 1;
                }
                fprintf(stderr, "ACC region complete: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
            }
            printf("device %d done, checking results\n", j);
            for(i = 0; i < SIZE; i++)
            {
                if(stuff[i] != 1){
                    printf("fail after: %d\n", i);
                    exit(1);
                }
            }
        }
    }
}

/* Parallel version: each OpenMP thread selects the device matching its
 * thread number and fills its own row of the shared array.
 * This is the branch that fails with 13.1. */
void dies(){
    int data[15][SIZE] = {0};
    #pragma omp parallel
    {
        int j = omp_get_thread_num();
        int * stuff = data[j];
        if(j < acc_get_num_devices(acc_device_nvidia)){
            acc_set_device_num(j, acc_device_nvidia);
            fprintf(stderr, "prelaunch: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
            int i;
            #pragma acc data copyout(stuff[0:SIZE])
            {
                fprintf(stderr, "data environment initialized: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
                #pragma acc kernels
                for(i = 0; i < SIZE; i++)
                {
                    stuff[i] = 1;
                }
                fprintf(stderr, "ACC region complete: in thread %d, testing device %d\n", j, acc_get_device_num(acc_device_nvidia));
            }
            printf("device %d done, checking results\n", j);
            for(i = 0; i < SIZE; i++)
            {
                if(stuff[i] != 1){
                    printf("fail after: %d\n", i);
                    exit(1);
                }
            }
        }
    }
}

int main(int argc, char * argv[])
{
    if(argc > 1){
        dies();    /* ./test 1 */
    }else{
        works();   /* ./test */
    }
    return 0;
}
If this is compiled with the 2012 compiler using the following command, both branches work (test with ./test and ./test 1, respectively).
Code:
/opt/pgi/linux86-64/2012/bin/pgcc -Minfo=accel,mp -mp=allcores -O3 -g -ta=nvidia:cuda4.1,keepgpu,keepptx -acc
The output looks like this:
Works branch:
Code:
prelaunch: in thread 0, testing device 0
data environment initialized: in thread 0, testing device 0
ACC region complete: in thread 0, testing device 0
device 0 done, checking results
prelaunch: in thread 1, testing device 1
data environment initialized: in thread 1, testing device 1
ACC region complete: in thread 1, testing device 1
device 1 done, checking results
prelaunch: in thread 2, testing device 2
data environment initialized: in thread 2, testing device 2
ACC region complete: in thread 2, testing device 2
device 2 done, checking results
prelaunch: in thread 3, testing device 3
data environment initialized: in thread 3, testing device 3
ACC region complete: in thread 3, testing device 3
device 3 done, checking results
Dies branch:
Code:
prelaunch: in thread 3, testing device 3
prelaunch: in thread 0, testing device 0
prelaunch: in thread 2, testing device 2
prelaunch: in thread 1, testing device 1
data environment initialized: in thread 0, testing device 0
ACC region complete: in thread 0, testing device 0
device 0 done, checking results
data environment initialized: in thread 1, testing device 1
data environment initialized: in thread 3, testing device 3
ACC region complete: in thread 1, testing device 1
ACC region complete: in thread 3, testing device 3
data environment initialized: in thread 2, testing device 2
ACC region complete: in thread 2, testing device 2
device 1 done, checking results
device 3 done, checking results
device 2 done, checking results
On the other hand, when compiled with 13.1 using the following command line, the "dies" branch fails.
Code:
pgcc -Minfo=accel,mp -mp=allcores -O3 -g -ta=nvidia:cuda5.0,keepgpu,keepptx -acc
Works branch:
Code:
prelaunch: in thread 0, testing device 0
data environment initialized: in thread 0, testing device 0
ACC region complete: in thread 0, testing device 0
device 0 done, checking results
prelaunch: in thread 1, testing device 1
data environment initialized: in thread 1, testing device 1
ACC region complete: in thread 1, testing device 1
device 1 done, checking results
prelaunch: in thread 2, testing device 2
data environment initialized: in thread 2, testing device 2
ACC region complete: in thread 2, testing device 2
device 2 done, checking results
prelaunch: in thread 3, testing device 3
data environment initialized: in thread 3, testing device 3
ACC region complete: in thread 3, testing device 3
device 3 done, checking results
Dies branch:
Code:
prelaunch: in thread 0, testing device 0
prelaunch: in thread 3, testing device 3
prelaunch: in thread 2, testing device 2
prelaunch: in thread 1, testing device 1
data environment initialized: in thread 1, testing device 1
data environment initialized: in thread 0, testing device 0
data environment initialized: in thread 2, testing device 2
data environment initialized: in thread 3, testing device 3
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuStreamSynchronize returned error 4: Deinitialized
Has anyone else run into this? I've been trying to fix it for a few days now with no success and have run out of ideas. Since the compile lines specify CUDA versions, I should add that I tested this with several different CUDA versions and got the same result.
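One way to narrow this down is a variant of the "dies" branch guarded by the standard `_OPENACC` and `_OPENMP` macros, so the same harness also builds as plain host code. This is a minimal sketch, not something tested against 13.1: the function name `run_dies_variant`, the `MAX_THREADS` bound, and the choice to pretend there are four devices in a host-only build are all assumptions for illustration. It also counts correct rows instead of calling `exit()` from inside the parallel region.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

#define SIZE 1000
#define MAX_THREADS 15   /* rows in the shared array (hypothetical bound) */

/* Device count; in a host-only build we pretend there are four "devices"
 * so the same code path runs (assumption for testing only). */
static int num_devices(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_nvidia);
#else
    return 4;
#endif
}

/* OpenMP thread number, or 0 when built without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical helper: each thread with a device fills its own row with 1s,
 * then checks it on the host. Returns the number of rows found correct. */
int run_dies_variant(int data[MAX_THREADS][SIZE])
{
    int ok_rows = 0;
    int ndev = num_devices();

    #pragma omp parallel reduction(+:ok_rows)
    {
        int j = thread_id();
        if (j < ndev && j < MAX_THREADS) {
            int *stuff = data[j];
#ifdef _OPENACC
            acc_set_device_num(j, acc_device_nvidia);
#endif
            /* Ignored as unknown pragmas when not compiling with OpenACC. */
            #pragma acc data copyout(stuff[0:SIZE])
            {
                #pragma acc kernels
                for (int i = 0; i < SIZE; i++)
                    stuff[i] = 1;
            }

            /* Host-side check; no exit() inside the parallel region. */
            int good = 1;
            for (int i = 0; i < SIZE; i++) {
                if (stuff[i] != 1) {
                    good = 0;
                    break;
                }
            }
            ok_rows += good;
        }
    }
    return ok_rows;
}
```

A caller would declare `static int data[MAX_THREADS][SIZE];` and compare `run_dies_variant(data)` with the number of devices in use. If a host-only build reports every row correct while the `-acc -mp` build fails with the cuMemcpyDtoHAsync error shown above, that suggests the problem lies in the accelerator runtime rather than in the harness.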
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Feb 04, 2013 2:17 pm
Hi njustn,
Thanks for the report. This does look like a problem with 13.1. The multi-device support was completely rewritten in order to support multiple device types (i.e. NVIDIA, AMD, Intel, etc.). Unfortunately, it appears that there are a few issues still to be worked out.
I submitted a problem report (TPR#19102) to our engineers. The good news is that your code works with our internal development compiler, meaning that our engineers have already found and fixed this issue. I'm not sure of its status, but hopefully this means the fix will be in the 13.2 compilers, due out shortly.
- Mat