alechand
Joined: 14 May 2012 Posts: 21
Posted: Tue May 07, 2013 9:59 pm Post subject: error for a simple OPENACC program
Hello.
I am testing a simple program called "picalc" from the NVIDIA website, modified for OpenACC:
############################
program picalc
  implicit none
  integer, parameter :: n=1000000
  integer :: i
  real(kind=8) :: t, pi
  pi = 0.0
  !$acc parallel loop
  do i=0, n-1
    t = (i+0.5)/n
    pi = pi + 4.0/(1.0 + t*t)
  end do
  !$acc end parallel loop
  print *, 'pi=', pi/n
end program picalc
############################
The program is simple and it compiles, but running it on my system gives this error:
############################
alechand@pcsantos2:~/gravity$ pgfortran -fast -Minfo=all -o TEST picalc.f90 -ta=nvidia
picalc:
7, Accelerator kernel generated
7, CC 1.3 : 24 registers; 32 shared, 36 constant, 0 local memory bytes
CC 2.0 : 23 registers; 0 shared, 52 constant, 0 local memory bytes
8, !$acc loop gang, vector(256) ! blockidx%x threadidx%x
10, Sum reduction generated for pi
7, Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
alechand@pcsantos2:~/gravity$ ./TEST
call to cuMemcpyDtoH returned error 700: Launch failed
CUDA driver version: 5050
############################
I see the same problem in other codes as well.
I have the PGI compiler 12.10 installed and I am using Kubuntu 12.04.
Can you help me?
Thanks
mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
Posted: Wed May 08, 2013 9:06 am
Hi alechand,
There's something going on with your device or driver. Can you post the output of the "pgaccelinfo" command? Also, are you able to run a simple CUDA program?
Thanks,
Mat
alechand
Joined: 14 May 2012 Posts: 21
Posted: Wed May 08, 2013 9:09 am
Thanks for the reply.
Here is the output:
#####################################
alechand@pcsantos2:~$ pgaccelinfo
CUDA Driver Version: 5050
NVRM version: NVIDIA UNIX x86 Kernel Module 319.17 Thu Apr 25 22:14:10 PDT 2013
Device Number: 0
Device Name: GeForce GTX 680
Device Revision Number: 3.0
Global Memory Size: 2147155968
Number of Multiprocessors: 8
Number of SP Cores: 1536
Number of DP Cores: 512
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1058 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 3004 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 2048
Async Engines: 1
Unified Addressing: No
Initialization time: 314151 microseconds
Current free memory: 2095439872
Upload time (4MB): 994 microseconds ( 843 ms pinned)
Download time: 1733 microseconds ( 759 ms pinned)
Upload bandwidth: 4219 MB/sec (4975 MB/sec pinned)
Download bandwidth: 2420 MB/sec (5526 MB/sec pinned)
###################################
What do you mean by a CUDA program?
Can you give me an example?
Thanks
mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
Posted: Wed May 08, 2013 9:26 am
The pgaccelinfo output all looks fine. My test system is a GTX 690, so it is very similar to yours. The only difference is that your driver is newer. I'll see if I can update my driver to check whether that's causing the problem.
Can you now try setting the environment variable "PGI_ACC_DEBUG=1" and running your program again?
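A side note on how the variable gets set, since it affects whether the program sees it at all. The sketch below assumes a Bourne-style shell such as bash. The `child` command is a hypothetical stand-in for the real executable (`./TEST` in this thread): it only reports whether `PGI_ACC_DEBUG` reached its environment, and it does not reproduce any PGI runtime output.

```shell
# Stand-in for the real executable (./TEST in this thread): a child process
# that only reports whether it can see PGI_ACC_DEBUG in its environment.
child='echo "child sees PGI_ACC_DEBUG=${PGI_ACC_DEBUG:-unset}"'

# Start from a clean state so the three cases are comparable.
unset PGI_ACC_DEBUG

# 1. Plain assignment: a shell variable only, not passed to child processes.
PGI_ACC_DEBUG=1
plain=$(sh -c "$child")

# 2. Exported: every program started from this shell inherits it.
export PGI_ACC_DEBUG=1
exported=$(sh -c "$child")

# 3. One-shot prefix: set in the environment of a single command only.
unset PGI_ACC_DEBUG
oneshot=$(PGI_ACC_DEBUG=1 sh -c "$child")

echo "plain:    $plain"
echo "exported: $exported"
echo "one-shot: $oneshot"
```

With the real program, the corresponding forms would be `export PGI_ACC_DEBUG=1` followed by `./TEST`, or the single line `PGI_ACC_DEBUG=1 ./TEST`. Typing `PGI_ACC_DEBUG=1` on a line by itself only creates a shell variable, so the program would most likely run exactly as before.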
Quote:
What do you mean by a CUDA program?
Can you give me an example?

Assuming you have NVIDIA's CUDA SDK installed, you can run one of the sample programs that come with it. For example:
Code:
samples/0_Simple/matrixMul% make
/opt/cuda-5.0/bin/nvcc -m64 -gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/opt/cuda-5.0/include -I. -I.. -I../../common/inc -o matrixMul.o -c matrixMul.cu
g++ -m64 -o matrixMul matrixMul.o -L/opt/cuda-5.0/lib64 -lcudart
mkdir -p ../../bin/linux/release
cp matrixMul ../../bin/linux/release
samples/0_Simple/matrixMul% cd ../../bin/linux/release/
samples/bin/linux/release% ls
matrixMul*
samples/bin/linux/release% matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 690" with compute capability 3.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 224.60 GFlop/s, Time= 0.584 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: OK
- Mat
alechand
Joined: 14 May 2012 Posts: 21
Posted: Wed May 08, 2013 9:47 am
I used
PGI_ACC_DEBUG=1
but the behaviour was the same as before.
I installed the CUDA driver from the PGI compiler.
Can you help me find this sample?