Program crashes with wrong ta

Sermus
Posts: 1
Joined: Jan 12 2020

Post by Sermus » Sun Jan 12, 2020 6:00 am

Hello,
I have a GeForce RTX 2070 and I'm trying to port some code to the GPU with OpenACC. I added some acc pragmas to my code and compiled with -ta=tesla:cc75, which seems to be the appropriate target for my GPU.
The compiler invocation is as follows:
pgc++ -DHAVE_CONFIG_H -I. -I`echo /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src` -g -fast -acc -ta=tesla:cc75 -Mprof=ccff -Minfo=accel -DCOINUTILS_BUILD -c -o CoinFactorization3.lo /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp

And the output seems to be fine:
CoinFactorization::updateColumnUSparsish(CoinIndexedVector *, int *) const:
1386, Generating copyin(indexIn[:numberNonZero]) [if not already present]
Generating copy(stack[:this+->__b_19CoinArrayWithLength.size_/4]) [if not already present]
Generating present(next[:],mark[:],list[:])
Generating Tesla code
1396, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
1404, Generating implicit reduction(+:nMarked)

However, when I execute, the process crashes with:
Current file: /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp
function: _ZNK17CoinFactorization21updateColumnUSparsishEP17CoinIndexedVectorPi
line: 1386
This file was compiled: -ta=tesla:cc70

I assume it is complaining that -ta=tesla:cc70 doesn't match my GPU architecture. However, I compiled with cc75, not cc70. Can you shed some light on what might be going on?

My OS is Ubuntu 18.04.

PGI version is

pgcc 19.10-0 LLVM 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.


pgaccelinfo is

CUDA Driver Version: 10010
NVRM version: NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019

Device Number: 0
Device Name: GeForce RTX 2070 SUPER
Device Revision Number: 7.5
Global Memory Size: 8366784512
Number of Multiprocessors: 40
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1815 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 7001 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 4194304 bytes
Max Threads Per SMP: 1024
Async Engines: 3
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device: Yes
PGI Default Target: -ta=tesla:cc75

CUDA version is 10.1

mkcolg
Posts: 8255
Joined: Jun 30 2004

Re: Program crashes with wrong ta

Post by mkcolg » Mon Jan 13, 2020 11:51 am

Hi Sermus,

Unfortunately, I'm not sure what's wrong here. Adding "-ta=tesla:cc75" should be correct, so it's unclear why cc70 code is getting generated. I tried reproducing the error here on a Tesla T4 (also CC75) but didn't see any issues. Note that we officially only support Tesla products, but typically other NVIDIA devices will work as well if they use the same CC as a Tesla product.

Are you able to provide a reproducing example that I can use to try to recreate the error?

If not, can you add the flag "-v" (verbose) to your compilation and post the output so I can see what device code is being generated?

Also, just checking: are you using PGI to link, with "-ta=tesla:cc75" on the link line? If you're not using PGI to link, you may need to add "-ta=tesla:cc75,nordc" to your compilation. RDC (relocatable device code) requires the code to be linked with a device linker, which won't happen if you're using a different compiler to link.

-Mat
