PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

Free OpenACC Webinar

cudaFuncSetCacheConfig and CUDA 5.0
Goto page 1, 2  Next
 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

PostPosted: Mon Apr 22, 2013 7:18 am    Post subject: cudaFuncSetCacheConfig and CUDA 5.0 Reply with quote

All,

I seem to be encountering an issue with CUDA 5 that I didn't with CUDA 4.2, and it seems to involve the use of cudaFuncSetCacheConfig.

Namely, my CUDA Fortran code runs just swimmingly if I compile -Mcuda=4.2,cc20 (which is my default). However, I just decided to try it out with -Mcuda=5.0,cc20 and, boom:
Code:
[janus:26230] *** Process received signal ***
[janus:26230] Signal: Segmentation fault (11)
[janus:26230] Signal code: Address not mapped (1)
[janus:26230] Failing at address: (nil)
[janus:26230] [ 0] /lib64/libpthread.so.0() [0x3d4440f500]
[janus:26230] *** End of error message ***

This being CUDA Fortran, I "debugged" (aka added some print statements) and was able to track down the crash to a cudaFuncSetCacheConfig call made just before the first GPU kernel call. And, indeed, commenting out this call allowed the program to proceed...until the next cudaFuncSetCacheConfig call, where it crashed.

I decided to whip up a tester (which is just the .../13.4/etc/samples/cudafor/sgemm.cuf code with a FuncSet call) where the important bits are:

Code:
  call sgemm_cpu(A, B, gold, m, N, k, alpha, beta)

  ! timing experiment
  time = 0.0

  istat = cudaFuncSetCacheConfig('saxpy_sgemm_sgemmnn_16x16',cudaFuncCachePreferL1)

  istat = cudaGetLastError()
  if (istat /= 0) then
     write (*,*) "Error code from cache set call: ", istat
     write (*,*) "Kernel call failed: ", cudaGetErrorString(istat)
  end if

  istat = cudaEventRecord(start, 0)
  do j = 1, NREPS
    call sgemmNN_16x16<<<blocks, threads>>>(dA, dB, dC, m, N, k, alpha, beta)
  end do
  istat = cudaEventRecord(stop, 0)
  istat = cudaThreadSynchronize()


I then did a couple experiments:

Code:
(387) $ pgfortran -V

pgfortran 13.4-0 64-bit target on x86-64 Linux -tp nehalem
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2013, STMicroelectronics, Inc.  All Rights Reserved.
(388) $ pgfortran -Mcuda=4.2,cc20,ptxinfo -Minfo sgemm-funccache.cuf -o cuda42.exe
ptxas info    : Compiling entry function 'saxpy_sgemm_sgemmnn_16x16_' for 'sm_20'
ptxas info    : Function properties for saxpy_sgemm_sgemmnn_16x16_
    64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 20 registers, 1088+0 bytes smem, 76 bytes cmem[0]
(389) $ pgfortran -Mcuda=5.0,cc20,ptxinfo -Minfo sgemm-funccache.cuf -o cuda50.exe
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'saxpy_sgemm_sgemmnn_16x16_' for 'sm_20'
ptxas info    : Function properties for saxpy_sgemm_sgemmnn_16x16_
    64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 19 registers, 1088 bytes smem, 76 bytes cmem[0]
(390) $ ./cuda42.exe

Device:Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory.

 Test passed!
256x256 * 256x256:      0.414 ms     81.004 GFlops/s
(391) $ ./cuda50.exe

Device:Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory.

 Error code from cache set call:             8
 Kernel call failed:
 invalid device function                                                                                                         
 Test passed!
256x256 * 256x256:      1.578 ms     21.267 GFlops/s

Thus, it looks like the CUDA 5 version has a different behavior. In the case of my tester it doesn't crash out with a Segfault, but it's still different than CUDA 4.2 in that it triggers an error (and is slower...for some reason).

So, my question is: is this expected? Was cudaFuncSetCacheConfig deprecated? Or, perhaps, is my driver to old? I'm using 304.60.

(And, I suppose, I'm not sure why the CUDA 5 version is slower...even without the cudaFuncSetCacheConfig call; I tested that. Ideas?)

Thanks,
Matt
Back to top
View user's profile
brentl



Joined: 20 Jul 2004
Posts: 132

PostPosted: Tue Apr 23, 2013 10:42 am    Post subject: Reply with quote

NVIDIA changed the API from CUDA 4.2 to CUDA 5.0 for cudaFuncSetCacheConfig.

In CUDA 5.0, try just using the subroutine name.

istat = cudaFuncSetCacheConfig(sgemmnn_16x16, cudaFuncCachePreferL1)

Sorry, we try to hide users from things changing out from underneath them, but in this case, there was nothing we could do.

As for performance, you'll be a lot happier if you turn on optimization.

brentl@sb-leback:~/simple/cuda> pgf90 -Mcuda=4.2 sgemm.cuf
brentl@sb-leback:~/simple/cuda> ./a.out

Test PASSED!
256x256 * 256x256: 0.841 ms 39.884 GFlops/s
brentl@sb-leback:~/simple/cuda> pgf90 -Mcuda=5.0 sgemm.cuf
brentl@sb-leback:~/simple/cuda> ./a.out

Test PASSED!
256x256 * 256x256: 1.873 ms 17.913 GFlops/s
brentl@sb-leback:~/simple/cuda> pgf90 -Mcuda=4.2 -fast sgemm.cuf
brentl@sb-leback:~/simple/cuda> ./a.out

Test PASSED!
256x256 * 256x256: 0.171 ms 196.531 GFlops/s
brentl@sb-leback:~/simple/cuda> pgf90 -Mcuda=5.0 -fast sgemm.cuf
brentl@sb-leback:~/simple/cuda> ./a.out

Test PASSED!
256x256 * 256x256: 0.176 ms 191.071 GFlops/s
Back to top
View user's profile
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

PostPosted: Tue Apr 23, 2013 11:27 am    Post subject: Reply with quote

Brent,

It still didn't seem to help me with the tester on my box:

Code:
(551) $ grep FuncSet sgemm-funccache-for50.cuf
  istat = cudaFuncSetCacheConfig('sgemmnn_16x16',cudaFuncCachePreferL1)
(552) $ pgfortran -fast -Mcuda=5.0,cc20,ptxinfo sgemm-funccache-for50.cuf -o cuda50.exe
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'saxpy_sgemm_sgemmnn_16x16_' for 'sm_20'
ptxas info    : Function properties for saxpy_sgemm_sgemmnn_16x16_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 51 registers, 1088 bytes smem, 76 bytes cmem[0]
(553) $ ./cuda50.exe

Device:Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory.

 Error code from cache set call:             8
 Kernel call failed:
 invalid device function                                                                                                         
 Test passed!
256x256 * 256x256:      0.156 ms    215.302 GFlops/s

At least, the error call still returns a non-zero istat:
Code:
(559) $ diff sgemm-funccache-for50.cuf /opt/pgi/linux86-64/13.4/etc/samples/cudafor/sgemm.cuf
148,156d147
<
<   istat = cudaFuncSetCacheConfig('sgemmnn_16x16',cudaFuncCachePreferL1)
<
<   istat = cudaGetLastError()
<   if (istat /= 0) then
<      write (*,*) "Error code from cache set call: ", istat
<      write (*,*) "Kernel call failed: ", cudaGetErrorString(istat)
<   end if
<
Back to top
View user's profile
MuellerM



Joined: 04 Apr 2013
Posts: 49

PostPosted: Tue Apr 23, 2013 8:16 pm    Post subject: Reply with quote

Hey Matt, just a general input on debugging with CUDA Fortran: I've lately had some good experience starting with cuda-memcheck on programs that fail with a CUDA error, before going into lengthy debugging with write statements. It may not help here, but it can easily save you a half an hour here and there in the future. The reason is that uninitialized memory on CUDA often seems to lead to a corruption of your state such that strange errors occur at places where you wouldn't expect it - cuda-memcheck will find those uninitialized places for you.
Back to top
View user's profile
brentl



Joined: 20 Jul 2004
Posts: 132

PostPosted: Wed Apr 24, 2013 8:33 am    Post subject: Reply with quote

Take the name out of quotes. The new API takes a function pointer, not a character string
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group