I'm working on Ubuntu 16.04 with a fairly old Nvidia 9600 GT graphics card. It's CUDA-enabled (compute capability 1.1), though legacy. I'm trying to take advantage of it with Keras, so I followed this guide to install CUDA and this one to install cuDNN. The driver for my graphics card is version 340.104, and the last CUDA version that supports my card is 6.5. After installation I verified it by typing in the console:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

$ nvidia-smi
Fri Dec 22 23:02:08 2017       
| NVIDIA-SMI 340.104    Driver Version: 340.104        |                       
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce 9600 GT     Off  | 0000:01:00.0     N/A |                  N/A |
| 40%   46C    P0    N/A /  N/A |     77MiB /  1023MiB |     N/A      Default |

| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|    0            Not Supported                                               |

The deviceQuery sample also compiled and ran successfully:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 9600 GT"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 1024 MBytes (1073414144 bytes)
  ( 8) Multiprocessors, (  8) CUDA Cores/MP:     64 CUDA Cores
  GPU Clock rate:                                1625 MHz (1.62 GHz)
  Memory Clock rate:                             400 Mhz
  Memory Bus Width:                              256-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No                                                                                                    
  Device PCI Bus ID / PCI location ID:           1 / 0                                                                                                 
  Compute Mode:                                                                                                                                        
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >                                                          

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 9600 GT                       
Result = PASS

Then I followed this recommended installation method and this documentation to install Theano, since, unlike TensorFlow, it should be able to run on my graphics card. I created a .theanorc file:

device = cuda0
floatX = float32


include_path = /usr/local/cuda-6.5/include/
library_path = /usr/local/cuda-6.5/lib64/

I also exported proper variables to .profile:

export PATH=/usr/local/cuda-6.5/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

However, when I try to run this simple test script:

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

I receive the following error:

ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/home/kuba/anaconda3/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
  File "/home/kuba/anaconda3/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/home/kuba/anaconda3/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
  File "pygpu/gpuarray.pyx", line 651, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
pygpu.gpuarray.GpuArrayException: b'Could not find symbol "cuDevicePrimaryCtxGetState": /usr/lib/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxGetState'

I don't understand this error, because according to Nvidia's documentation this function clearly exists. Does anyone have a clue? Could the problem be that I'm using Python 3.6, while the aforementioned documentation has a < sign before 3.6? Is any of the paths wrong?

Best Answer

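Generally speaking, all of those components require at least CUDA 7 and cuDNN 3, which in turn require a GPU with CUDA compute capability 2.0 or higher. Your 9600 GT has compute capability 1.1 and is limited to CUDA 6.5, so the libcuda.so.1 shipped with its driver does not export cuDevicePrimaryCtxGetState, a driver API function that, as far as I know, was only introduced with CUDA 7.0. The function exists in the current documentation, just not in your driver, so the Python version and the paths are unlikely to be the cause.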
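
If you want to confirm that the symbol really is missing from the installed driver library, rather than being hidden by a path problem, a minimal sketch with ctypes could look like the following. The library path and symbol name are copied from the error message in the question; `has_symbol` is a hypothetical helper name chosen for this example, not part of Theano or pygpu.

```python
import ctypes


def has_symbol(lib_path, symbol):
    """Return True/False if the library loads, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # The shared library was not found or could not be loaded at all.
        return None
    # ctypes looks the symbol up on attribute access and raises
    # AttributeError when the library does not export it.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Path and symbol are taken from the pygpu error message.
    print(has_symbol("/usr/lib/libcuda.so.1", "cuDevicePrimaryCtxGetState"))
```

If this prints False, the library loads fine and simply predates the function, which points at the driver/CUDA version rather than at your paths. None would mean the library itself could not be loaded from that location.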