GPU usage per process on a Linux machine (CUDA)

gpu monitoring

I use the CUDA toolkit to perform some computations on my Nvidia GPUs. How can I see the per-process GPU usage on a Linux machine (CUDA)?

nvidia-smi lists the processes running on each GPU, but it does not show the GPU utilization of each process.

I stumbled upon the following:

watch -n 0.5 nvidia-smi pmon -c 1

However, it seems to report a different measure of GPU usage than the Volatile GPU-Util column in the default nvidia-smi output.

The full list of options is in the manual (man nvidia-smi).
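
If the raw pmon table is too noisy, its output can be filtered. The following is a minimal sketch, not a definitive tool: pmon_filter is a hypothetical helper name, and it assumes the pmon column order is gpu, pid, type, sm, mem, followed by further columns, with the command name last. That layout may differ between driver versions, so check the header lines on your machine first.

```shell
#!/bin/sh
# pmon_filter (hypothetical helper): read `nvidia-smi pmon` output on stdin and
# print only processes whose SM utilization is at least the given percentage.
# ASSUMPTION: columns are gpu, pid, type, sm, mem, ..., command (command last).
pmon_filter() {
    awk -v min="${1:-0}" '
        /^#/       { next }   # skip the two header lines
        NF < 5     { next }   # skip blank or truncated lines
        $2 == "-"  { next }   # GPU with no running process
        $4 == "-"  { next }   # utilization not reported for this process
        ($4 + 0) >= min {
            printf "gpu=%s pid=%s sm=%s%% mem=%s%% cmd=%s\n", $1, $2, $4, $5, $NF
        }
    '
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi pmon -c 1 | pmon_filter 10
```

Here the threshold of 10 is arbitrary; passing 0 (or nothing) lists every process for which pmon reports a utilization value.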