Ubuntu – SDDM / Plasma have problems with OpenGL after upgrading to 18.04

Tags: 18.04, kubuntu, nvidia, opengl

I have upgraded my Kubuntu system (a desktop workstation with an Nvidia GPU) multiple times, and I am using the Nvidia binary driver. Recently, after upgrading to 18.04 (bionic), I was facing a black screen with only a mouse cursor after booting. It turned out I was using sddm, and while debugging this I found that /var/log/sddm.log contained

GREETER: Could not initialize GLX

I also found the following, more detailed message using journalctl -e -t sddm-greeter:

Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize -1, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::SwapBehavior(DoubleBuffer), swapInterval 1, profile  QSurfaceFormat::OpenGLContextProfile(NoProfile))
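For reference, these are the places I looked for log output (minimal commands; journalctl -b -u sddm is one more spot worth checking):

# log file written by sddm / its greeter
cat /var/log/sddm.log

# detailed greeter messages from the journal
journalctl -e -t sddm-greeter

# messages from the sddm service itself, current boot only
journalctl -b -u sddm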

I tried uninstalling and reinstalling many things (for instance, nvidia-driver-390 and everything related to Nvidia), and eventually switched from sddm to lightdm. Now I could log in, but KDE would not start properly either; the message was

Plasma is unable to start as it could not correctly use OpenGL 2. Please check that your graphics drivers are set up correctly.

When I manually start plasmashell and krunner, I get a usable desktop, but a very unstable KDE session with frequent flashing and the recurring popup

Desktop effects were restarted due to a graphics reset
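For completeness, this is roughly how I start the two by hand from a terminal (a minimal sketch; the killall is just my way of clearing any hung instance first):

# replace a possibly hung shell instance, then start a fresh one
killall plasmashell
plasmashell &

# the launcher/search bar has to be started separately
krunner &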

Question: What may cause these messages, and how should I continue debugging this?

Here are some facts that may be relevant, starting with the more suspicious ones:

  • Probably unrelated: for some reason, I also had severe problems getting nvidia-docker to work again after the upgrade, but I could fix that by editing /etc/nvidia-container-runtime/config.toml to adapt it to the group owning /dev/nvidia0 (see the sketch after this list).
  • lightdm does not start automatically on boot, but I can do sudo service lightdm restart in order to get a login screen.
  • I have heard that Ubuntu changed from running X on vt7 to vt1, but on my system it is still running on vt7. No text-mode login is running on vt1, though.
  • I also have problems with D-Bus; for instance, muon cannot contact an authentication agent via D-Bus (the dbus daemons seem to be running, though, so the problem may again lie with KDE services).
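The config.toml change looked roughly like this (a sketch from memory; the user key under [nvidia-container-cli] and its "root:group" format reflect my understanding of nvidia-container-cli's configuration, so treat the exact syntax as an assumption):

# /etc/nvidia-container-runtime/config.toml (excerpt, assumed syntax)
[nvidia-container-cli]
# run the container CLI with the group that owned /dev/nvidia0 (vglusers)
user = "root:vglusers"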

The following things I checked looked perfectly fine to me:

  • glxgears and some other GL-using programs seem to work fine.
  • glxinfo seems to confirm that I am using the nvidia driver (now version 410 from the graphics-drivers PPA) successfully, and that my graphics card is recognized (see the checks after this list).
  • A non-KDE app I tested (MeVisLab) is able to make advanced use of OpenGL and reports OpenGL version 4.6.0 without problems.
  • nvidia-settings also looks normal.
  • /var/log/Xorg.0.log looks normal to me.
  • I can run demanding programs using CUDA and my GPU, both through nvidia-docker and without.
  • I am not using prime; /usr/share/sddm/scripts/Xsetup does run /sbin/prime-offload, which seems to write "Sorry but your hardware configuration is not supported" into /var/log/prime-offload.log, and /var/log/prime-supported.log contains "No offloading required. Abort"
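For reference, these are the kinds of checks behind the list above (a minimal sketch; the grep pattern is just a convenient filter):

# confirm which driver and OpenGL version the GLX stack reports
glxinfo | grep -E "OpenGL (vendor|renderer|version)"

# simple rendering smoke test
glxgears

# GPU and driver state as seen by the kernel module
nvidia-smi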

I think the following questions may refer to the same problem I have, but they're all unresolved and their descriptions did not match mine perfectly (notebook vs. desktop, for instance). I preferred to start from scratch and to decide, after (hopefully) resolving the problem, whether they're duplicates:

Best Answer

I have finally found the culprit: the problems were indeed caused by wrong permissions on the /dev/nvidia* device files! These files belonged to the group vglusers, of which I was a member. Apparently, however, some daemons (colord, something sddm-related, probably more) were not in that group, and that caused the problems. Furthermore, there is no reason these files should not have the default permissions.

However, it was quite hard to find out how to fix that, since chmod/chgrp would apparently work (according to ls -l), but the devices would magically get their permissions changed back when I used them (e.g. when restarting sddm).
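These checks make the problem visible (the crw-rw-rw- root:root defaults are, to the best of my knowledge, what an unmodified Nvidia driver creates):

# with default settings the devices should be crw-rw-rw- root root,
# not group-owned by vglusers
ls -l /dev/nvidia*

# hunt for leftovers that keep re-applying the wrong owner/permissions
grep -r nvidia /etc/udev/rules.d/ /etc/modprobe.d/
grep -r vglusers /etc/udev/rules.d/ /etc/modprobe.d/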

At some point in the past, I had VirtualGL installed. Uninstalling it (long ago) had left two configuration files behind: /etc/udev/rules.d/99-virtualgl-dri.rules, which contained

KERNEL=="card[0-9]", MODE="0660", OWNER="root", GROUP="vglusers"

and /etc/modprobe.d/virtualgl.conf, which contained

options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1005 NVreg_DeviceFileMode=0660

I removed both files, ran update-initramfs -u so that the changed module options would take effect, and ran delgroup vglusers (which had GID 1005, of course).
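Put together, the fix boils down to these steps (a sketch of what I ran; a reboot, or at least reloading the nvidia kernel module, is needed before the old options stop being applied):

# remove the leftover VirtualGL configuration
sudo rm /etc/udev/rules.d/99-virtualgl-dri.rules
sudo rm /etc/modprobe.d/virtualgl.conf

# rebuild the initramfs so it no longer carries the old module options
sudo update-initramfs -u

# drop the now-unused group (GID 1005)
sudo delgroup vglusers

sudo reboot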

I hope this helps other people in the future; I spent (too) many hours debugging this!