Ubuntu – Domain resolution (systemd-resolved) is messed up; how can it be fixed?

17.04, dns, network-manager, networking, systemd-resolved

Sorry, this is a long one. The TL;DR is that domain resolution (and maybe other things) works only intermittently, and thus the internet connection works only intermittently. I'd like to fix it. The system is Kubuntu 17.04.

There are several symptoms. On 20170605 I switched the computer on to connect a remote device for Plex, but the connection from the remote device (on the same subnet) to the local computer, which uses wifi via an ath9k_htc TP-LINK USB dongle, was intermittent. I had run recent updates (pastebin), but they seem unrelated to DNS resolution.

Pinging Google's DNS at 8.8.8.8 using mtr with a 2s interval, I get:

 1. 192.168.1.1              ...............................????????????????...............????????????????...........................................................................??????????.....................???????.........
 2. 81.1.112.44              ...............................????????????????...............????????????????.........>.................................................................?????????......................???????.........

The question marks show where the connection fails; the outages last 32s, 32s, 22s and 14s, i.e. they are irregular.
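For what it's worth, the outage lengths can be read off mechanically rather than counted by eye. This is only a minimal sketch: it assumes each character in mtr's display is one probe sent at the chosen interval, and the function name outage_lengths is made up for illustration.

```shell
# outage_lengths INTERVAL_SECONDS DISPLAY
# Hypothetical helper: prints the duration in seconds of each run of '?'
# (lost probes) in an mtr display string, one duration per line.
# Assumes one character per probe, sent every INTERVAL_SECONDS.
# grep -o emits one line per run of '?'; awk multiplies each run's
# length by the interval.
outage_lengths() {
    printf '%s\n' "$2" |
        grep -o '?\{1,\}' |
        awk -v i="$1" '{ print length($0) * i }'
}
```

For example, outage_lengths 2 '..??..???.' prints 4 and then 6; pasting a full row of the display as the second argument lists one duration per outage.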

Initially I thought that systemd-resolved was to blame. Running sudo systemctl status wpa_supplicant.service NetworkManager.service systemd-resolved.service returns the following:

thisuser@host-k1210:~$ sudo systemctl status wpa_supplicant.service NetworkManager.service systemd-resolved.service 
● wpa_supplicant.service - WPA supplicant
   Loaded: loaded (/lib/systemd/system/wpa_supplicant.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-06-06 09:26:04 BST; 3h 52min ago
 Main PID: 1252 (wpa_supplicant)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/wpa_supplicant.service
           └─1252 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant

Jun 06 11:35:52 host-k1210 wpa_supplicant[1252]: wlan15: WPA: Group rekeying completed with 00:1c:df:9b:8d:ff [GTK=TKIP]
Jun 06 11:57:25 host-k1210 wpa_supplicant[1252]: wlan15: CTRL-EVENT-DISCONNECTED bssid=00:1c:df:9b:8d:ff reason=3 locally_generated=1
Jun 06 11:57:25 host-k1210 wpa_supplicant[1252]: wlan15: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: SME: Trying to authenticate with 00:1c:df:9b:8d:ff (SSID='TALKTALK-17A908' fr
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: Trying to associate with 00:1c:df:9b:8d:ff (SSID='TALKTALK-17A908' freq=2412 
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: Associated with 00:1c:df:9b:8d:ff
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=US
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: WPA: Key negotiation completed with 00:1c:df:9b:8d:ff [PTK=CCMP GTK=TKIP]
Jun 06 11:57:28 host-k1210 wpa_supplicant[1252]: wlan15: CTRL-EVENT-CONNECTED - Connection to 00:1c:df:9b:8d:ff completed [id=0 id_str
Jun 06 12:36:45 host-k1210 wpa_supplicant[1252]: wlan15: WPA: Group rekeying completed with 00:1c:df:9b:8d:ff [GTK=TKIP]

● NetworkManager.service - Network Manager
   Loaded: loaded (/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-06-06 11:05:45 BST; 2h 13min ago
     Docs: man:NetworkManager(8)
 Main PID: 3815 (NetworkManager)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/NetworkManager.service
           └─3815 /usr/sbin/NetworkManager --no-daemon

Jun 06 11:05:45 host-k1210 systemd[1]: Starting Network Manager...
Jun 06 11:05:45 host-k1210 systemd[1]: Started Network Manager.
Jun 06 11:05:45 host-k1210 NetworkManager[3815]: <warn>  [1496743545.4801] keyfile: 'hostname' option is deprecated and has no effect
Jun 06 11:05:45 host-k1210 NetworkManager[3815]: ((devices/nm-device.c:970)): assertion '<dropped>' failed
Jun 06 11:57:25 host-k1210 NetworkManager[3815]: <warn>  [1496746645.8678] sup-iface[0x556a4c929950,wlan15]: connection disconnected (

● systemd-resolved.service - Network Name Resolution
   Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/systemd-resolved.service.d
           └─resolvconf.conf
   Active: active (running) since Tue 2017-06-06 09:26:00 BST; 3h 52min ago
     Docs: man:systemd-resolved.service(8)
           http://www.freedesktop.org/wiki/Software/systemd/resolved
           http://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
           http://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
 Main PID: 1192 (systemd-resolve)
   Status: "Processing requests..."
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/systemd-resolved.service
           └─1192 /lib/systemd/systemd-resolved

Jun 06 13:14:01 host-k1210 systemd-resolved[1192]: Switching to DNS server 8.8.8.8 for interface wlan15.
Jun 06 13:14:01 host-k1210 systemd-resolved[1192]: Using degraded feature set (UDP+EDNS0+DO) for DNS server 8.8.8.8.
Jun 06 13:14:05 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun 06 13:14:05 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.220.220 for interface wlan15.
Jun 06 13:14:06 host-k1210 systemd-resolved[1192]: Switching to DNS server 8.8.8.8 for interface wlan15.
Jun 06 13:17:44 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun 06 13:17:49 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.220.220 for interface wlan15.
Jun 06 13:17:55 host-k1210 systemd-resolved[1192]: Switching to DNS server 8.8.8.8 for interface wlan15.
Jun 06 13:18:00 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun 06 13:18:05 host-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.220.220 for interface wlan15.

In other words at least 3 systems are in error states:

1) wpa_supplicant/NetworkManager show the error described in this post; the suggested solution, removing randomisation of the MAC address by modifying NetworkManager.conf, is ineffective for me.

2a) the error in this post about rekeying in wpa_supplicant; again the fix doesn't help me as I don't have a router on which I can "Set WPA/WPA2 Group Key Update Period".

2b) the same error is reported here with instructions to alter the PMF (Protected Management Frames) setting; again, my wifi router doesn't offer access to this.

3) the third error can also be seen by tail-ing /var/log/syslog:

Jun  6 13:18:00 bridgeflap-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun  6 13:18:05 bridgeflap-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.220.220 for interface wlan15.
Jun  6 13:24:07 bridgeflap-k1210 systemd-resolved[1192]: Switching to DNS server 8.8.8.8 for interface wlan15.
Jun  6 13:24:12 bridgeflap-k1210 systemd-resolved[1192]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun  6 13:24:12 bridgeflap-k1210 systemd-resolved[1192]: Grace period over, resuming full feature set (UDP+EDNS0+DO+LARGE) for DNS server 208.67.222.222.

These errors get repeated a lot. When the system is connected to the net they stop; then, when mtr 8.8.8.8 shows pings failing, the errors above start up again, almost like a race condition.
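To line these messages up against the gaps in mtr, it can help to count them per minute. A rough sketch, assuming the traditional syslog timestamp layout seen in the log lines here (month, day, HH:MM:SS); switches_per_minute is a name invented for this example.

```shell
# switches_per_minute: hypothetical helper. Reads syslog lines on stdin and
# prints "HH:MM count" for every minute that contains at least one
# "Switching to DNS server" message from systemd-resolved.
# Assumes the third whitespace-separated field is the HH:MM:SS timestamp.
switches_per_minute() {
    awk '/Switching to DNS server/ {
             split($3, t, ":")
             count[t[1] ":" t[2]]++
         }
         END { for (m in count) print m, count[m] }' | sort
}
```

It could be run as switches_per_minute < /var/log/syslog, and the busy minutes compared with the times at which mtr shows lost probes.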

3a) the errors in systemd-resolved look a lot like this issue, but the fix of turning off DNSSEC doesn't work for me; indeed, off was already the default, though I went ahead and set it explicitly in resolved.conf.

3b) it looks a lot like this pre-release Valet Linux problem, where dnsmasq was seemingly interfering with systemd's resolver. I have had dnsmasq in the past but don't currently have it.

3c) a forum thread from 16.10 suggests removing dnsmasq as the solution; to reiterate, I don't have it installed (I had dnsmasq-base as a residual, but removing it was ineffective).

FWIW I use a static IP connecting via wifi to an ADSL router-modem with OpenDNS (with Google as fallback) set via the KDE NetworkManager interface.

Restarting pretty much any network item, e.g. sudo systemctl restart networking.service, appears to fix things very briefly, but the transience of the error makes it hard to tell: the connection is literally going up and down all the time. The shortest down period is ~2s, the longest up period is ~60s.

Doing journalctl -x --utc --system | grep -C3 -i error I get lines like:

-- Unit NetworkManager-wait-online.service has begun starting up.
Jun 06 09:57:29 bridgeflap-k1210 NetworkManager[3236]: <warn>  [1496743049.2492] keyfile: 'hostname' option is deprecated and has no effect
Jun 06 09:57:29 bridgeflap-k1210 NetworkManager[3236]: <warn>  [1496743049.2950] keyfile: error loading connection from file /etc/NetworkManager/system-connections/TALKTALK-E8D140-50a6bcfd-4d2e-4ec2-9a43-38d3d1cd21b2: invalid connection: connection.type: property is missing

Removing the connection via KDE's NetworkManager applet appears to have fixed this "property is missing" error, and temporarily I got the network connection back, but on rebooting I'm back to the same [apparently] DNS resolver problems, even though that error no longer appears in the journal.

It seems I'm left to try sudo systemctl disable systemd-resolved and go back to dnsmasq or, I gather, use unbound (this solution suggests using resolvconf), or maybe set static nameservers?

So, what to try next?

To anticipate the suggestion: my /etc/resolv.conf is already a symlink to /run/systemd/… actually, hang on … no, it isn't …

sudo apt remove resolvconf
sudo mv /etc/resolv.conf{,.20170606a}
sudo ln -s /run/resolvconf/resolv.conf /etc/resolv.conf
sudo dpkg-reconfigure systemd
sudo systemctl disable resolvconf.service
sudo systemctl restart systemd-resolved.service networking.service

It looked like it worked: I'm now getting longer stretches (~200s) where mtr pings successfully, but the connection is still dropping and there are still errors in syslog like:

Jun  6 14:50:41 bridgeflap-k1210 systemd-resolved[25306]: Switching to DNS server 208.67.222.222 for interface wlan15.
Jun  6 14:50:43 bridgeflap-k1210 systemd-resolved[25306]: Switching to DNS server 208.67.220.220 for interface wlan15.
Jun  6 14:50:43 bridgeflap-k1210 systemd-resolved[25306]: Using degraded feature set (UDP+EDNS0) for DNS server 208.67.220.220.
Jun  6 14:50:44 bridgeflap-k1210 systemd-resolved[25306]: Switching to DNS server 8.8.8.8 for interface wlan15.

Help!?

Best Answer

I used Wireshark to monitor the connection and filtered on "DNS"; I could see lookups for domains I wasn't visiting. Thankfully I recognised one as a domain for an app on my FireTV dongle (2017 version). The whole issue was that the FireTV had "stolen", via DHCP on my router, the IP address used by the desktop that was having issues; presumably the FireTV was getting half the packets, or the router was getting confused by having 2 devices on the same local subnet with the same IP address.

The fix was to "forget" the network connection on the FireTV. When setting up the connection again, choose "manual"; between the OK and Cancel buttons there's a small "advanced" button (that I missed the first time). I chose a different IP, and now it all works.

Of note is that the router has a section to view all connected devices by IP & MAC; when the two devices were connecting with the same IP, the router was showing neither device (but continued showing other devices on the net).
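A duplicate address like this can probably also be spotted from another machine on the subnet, without Wireshark, by asking who answers ARP for the address. The sketch below only counts the distinct MAC addresses in arping output; it assumes the iputils arping reply format, where the responder's MAC appears in square brackets, and count_reply_macs is a hypothetical name.

```shell
# count_reply_macs: hypothetical helper. Reads arping output on stdin and
# prints how many distinct MAC addresses replied.
# Assumes each reply line carries the MAC as [XX:XX:XX:XX:XX:XX].
# MACs are upper-cased before de-duplication so case differences
# do not count as separate devices.
count_reply_macs() {
    grep -o '\[[0-9A-Fa-f:]\{17\}]' |
        tr 'a-f' 'A-F' |
        sort -u |
        wc -l |
        tr -d ' '
}
```

Something like sudo arping -I IFACE -c 5 ADDRESS | count_reply_macs (IFACE and ADDRESS standing in for the local interface and the suspect IP) printing 2 or more would suggest that two devices are answering for the same address. iputils arping also has a -D duplicate address detection mode that can be run from the affected machine itself.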

[Not sure whether to keep this here or delete and post somewhere else?]
