Situation:
- Linux machine running in Azure
- resolving a public domain name whose answer contains 112 records
- the DNS response is 1905 bytes
Case 1:
- querying Google DNS 8.8.8.8 – it returns an untruncated response. Everything is OK.
Case 2:
- querying Azure DNS 168.63.129.16 – it returns a truncated response and dig tries to switch to TCP, but the TCP query fails with the error "unable to connect to server address". However, it works perfectly well if I run the query with "sudo".
The problem is reproducible every time.

Without sudo:
$ dig aerserv-bc-us-east.bidswitch.net @8.8.8.8

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> aerserv-bc-us-east.bidswitch.net @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49847
;; flags: qr rd ra; QUERY: 1, ANSWER: 112, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;aerserv-bc-us-east.bidswitch.net. IN A

;; ANSWER SECTION:
aerserv-bc-us-east.bidswitch.net. 119 IN CNAME bidcast-bcserver-gce-sc.bidswitch.net.
bidcast-bcserver-gce-sc.bidswitch.net. 119 IN CNAME bidcast-bcserver-gce-sc-multifo.bidswitch.net.
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 59 IN A 35.211.189.137
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 59 IN A 35.211.205.98
--------
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 59 IN A 35.211.28.65
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 59 IN A 35.211.213.32

;; Query time: 12 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Oct 03 22:28:09 EEST 2019
;; MSG SIZE rcvd: 1905

[azureuser@testserver ~]$ dig aerserv-bc-us-east.bidswitch.net
;; Truncated, retrying in TCP mode.
;; Connection to 168.63.129.16#53(168.63.129.16) for aerserv-bc-us-east.bidswitch.net failed: timed out.
;; Connection to 168.63.129.16#53(168.63.129.16) for aerserv-bc-us-east.bidswitch.net failed: timed out.

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> aerserv-bc-us-east.bidswitch.net
;; global options: +cmd
;; connection timed out; no servers could be reached
;; Connection to 168.63.129.16#53(168.63.129.16) for aerserv-bc-us-east.bidswitch.net failed: timed out.
With sudo:
[root@testserver ~]# dig aerserv-bc-us-east.bidswitch.net
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> aerserv-bc-us-east.bidswitch.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8941
;; flags: qr rd ra; QUERY: 1, ANSWER: 112, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;aerserv-bc-us-east.bidswitch.net. IN A

;; ANSWER SECTION:
aerserv-bc-us-east.bidswitch.net. 120 IN CNAME bidcast-bcserver-gce-sc.bidswitch.net.
bidcast-bcserver-gce-sc.bidswitch.net. 120 IN CNAME bidcast-bcserver-gce-sc-multifo.bidswitch.net.
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 60 IN A 35.211.56.153
.......
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 60 IN A 35.207.61.237
bidcast-bcserver-gce-sc-multifo.bidswitch.net. 60 IN A 35.207.23.245

;; Query time: 125 msec
;; SERVER: 168.63.129.16#53(168.63.129.16)
;; WHEN: Thu Oct 03 22:17:18 EEST 2019
;; MSG SIZE rcvd: 1905
I checked everything I could find on the internet, but nowhere did I see an explanation of why this works as intended only from the root account, or with sudo, when the response is too big and gets truncated, forcing the DNS query to switch from UDP to TCP.
Adding "options edns0" or "options use-vc" or "options edns0 use-vc" to /etc/resolv.conf doesn't help either.
Same behavior on CentOS 7.x, Ubuntu 16.04 and 18.04.
Update: tested with curl and telnet; the behavior is the same: it works with sudo or from the root account, and fails without sudo or from a standard account.
Can anyone please provide some insight into why it needs superuser permissions when switching from UDP to TCP, and help with a solution, if any?
UPDATE:
- I know this is a long post, but please read it all before answering.
- Firewall is set to allow any to any.
- Port 53 is open on TCP and UDP in all the test environments I have.
- SELinux/AppArmor is disabled.
Update2:
Debian 9 (kernel 4.19.0-0.bpo.5-cloud-amd64) works correctly without sudo.
RHEL 8 (kernel 4.18.0-80.11.1.el8_0.x86_64) works correctly without sudo, but with huge delays (up to 30 sec).
Update3:
List of distributions I was able to test where it doesn't work:
- RHEL 7.6, kernel 3.10.0-957.21.3.el7.x86_64
- CentOS 7.6, kernel 3.10.0-862.11.6.el7.x86_64
- Oracle Linux 7.6, kernel 4.14.35-1902.3.2.el7uek.x86_64
- Ubuntu 14.04, kernel 3.10.0-1062.1.1.el7.x86_64
- Ubuntu 16.04, kernel 4.15.0-1057-azure
- Ubuntu 18.04, kernel 5.0.0-1018-azure
- Ubuntu 19.04, kernel 5.0.0-1014-azure
- SLES 12 SP4, kernel 4.12.14-6.23-azure
- SLES 15, kernel 4.12.14-5.30-azure
So, basically, the only distribution I tested that has no problems is Debian 9. Since RHEL 8 has huge delays, which may trigger timeouts, I cannot consider it fully working.
So far, the biggest difference between Debian 9 and the rest of the distributions I tested is systemd-resolved (not used on Debian 9)… not sure how to check whether this is the cause.
Thank you!
Best Answer
"Can anyone please provide some insight about why this works like this and help with some solution, if any?"
SHORT ANSWER:
A default Azure VM is created with broken DNS: systemd-resolved needs further configuration. sudo systemctl status systemd-resolved will quickly confirm this. /etc/resolv.conf points to 127.0.0.53 - a local, unconfigured stub resolver.

The local stub resolver systemd-resolved was unconfigured: it had no forwarder set, so after a query hit 127.0.0.53 it had nobody else to ask. Ugh. Jump to the end to see how to configure it for Ubuntu 18.04. If you care about how that conclusion was reached, then please read the Long Answer.
LONG ANSWER:
Why DNS responses over 512 bytes get truncated: classic DNS caps a UDP message at 512 bytes; EDNS0 lets a client advertise a larger buffer, and any response that still doesn't fit is returned with the TC (truncated) bit set, telling the client to retry the query over TCP.
Source: https://tools.ietf.org/html/rfc7766
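This truncation-then-TCP path can be probed directly with standard dig flags (using the domain from the question; both commands need network access, and results will vary by environment):

```shell
# Advertise only a 512-byte EDNS buffer: a 1905-byte answer must come back
# with the TC flag set, and dig will automatically retry over TCP.
dig +bufsize=512 aerserv-bc-us-east.bidswitch.net @168.63.129.16

# Skip UDP entirely and exercise the TCP/53 path on its own.
dig +tcp aerserv-bc-us-east.bidswitch.net @168.63.129.16
```

If the second command times out for an unprivileged user but succeeds under sudo, it reproduces the question's symptom in isolation.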
ANALYSIS:
This was trickier than I thought. So I spun up an Ubuntu 18.04 VM in Azure so I could test from the vantage point of the OP.
My starting point was to validate that nothing was choking off the DNS queries.
All chains in iptables had their default policy set to ACCEPT, and although AppArmor was set to "enforcing", it wasn't enforcing anything involved with DNS. So no connectivity or permission issues were observed on the host at this point.
Next I needed to establish how the DNS queries were winding through the gears.
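The plumbing can be inspected with a few read-only commands (standard paths on Ubuntu 18.04; the exact output will vary from host to host):

```shell
# Which resolver does the libc stub hand queries to?
cat /etc/resolv.conf              # stock Ubuntu 18.04 shows: nameserver 127.0.0.53

# 127.0.0.53 is systemd-resolved's local stub listener; check its health:
systemctl status systemd-resolved

# In what order does glibc consult its name sources?
grep '^hosts:' /etc/nsswitch.conf

# Does the stub resolver have any forwarder configured?
grep -v '^#' /etc/systemd/resolved.conf
```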
So according to resolv.conf, the system expects a local stub resolver called systemd-resolved. Checking the status of systemd-resolved per the hint given in the text above, we see it's erroring.

/etc/nsswitch.conf sets the order of the sources used to resolve DNS queries. What does this tell us? Well, the DNS queries will never hit the local systemd-resolved stub resolver, as it's not specified in /etc/nsswitch.conf.

Are the forwarders even set for the systemd-resolved stub resolver?!? Let's review that configuration in /etc/systemd/resolved.conf.

Nope: systemd-resolved has no forwarder set to ask if a local IP:name mapping is not found.

The net result of all this is:
- /etc/nsswitch.conf sends queries out to DNS if no local IP:name mapping is found in /etc/hosts.
- The DNS server to be queried is 127.0.0.53, and we just saw from reviewing its config file /etc/systemd/resolved.conf that it is not configured. With no forwarder specified in there, there's no way we'll successfully resolve anything.
TESTING:
I tried to override the stub resolver 127.0.0.53 by directly specifying 168.63.129.16. This failed.

Nope: seeing ;; SERVER: 127.0.0.53#53(127.0.0.53) in the output tells us that we've not overridden it and the local, unconfigured stub resolver is still being used.

However, using either of the following commands overrode the default 127.0.0.53 stub resolver and therefore succeeded in returning NOERROR results.

So any queries that relied on using the systemd-resolved stub resolver were doomed until it was configured.
stub resolver were doomed until it was configured.SOLUTION:
My initial (incorrect) belief was that TCP/53 was being blocked; the whole "truncated over 512 bytes" thing was a bit of a red herring. The stub resolver was simply not configured. I had made the assumption (I know, I know: "NEVER ASSUME" ;-) ) that DNS was otherwise configured.
How to configure systemd-resolved:

Ubuntu 18.04:
- Edit the hosts directive in /etc/nsswitch.conf, prepending resolve so that systemd-resolved is the first source of DNS resolution.
- Edit the DNS directive (at a minimum) in /etc/systemd/resolved.conf to specify your desired forwarder.
- Restart systemd-resolved.
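The file excerpts the answer refers to did not survive in this copy, so here is a sketch of the three steps, assuming the Azure fabric resolver 168.63.129.16 (from the question) as the forwarder:

```shell
# 1. /etc/nsswitch.conf -- prepend "resolve" to the hosts line, e.g.:
#      hosts: resolve [!UNAVAIL=return] files dns myhostname

# 2. /etc/systemd/resolved.conf -- set a forwarder, e.g.:
#      [Resolve]
#      DNS=168.63.129.16

# 3. Apply the change:
sudo systemctl restart systemd-resolved
```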
RHEL 8:

Red Hat does almost everything for you with respect to setting up systemd-resolved as a stub resolver, except they didn't tell the system to use it!
- Edit the hosts directive in /etc/nsswitch.conf, prepending resolve so that systemd-resolved is the first source of DNS resolution.
- Then restart systemd-resolved.
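The nsswitch.conf excerpt from the answer did not survive in this copy; the edit it describes is, schematically:

```shell
# /etc/nsswitch.conf -- prepend "resolve" to the hosts line, e.g.:
#      hosts: resolve [!UNAVAIL=return] files dns myhostname

# Then restart the stub resolver:
sudo systemctl restart systemd-resolved
```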
:Source: https://www.linkedin.com/pulse/config-rhel8-local-dns-caching-terrence-houlahan/
CONCLUSION:
Once systemd-resolved was configured, my test VM's DNS behaved in the expected way. I think that about does it....