Collectd cannot monitor ntpd 4.2.8 (Ubuntu 16.04)


I have a Docker container based on Ubuntu 16.04 which runs the ntpd 4.2.8 service. When starting the container, I published port 123/udp.

From the host or any other computer on the LAN, I can use ntpq -p <container_host> to get the list of peers and the sync status. But monitoring it with collectd, or running ntpdc -c kerninfo <container_host>, times out. This puzzles me!

I've tested it inside the container both with some reasonable restrict statements and without any, and the query times out in both cases. Running tcpdump in the container (after making it a privileged container) shows that the UDP request arrives, but nothing is sent back. By contrast, with ntpq, which works, tcpdump shows both the request and the response.
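To reproduce the "request arrives, no answer" symptom without ntpdc or collectd, a bare UDP probe with a timeout is enough. Below is a minimal Python sketch; the helper name udp_probe is hypothetical, and it does not build a valid NTP request itself: the payload bytes are entirely up to the caller. A return value of None corresponds to what tcpdump showed for the ntpdc request.

```python
import socket
from typing import Optional


def udp_probe(host: str, port: int, payload: bytes, timeout: float = 2.0) -> Optional[bytes]:
    """Send one UDP datagram and return the first reply, or None on timeout.

    Hypothetical helper for illustration: it only checks whether *anything*
    comes back, it does not parse or validate the reply.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            # The request left this machine but no reply arrived in time.
            return None
        return data
```

Pointing it at <container_host> on port 123 with a mode6 request and then with a mode7 request would show which of the two the server answers.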

If I run the ntpd server directly on the host, using the same ntp.conf file, both ntpdc -c kerninfo <container_host> and collectd succeed from the host and from the other LAN computers I authorised! However, the host is still running an older version of Ubuntu (14.04), which ships with ntp 4.2.6.

So the only differences are the Docker networking (NAT, as far as I understand) and the ntp version (4.2.6 vs 4.2.8). But the ntp.org documentation does not really say anything about NAT, nor about relevant changes in 4.2.8. So is my command timing out just because the client is on a different subnet than the server (due to NAT)? Or has something changed in 4.2.8?

Note: my container image is based on ubuntu:16.04, which ships ntpd 4.2.8p4@1.3265-o (from the official Ubuntu repositories). The host runs Ubuntu 14.04, which ships 4.2.6p5.

PS: collectd sends a request equivalent to ntpdc -c kerninfo <container_host>, and it times out when ntpd runs in the container, even when all the restrict statements are correct.

Update: I forgot to mention that I also ran ntpd inside the container with the -ddd option to get more verbose output. The only relevant lines logged were:

read_network_packet: fd=19 length 192 from 192.168.1.3
receive: at 26 172.17.0.2<-192.168.1.3 flags 19 restrict 000

Update 2: After finding the solution, I changed the question in the hope that others stumbling on the same problem will find the question and answer more easily when searching. I also corrected one mistake: I thought the host was running Ubuntu 16.04, but it was actually still running 14.04.

Best Answer

I solved my problem. The timeout is caused by ntp 4.2.8 deprecating the ntpdc tool and disabling by default the communication mode it uses (aka mode7).

From ntp 4.2.8 onwards, ntpq should be used instead of ntpdc; it now supports many of the ntpdc commands, including kerninfo. So I can run ntpq -c kerninfo <container_host> successfully. The ntpq command uses a different mode (aka mode6) for the communication.
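The mode is carried in the low 3 bits of the first byte of every NTP packet, for mode6 (ntpq) as well as mode7 (ntpdc, collectd) requests, so a captured UDP payload can be classified by looking at that byte alone. The Python sketch below shows that, plus a 12-byte mode6 "read variables" request header (opcode 2, association ID 0 for the system variables). The function names are hypothetical, and version 2 is only an illustrative default, not necessarily what ntpq sends.

```python
import struct

NTP_MODE_CONTROL = 6  # mode6: control messages, used by ntpq
NTP_MODE_PRIVATE = 7  # mode7: private messages, used by ntpdc and collectd


def ntp_mode(packet: bytes) -> int:
    """Return the NTP mode field, i.e. the low 3 bits of the first byte."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] & 0x07


def mode6_readvar_request(sequence: int, version: int = 2) -> bytes:
    """Build a bare mode6 'read variables' request header (no data section).

    Layout, network byte order: LI/VN/mode, R/E/M/opcode, sequence,
    status, association ID, offset, count.
    """
    li_vn_mode = (0 << 6) | (version << 3) | NTP_MODE_CONTROL
    opcode_readvar = 2  # read variables
    return struct.pack("!BBHHHHH", li_vn_mode, opcode_readvar, sequence, 0, 0, 0, 0)
```

With this, the first payload byte seen in tcpdump (0x16 vs 0x17 for version 2) already tells whether the request was ntpq-style or ntpdc-style.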

With ntp 4.2.8, it is still possible to re-enable mode7 for compatibility with tools that have not yet migrated, by adding the following line to /etc/ntp.conf:

enable mode7

However, be very careful with the restrict statements: it seems that enabling mode7 while leaving the ntpd server too open can be abused for DDoS amplification attacks. I'm currently using Ubuntu's default restrictions for both IPv4 and IPv6, which, I think, block the use of this mode, since the noquery flag denies ntpq and ntpdc queries:

restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
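To check which hosts would actually get a mode7 answer, the Python sketch below models the two conditions: mode7 must be enabled, and the most specific matching restrict line must not carry noquery (or ignore). It is a simplified model, not ntpd's implementation: it assumes the most specific address/mask wins, assumes at most one line per address/mask, ignores IPv6, hostnames and the kod/limited rate limiting, and all function names are hypothetical.

```python
import ipaddress
from typing import List, Set, Tuple


def _tokens(line: str) -> List[str]:
    return line.split("#", 1)[0].split()


def parse_restrict(conf_text: str) -> List[Tuple[ipaddress.IPv4Network, Set[str]]]:
    """Return (network, flags) pairs for IPv4 restrict lines (simplified)."""
    entries = []
    for raw in conf_text.splitlines():
        args = _tokens(raw)
        if not args or args[0] != "restrict":
            continue
        args = args[1:]
        if args and args[0] == "-6":
            continue  # this sketch only models IPv4
        if args and args[0] == "-4":
            args = args[1:]
        if not args:
            continue
        addr, args = args[0], args[1:]
        mask = "255.255.255.255"
        if len(args) >= 2 and args[0] == "mask":
            mask, args = args[1], args[2:]
        if addr == "default":
            net = ipaddress.IPv4Network("0.0.0.0/0")
        else:
            try:
                net = ipaddress.IPv4Network(f"{addr}/{mask}", strict=False)
            except ValueError:
                continue  # hostnames, IPv6 addresses, keywords are skipped
        entries.append((net, set(args)))
    return entries


def mode7_enabled(conf_text: str) -> bool:
    """mode7 is off by default in 4.2.8; the last enable/disable line wins."""
    enabled = False
    for raw in conf_text.splitlines():
        toks = _tokens(raw)
        if len(toks) >= 2 and toks[0] in ("enable", "disable") and "mode7" in toks[1:]:
            enabled = toks[0] == "enable"
    return enabled


def query_allowed(conf_text: str, client: str) -> bool:
    """True if the most specific matching restrict line allows queries."""
    ip = ipaddress.IPv4Address(client)
    matches = [(net, flags) for net, flags in parse_restrict(conf_text) if ip in net]
    if not matches:
        return True  # no restrict line matches, so nothing denies the query
    _net, flags = max(matches, key=lambda m: m[0].prefixlen)
    return "noquery" not in flags and "ignore" not in flags


def mode7_answered(conf_text: str, client: str) -> bool:
    return mode7_enabled(conf_text) and query_allowed(conf_text, client)
```

In this model, the two default lines alone deny queries from a LAN client such as 192.168.1.3; adding a more specific line without noquery for the LAN (for example restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap), together with enable mode7, lets that client through while everyone else still hits the default.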

Since collectd only supports mode7 (see issue #932), I decided to re-enable this mode in my configuration inside the container. As long as ntp supports re-enabling this mode, this change should fix the problem of collectd being unable to monitor ntpd on Ubuntu 16.04 (or any distro using ntp 4.2.8+).

Note: so that people encountering this problem can find a solution more easily, I'm going to edit the question to be less misleading about NAT, which I initially thought was the root cause.
