TL;DR
A VM running under KVM does not keep its time synchronized. After a 2-minute suspend, it keeps a permanent 2-minute offset. Setting up another VM with a different network configuration shows that the network configuration prevents NTP from working on the first one. Fixing that network issue is off topic here.
However, the new VM, which does not have the network issue, does not resynchronize after a resume either. Same test: suspend for 2 minutes, then compare the date with a machine that is properly synced. The 2-minute offset is permanent.
This seems to be a common issue, and there is controversy about how to keep a VM synchronized and about using NTP and kvm-clock at the same time. I found many references to it but no answer.
Question
I have a Debian VM with ntpd running but not correcting the time. For instance, after a suspend/resume, I get a permanent 2-minute offset.
/etc/ntp.conf is default or close to default, nothing fancy:
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
ntpq seems to report a problem:
# ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
37.187.7.160 .INIT. 16 u - 1024 0 0.000 0.000 0.000
195.154.211.37 .INIT. 16 u - 1024 0 0.000 0.000 0.000
195.154.216.44 .INIT. 16 u - 1024 0 0.000 0.000 0.000
95.81.173.155 .INIT. 16 u - 1024 0 0.000 0.000 0.000
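To make the symptom in this table explicit, here is a minimal Python sketch that parses `ntpq -pn` output and lists peers that have never answered. The function names and the `SAMPLE` variable are hypothetical, and the parser assumes numeric (`-n`) output with the ten columns shown above. A `reach` of 0 together with refid `.INIT.` means no reply has been received from that server.

```python
# Tally characters ntpq may print in front of the remote address.
TALLY_CODES = "*+-#xo.~"


def parse_ntpq_peers(text):
    """Parse the peer table printed by `ntpq -pn` into a list of dicts."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 10 columns; skip the header and the ==== rule.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        remote = fields[0]
        # Drop a leading tally code; numeric addresses never start with one.
        if remote[0] in TALLY_CODES:
            remote = remote[1:]
        peers.append({
            "remote": remote,
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),  # ntpq prints reach in octal
        })
    return peers


def unreachable(peers):
    """Return the addresses of peers whose reach register is still zero."""
    return [p["remote"] for p in peers if p["reach"] == 0]


SAMPLE = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
37.187.7.160 .INIT. 16 u - 1024 0 0.000 0.000 0.000
195.154.211.37 .INIT. 16 u - 1024 0 0.000 0.000 0.000
195.154.216.44 .INIT. 16 u - 1024 0 0.000 0.000 0.000
95.81.173.155 .INIT. 16 u - 1024 0 0.000 0.000 0.000
"""

print(unreachable(parse_ntpq_peers(SAMPLE)))
```

With the table above, all four servers come back as unreachable, which points at the network path rather than at the clock discipline.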
I'm not a netcat wizard, but AFAIU outgoing traffic on UDP port 123 goes through:
# nc -vvzu 37.187.7.160 123
mail.lafkor.de [37.187.7.160] 123 (ntp) open
sent 0, rcvd 0
Is this test enough to rule out the firewall issue?
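Since UDP is connectionless, the nc output above ("sent 0, rcvd 0") probably only shows that nothing rejected the packet, not that a reply came back. A stricter check is to send a real NTP client request and wait for the answer. The following Python sketch does that under some assumptions: the function names are hypothetical, it sends from an ephemeral source port (ntpd itself uses source port 123, so a firewall that filters on the source port may treat it differently), and it reads only the seconds part of the transmit timestamp.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_client_packet():
    """Build a 48-byte NTP request: LI=0, VN=4, Mode=3 (client), rest zero."""
    return bytes([0x23]) + bytes(47)


def parse_transmit_time(reply):
    """Extract the server transmit time (Unix seconds) from an NTP reply."""
    # The transmit timestamp starts at byte 40; its first 4 bytes are seconds.
    (seconds,) = struct.unpack("!I", reply[40:44])
    return seconds - NTP_EPOCH_OFFSET


def probe(server, timeout=5.0):
    """Return the server's time in Unix seconds, or None if no usable reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_client_packet(), (server, 123))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    if len(reply) < 48:
        return None
    return parse_transmit_time(reply)


# Example (needs network access):
#   print(probe("0.debian.pool.ntp.org"))
```

A `None` result here would match what `ntpdate` reports as "no server suitable for synchronization found": the request leaves, but no reply arrives.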
The host (also a Debian machine) has the same NTP configuration and synchronization is working. The network config for both machines is different, which is why I'm thinking it might be a network issue.
Any other useful test I could run?
I don't think the tinker panic 0 parameter is relevant here, as it is meant to force updates on huge gaps, not 2-minute gaps. And anyway, AFAIU, it would affect the behavior in case of a time offset, but it would not solve ntpq -pn returning only zeros.
FWIW, other test outputs inspired by this question:
# ntpq
ntpq> pe
remote refid st t when poll reach delay offset jitter
==============================================================================
mail.lafkor.de .INIT. 16 u - 1024 0 0.000 0.000 0.000
atoll.tropicdre .INIT. 16 u - 1024 0 0.000 0.000 0.000
oods.roflcopter .INIT. 16 u - 1024 0 0.000 0.000 0.000
ntp-3.arkena.ne .INIT. 16 u - 1024 0 0.000 0.000 0.000
ntpq> as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 21025 8011 yes no none reject mobilize 1
2 21026 8011 yes no none reject mobilize 1
3 21027 8011 yes no none reject mobilize 1
4 21028 8011 yes no none reject mobilize 1
ntpq> rv
associd=0 status=c012 leap_alarm, sync_unspec, 1 event, freq_set,
version="ntpd 4.2.6p5@1.2349-o Fri Apr 10 19:04:04 UTC 2015 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=11, stratum=16,
precision=-23, rootdelay=0.000, rootdisp=6683.055, refid=INIT,
reftime=00000000.00000000 Mon, Jan 1 1900 0:09:21.000,
clock=d9b51587.b7a1085f Tue, Sep 29 2015 15:49:59.717, peer=0, tc=3,
mintc=3, offset=0.000, frequency=-0.125, sys_jitter=0.000,
clk_jitter=0.000, clk_wander=0.000
ntpq> rv 21025
associd=21025 status=8011 conf, sel_reject, 1 event, mobilize,
srcadr=mail.lafkor.de, srcport=123, dstadr=147.210.157.185, dstport=123,
leap=11, stratum=16, precision=-23, rootdelay=0.000, rootdisp=0.000,
refid=INIT, reftime=00000000.00000000 Mon, Jan 1 1900 0:09:21.000,
rec=00000000.00000000 Mon, Jan 1 1900 0:09:21.000, reach=000,
unreach=1137, hmode=3, pmode=0, hpoll=10, ppoll=10, headway=0,
flash=1600 peer_stratum, peer_dist, peer_unreach, keyid=0, offset=0.000,
delay=0.000, dispersion=15937.500, jitter=0.000, xleave=0.167,
filtdelay= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtdisp= 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0
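The `reach` field in this output is an 8-bit shift register printed in octal: each poll shifts it left by one bit, and the lowest bit is set when a reply was received. A small Python sketch (the function name is hypothetical) makes the values easier to read:

```python
def reach_history(reach_octal):
    """Decode ntpq's octal reach value into 8 booleans, newest poll first."""
    value = int(reach_octal, 8)
    # Bit 0 is the most recent poll, bit 7 the oldest one still remembered.
    return [bool((value >> bit) & 1) for bit in range(8)]


# reach=000, as shown above: none of the last eight polls got a reply.
print(reach_history("000"))
# reach=377 is what a healthy peer shows: all of the last eight polls answered.
print(reach_history("377"))
```

Combined with `unreach=1137`, a reach of 000 indicates that this association has never received a single reply.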
tcpdump / ntpdate tests
On a machine where NTP sync works correctly, I launch tcpdump udp port ntp and, when I restart ntpd, I see this kind of output:
# tcpdump udp port ntp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:31:33.719166 IP 10.0.2.15.ntp > spica.beduzar.fr.ntp: NTPv4, Client, length 48
17:31:33.736804 IP spica.beduzar.fr.ntp > 10.0.2.15.ntp: NTPv4, Server, length 48
17:31:35.973551 IP 10.0.2.15.ntp > ntp.tuxfamily.net.ntp: NTPv4, Client, length 48
17:31:35.992671 IP ntp.tuxfamily.net.ntp > 10.0.2.15.ntp: NTPv4, Server, length 48
[...]
On the machine I have the issue with, I don't see any output at all when restarting ntpd (no request, no reply). Shouldn't I at least see the requests?
On the good machine:
# ntpdate 0.debian.pool.ntp.org
29 Sep 17:24:49 ntpdate[700]: adjust time server 193.55.167.1 offset -0.005196 sec
On the bad machine:
# ntpdate 0.debian.pool.ntp.org
29 Sep 17:43:18 ntpdate[3180]: no server suitable for synchronization found
Test with another VM
We set up another VM with the same NTP configuration but a different network configuration.
The results of tcpdump and ntpdate are correct, and ntpq -pn returns good results. So apparently, the network configuration is indeed an issue on the faulty VM.
However, the new VM does not synchronize either. If I suspend it so that it lags by about 100 s, it does not synchronize (I mean that after a few minutes, the gap is still the same number of seconds). However, when I restart ntpd, it synchronizes instantly.
I appear to have two issues:
- Network config on the first VM
- ntp does not synchronize on either VM (unless restarted)
Best Answer
Problem solved.
Network issue
The VM had network issues preventing ntpd from succeeding. It has two eth interfaces, and the one with the gateway goes through a router we don't manage directly. Although my tests wouldn't show it, I guess some UDP frames were blocked. We set up another VM with another network config and ntpq yielded better results.
Ultimately, we changed the ntp config so that the host broadcasts time locally and all VMs synchronize on it. This makes more sense and minimizes load on public ntp servers.
ntpd sets clock instantly after a few minutes
One thing that probably misled me during the tests is that ntpd does not synchronize immediately. I thought it would detect a gap right away and then modify the clock speed so that the clock progressively joins the source clock. In fact, we noticed that (unless ntpd is restarted) the clock is unchanged for a few minutes, then all of a sudden it is set, seemingly instantly. In the meantime, the rightmost columns in the ntpq output show that synchronization is going on.
This ntpd behavior probably explains why I thought ntpd didn't work even though it did. I just didn't wait long enough and I didn't understand the ntpq output.
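A rough calculation shows why a gradual correction is not what happens for a gap of this size. The sketch below assumes ntpd's documented defaults as I understand them: a step threshold of 128 ms and a maximum slew rate of 500 ppm. The function names are hypothetical. The delay before the step actually happens is governed by ntpd's stepout interval, whose default depends on the ntpd version, which would be consistent with the "few minutes" observed here.

```python
STEP_THRESHOLD_S = 0.128  # assumed default: larger offsets are stepped
MAX_SLEW_PPM = 500        # assumed maximum slew rate, in parts per million


def slew_duration_s(offset_s):
    """Seconds needed to absorb offset_s by slewing at the maximum rate."""
    return abs(offset_s) * 1_000_000 / MAX_SLEW_PPM


def correction_mode(offset_s):
    """Return 'step' or 'slew' depending on the size of the offset."""
    return "step" if abs(offset_s) > STEP_THRESHOLD_S else "slew"


offset = 120  # the 2-minute gap left by the suspend
print(correction_mode(offset))
print(slew_duration_s(offset) / 86400, "days if it were slewed")
```

At 500 ppm, absorbing 120 s would take 240000 s, close to three days, so ntpd steps the clock instead once it has decided the offset is real.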