Ubuntu – What are possible reasons for erratic NTP synchronization?

Tags: clock, ntp, ntpd, Ubuntu

On an Ubuntu 10.04 system I noticed the following strange NTP sync events:

Jul  3 02:19:51 hst ntpd[1432]: no servers reachable
Jul  3 02:36:55 hst ntpd[1432]: synchronized to 91.189.94.4, stratum 2
Jul  3 02:53:48 hst ntpd[1432]: time reset -10.407942 s
Jul  3 02:53:48 hst ntpd[1432]: kernel time sync status change 6001
Jul  3 02:53:48 hst dovecot: dovecot: Fatal: Time just moved backwards by 10 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards
Jul  3 02:58:37 hst ntpd[1432]: synchronized to 91.189.94.4, stratum 2
Jul  3 02:58:37 hst ntpd[1432]: kernel time sync status change 2001
Jul  3 03:08:15 hst ntpd[1432]: no servers reachable
Jul  3 03:16:49 hst ntpd[1432]: synchronized to 91.189.94.4, stratum 2
Jul  3 03:17:01 hst CRON[28221]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jul  3 03:18:04 hst ntpd[1432]: time reset +10.403648 s
Jul  3 03:22:41 hst ntpd[1432]: synchronized to 91.189.94.4, stratum 2

Here 91.189.94.4 is europium.canonical.com, and the only server line in ntp.conf is:

server ntp.ubuntu.com

The time reset at 02:53 seems pretty bogus because it is canceled out about 25 minutes later, at 03:18.
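One way to check that the two resets really cancel out is to sum the offsets reported in the log. The Python sketch below does this for the `time reset` lines quoted above. The regular expression, the function names, and the `year` argument are assumptions made for illustration (syslog timestamps carry no year), not anything ntpd itself provides.

```python
import re
from datetime import datetime

# Excerpt of the syslog lines quoted above (one non-reset line kept on purpose).
LOG = """\
Jul  3 02:36:55 hst ntpd[1432]: synchronized to 91.189.94.4, stratum 2
Jul  3 02:53:48 hst ntpd[1432]: time reset -10.407942 s
Jul  3 03:18:04 hst ntpd[1432]: time reset +10.403648 s
"""

# Hypothetical pattern for ntpd "time reset" lines in this syslog format.
RESET = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+ntpd\[\d+\]:\s+"
    r"time reset\s+([+-]?\d+\.\d+)\s+s"
)


def parse_resets(text, year=2000):
    """Return a list of (timestamp, offset_seconds) for each time reset.

    The year is an assumption: syslog lines do not record it.
    """
    resets = []
    for line in text.splitlines():
        m = RESET.match(line)
        if m:
            stamp = " ".join(m.group(1).split())  # collapse the padded day field
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            resets.append((ts, float(m.group(2))))
    return resets


def net_offset(resets):
    """Sum of all step corrections, in seconds."""
    return sum(offset for _, offset in resets)


if __name__ == "__main__":
    resets = parse_resets(LOG)
    gap = (resets[-1][0] - resets[0][0]).total_seconds()
    print(f"net change: {net_offset(resets):+.6f} s over {gap / 60:.1f} min")
```

For the excerpt above, the two steps differ by only about 4 ms and are roughly 24 minutes apart, which supports the reading that the first step was spurious and the second one undid it.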

What could be possible reasons for this?

I can think of:

  • remote NTP server just provides the wrong time
  • network problems (could a high latency introduce such drifts?)
  • leap second induced bug (this should induce a crash instead, right?)

If the first alternative were the problem, how could I protect against it?

Is ntpd smart enough to consult multiple NTP servers (when multiple server lines are present in ntp.conf) and detect when their answers deviate too much from each other?

Best Answer

I've seen syslog entries like that on a Slackware machine a few years ago. I believe I bought the machine in question in 2002, and pretty much ran it 24/7 for years: it was my SSH, SMTP and HTTP server. The NTP failures came on slowly, and gradually increased in frequency.

I fixed it the first time by changing the "CMOS RAM" battery, which was one of those coin-sized (US quarter) CR2032 batteries on the motherboard.

After another year or two of operation, that machine just absolutely quit keeping time accurately, and I had to regularly restart ntpd. As I understand it, ntpd keeps a "drift file" that records, based on past data, how the local clock's frequency differs from the network clock(s). My guess was that the motherboard in question never had a great clock, and the clock finally became so erratic that the stored drift correction just couldn't keep up with its wild variance.
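The file described here is what ntpd calls its drift file (the `driftfile` directive in ntp.conf). It holds a single frequency offset in parts per million (PPM). As a rough sketch of what such a value means in practice, the Python below converts a PPM figure into seconds gained or lost per day and compares it with the 500 PPM limit that the ntpd documentation gives as the largest frequency error it will correct. The function names and the sample values are illustrative assumptions.

```python
SECONDS_PER_DAY = 86400
NTPD_MAX_PPM = 500.0  # assumed limit, as stated in the ntpd documentation


def parse_drift(text):
    """Parse the contents of a drift file: the first token is the offset in PPM."""
    return float(text.split()[0])


def seconds_per_day(ppm):
    """Clock error accumulated over one day for a given frequency offset."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def correctable(ppm, limit=NTPD_MAX_PPM):
    """True if the frequency offset is within the range ntpd can discipline."""
    return abs(ppm) <= limit


if __name__ == "__main__":
    # Sample value only; on a real system read it from the configured drift file.
    ppm = parse_drift("37.512\n")
    print(f"{ppm} PPM is about {seconds_per_day(ppm):.2f} s/day,"
          f" correctable: {correctable(ppm)}")
```

At the 500 PPM limit a clock is off by about 43 seconds per day. A crystal that wanders beyond that, or that changes rate faster than the stored value is updated, would be consistent with the behaviour described above.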
