Linux – How to measure and prevent clock drift

clock linux sles

On several production platforms we have observed symptoms suggesting that the time-of-day clock periodically jumps forward or backward. The jumps are typically around 1 second, usually cancel out (a jump forward followed very shortly by a jump backward), and happen around 50 times per day. They are most noticeable during peak application usage and during periods of heavy disk I/O, such as daily backups. These jumps are affecting our soft real-time application, which is sensitive to time-of-day changes.

The systems are Oracle Netra X4250 and Netra X4270 servers running SLES 11 SP2 with the 3.0.58-0.6.6-default kernel.

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

We have disabled NTP, but that has not had any effect on the drifts. Are there tools which measure time of day clock drift? How can we avoid this?

These are production platforms, and we cannot recreate the issue in our labs, so my ability to experiment is limited. If left to my own devices, I'll write a tool to measure drift, and perhaps experiment with an HPET clocksource.

Best Answer

Are there tools which measure time of day clock drift?

The only tools I'm aware of are the NTP tools, which should suffice. You don't have to configure ntpd to sync against a given clock source; you can just use the -d option of ntpdate to fetch the calculated offset.

Example:

[davisja5@xxxadmvlm08 ~]$ ntpdate -d clock.redhat.com 2>/dev/null | egrep "^offset"
offset -0.004545
[davisja5@xxxadmvlm08 ~]$

-d is the debug option: ntpdate goes through all the NTP steps and reports the result, but does not adjust the system clock.
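Since your jumps cancel out shortly after they happen, a single ntpdate run will most likely miss them; the offset needs to be sampled repeatedly and logged. Here is a minimal sketch of such a logger. It assumes ntpdate is installed and that its -d output contains a line starting with "offset", as in the example above. The function names parse_offset and sample_offset, the 10-second interval, and the log path are made up for illustration.

```shell
#!/bin/sh
# Read `ntpdate -d` output on stdin and print the value from the last
# line that starts with "offset" (prints nothing if there is none).
parse_offset() {
    awk '/^offset/ { v = $2 } END { if (v != "") print v }'
}

# Print one sample: "<UTC timestamp> <offset in seconds>".
# $1 is the NTP server to query; -d means the clock is never adjusted.
sample_offset() {
    off=$(ntpdate -d "$1" 2>/dev/null | parse_offset)
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${off:-unreachable}"
}

# Example loop (commented out so that sourcing this file has no side effects):
# while :; do sample_offset clock.redhat.com; sleep 10; done >> /tmp/offset.log
```

A jump of around 1 second should show up as a step between consecutive samples, but only if a sample happens to land between the forward and backward jump, so a short interval helps. Point it at a server on your own network if you can, to keep the load off public servers.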

Any advice on how we can avoid this?

I'm not too surprised that you aren't able to reproduce this in dev/test environments, since it's probably just due to the hardware clock. If you have a hardware support contract, I would try to get the machines serviced. One possibility is swapping one of the dev machines for the affected production machine, fixing the former PROD system, and re-introducing it as a dev machine to replace the one that's now in PROD.

Short of that, switching the clock source is about all you can do. If you don't want to or can't do the swap, I'd suggest that you do go the hpet route. You can first test whether the clock source change interferes with system services, and then deploy it into production as a Hail Mary.
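For the test itself, the clock source can be changed at runtime through the same sysfs directory shown in the question, with no reboot needed. A rough sketch follows; the helper names and the CS_DIR variable are hypothetical, and the write requires root.

```shell
#!/bin/sh
# Directory holding the clocksource files (path taken from the question).
CS_DIR=${CS_DIR:-/sys/devices/system/clocksource/clocksource0}

# Succeed only if $1 is listed in available_clocksource.
clocksource_available() {
    grep -qw -- "$1" "$CS_DIR/available_clocksource"
}

# Switch the running kernel to clocksource $1 (must be run as root).
# Refuses to write a name the kernel does not list as available.
switch_clocksource() {
    if ! clocksource_available "$1"; then
        echo "clocksource '$1' is not available" >&2
        return 1
    fi
    echo "$1" > "$CS_DIR/current_clocksource"
}

# Example (commented out): switch_clocksource hpet
```

A change made this way lasts only until the next reboot, which makes it easy to back out. To make it permanent, the usual approach is to add clocksource=hpet to the kernel boot parameters. Keep in mind that reading hpet is generally slower than reading tsc, so check the impact on your application before rolling it out.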