Linux system time temporarily jumps

logs time timestamps

I saw strange system time behavior on some (hardware) servers: in /var/logs/syslog, the timestamp preceding each log message sometimes changes to a seemingly random value and returns to normal in the next message, like the following:

Feb 22 2018 09:09:30 ...  
Feb 22 2018 09:09:32 ...  
Jan 13 2610 15:37:42 ...  
Feb 22 2018 09:09:33 ...  
Feb 22 2018 09:09:34 ...  

As the example shows, the timestamp can jump by hundreds of years.

I can confirm that the log messages with the strange timestamps do not come from any specific process; it can happen randomly to any of them.

The interval between two abnormal time changes varies from a few minutes to a few hours. However, I suspect the changes happen more frequently and many of them are simply not revealed in the syslog, since it does not write a log entry every second.

Also, since it happens on more than one server, I assume it is not a hardware problem.

More info about the servers: they are an OpenStack installation with one controller and a few compute nodes. Each server has the ntp service running. The controller is configured to take time from its own hardware clock, and the compute nodes sync time from the controller. Note that each server has abnormal time changes at its own pace, so it looks like the "wrong time" is not propagated from the controller through ntp.

I suspected that the guest systems (virtual machines) on the compute nodes could affect their host system time. But this cannot explain why the controller has the same problem while not running any virtual machine.

I need a method to detect what changed the system time and how it happens.

Best Answer

The relevant aspects are the kernel versions and these lines from early in the boot process:

kernel: Fast TSC calibration using PIT
...
kernel: Calibrating delay loop (skipped), value calculated using timer frequency..
...
kernel: Switching to clocksource tsc

YMMV, and you may not be using TSC or PIT.
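
To pull these lines out of the kernel log on a running system, a small filter like the sketch below may help. The function name clock_lines and the exact pattern are my own choices for illustration, not anything the kernel defines, and dmesg may need root on some systems.

```shell
#!/usr/bin/env bash
# Keep only kernel log lines (read from stdin) that mention TSC
# calibration or clocksource selection. The pattern is a guess at what
# is useful here; widen or narrow it as needed.
clock_lines() {
    grep -iE 'tsc|clocksource|calibrat'
}

# Typical use on a live host:
#   uname -r              # kernel version
#   dmesg | clock_lines   # boot-time timer messages
```

If the ring buffer has already rotated past the boot messages, the same filter can be applied to whatever file your distribution stores the kernel log in.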

AFAIK this is a bug that's caused by the clock of at least one of your CPUs being out of sync, in your case probably running too fast.

It should be easy to confirm by running this:

for cpu in {0..7} ; do taskset -c "$cpu" date ; done

which runs date pinned to each CPU in turn (assuming 8 logical CPUs numbered 0 to 7; adjust the range to match the output of nproc). If my guess is correct then one of your CPUs will consistently report the wrong time.
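
Comparing the output by eye gets tedious on machines with many cores, so here is a rough sketch that reads the clock once per CPU and prints the spread in seconds. The function names spread and per_cpu_epochs are hypothetical. It assumes nproc and taskset are installed, and that the CPUs are numbered 0 to N-1 and all usable by the current process.

```shell
#!/usr/bin/env bash
# Print the difference between the largest and smallest of the integer
# timestamps given as arguments.
spread() {
    local min=$1 max=$1 t
    for t in "$@"; do
        if (( t < min )); then min=$t; fi
        if (( t > max )); then max=$t; fi
    done
    echo $(( max - min ))
}

# Print the epoch time as read by date while pinned to each CPU in turn.
# Assumption: CPUs are numbered 0..N-1, where N is what nproc reports.
per_cpu_epochs() {
    local cpu last=$(( $(nproc) - 1 ))
    for cpu in $(seq 0 "$last"); do
        taskset -c "$cpu" date +%s
    done
}

# Typical use on a live host:
#   spread $(per_cpu_epochs)
```

The loop itself takes a moment to run, so a spread of a second or so is expected; a spread of days or years would point at one CPU. Since the jumps are intermittent, it is probably worth running this repeatedly rather than once.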

If that's the case then you should first try upgrading the kernel and if that doesn't work, fiddle with the clocksource boot parameter (assuming it's x86-64):

clocksource=    Override the default clocksource
                Format: <string>
                Override the default clocksource and use the clocksource
                with the name specified.
                Some clocksource names to choose from, depending on
                the platform:
                [all] jiffies (this is the base, fallback clocksource)
                [ACPI] acpi_pm
                ...
                [X86-64] hpet,tsc

See also the output of this:

cat /sys/devices/system/clocksource/clocksource*/available_clocksource
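
Before editing the boot parameters, the clocksource can usually be switched on the running kernel through the same sysfs directory, which makes for a quick experiment. The sketch below assumes the directory is clocksource0 (the usual case); the function names are made up for illustration, and writing to current_clocksource needs root and does not survive a reboot.

```shell
#!/usr/bin/env bash
# Assumption: the first (and normally only) clocksource directory.
CS_DIR=/sys/devices/system/clocksource/clocksource0

# Succeed if the name in $1 appears in the space-separated list in $2.
clocksource_available() {
    local want=$1 name
    for name in $2; do
        [ "$name" = "$want" ] && return 0
    done
    return 1
}

# Switch the running kernel to clocksource $1, refusing names that the
# kernel does not list as available.
set_clocksource() {
    local want=$1
    if ! clocksource_available "$want" "$(cat "$CS_DIR/available_clocksource")"; then
        echo "clocksource '$want' is not available" >&2
        return 1
    fi
    echo "$want" > "$CS_DIR/current_clocksource"
}

# Typical use on a live host, as root:
#   cat "$CS_DIR/current_clocksource"
#   set_clocksource hpet
```

If the jumps stop after switching, the change can be made permanent with the clocksource= boot parameter; on GRUB-based distributions that usually means adding it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerating the GRUB configuration.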