Linux system time temporarily jumps

logs time timestamps

I saw strange system time behavior on some (hardware) servers: in /var/log/syslog, the date and time preceding each log message sometimes changes to a random value and returns to normal in the next message, like the following:

Feb 22 2018 09:09:30 ...  
Feb 22 2018 09:09:32 ...  
Jan 13 2610 15:37:42 ...  
Feb 22 2018 09:09:33 ...  
Feb 22 2018 09:09:34 ...

As in the example, the sudden change of date and time can be as much as hundreds of years away.

I can confirm that the log messages with the strange timestamps do not come from any specific process; it can happen randomly to any of them.

The interval between two abnormal time changes varies from a few minutes to a few hours (however, I suspect the abnormal time changes happen more frequently and many of them are not revealed in the syslog, since it is not writing logs every second).
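One way to count how often this happens is to scan the log for timestamps that are far from their neighbours. The following is only a rough sketch: find_time_jumps is a hypothetical helper name, the 3600-second default is arbitrary, and it assumes every line starts with "Mon DD YYYY HH:MM:SS" as in the sample above (many syslog formats omit the year). A genuinely long quiet period in the log will produce one false positive.

```shell
# Sketch: print log lines whose timestamp is far from the last plausible one.
# Assumes lines start with "Mon DD YYYY HH:MM:SS"; the helper name is made up.
find_time_jumps() {
    awk -v limit="${1:-3600}" '
    function far(a, b) { return (a - b > limit || b - a > limit) }
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = i
    }
    {
        split($4, t, ":")
        # Approximate seconds: every month is treated as 31 days long, which
        # is good enough to spot jumps of days or years.
        key = (($3 * 12 + month[$1]) * 31 + $2) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        # Suspicious only if far from both the last good line and the
        # previous line, so a real gap in the log is flagged just once.
        if (NR > 1 && far(key, good) && far(key, last)) {
            print
            last = key
            next
        }
        good = key
        last = key
    }'
}

# Usage: find_time_jumps 3600 < /var/log/syslog
```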

Also, since it happens on more than one server, I assume it is not a hardware problem.

More info about the servers: they are an OpenStack installation with one controller and a few compute nodes. Each server has the ntp service running. The controller is configured to take time from its own hardware clock, and the compute nodes sync time from the controller. Note that each server has abnormal time changes at its own pace, so it looks like the "wrong time" is not synchronized from the controller through ntp.

I suspected that the guest systems (virtual machines) on the compute nodes could affect their host system time. But this cannot explain why the controller has the same problem while not running any virtual machines.

I need a method to detect who changed the system time and how it happens.

Best Answer

The relevant aspects are the kernel versions and these lines from early in the boot process:

kernel: Fast TSC calibration using PIT
...
kernel: Calibrating delay loop (skipped), value calculated using timer frequency..
...
kernel: Switching to clocksource tsc

YMMV, and you may not be using TSC or PIT.

AFAIK this is a bug that's caused by the clock of at least one of your CPUs being out of sync, in your case probably running too fast.

It should be easy to confirm by running this:

for cpu in {0..7} ; do taskset -c "$cpu" date ; done

which will run date pinned to each CPU in turn (assuming you have up to 8 cores/threads). If my guess is correct then one of your CPUs will consistently have the wrong time.
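The same check can be scripted so that the readings are compared numerically instead of by eye. This is a sketch only: cpu_times and max_spread are hypothetical helper names, and it assumes the usable CPUs are numbered 0..N-1, which may not hold if some CPUs are offline or the shell has a restricted affinity mask.

```shell
# Sketch: read the clock once per CPU and report the spread in seconds.
# cpu_times and max_spread are made-up names, not existing tools.
cpu_times() {
    # nproc counts usable CPUs; assumes they are numbered 0..N-1.
    last=$(( $(nproc) - 1 ))
    for cpu in $(seq 0 "$last"); do
        printf '%s %s\n' "$cpu" "$(taskset -c "$cpu" date +%s)"
    done
}

max_spread() {
    # Input: "cpu epoch_seconds" per line. Output: max minus min epoch.
    awk '{ t = $2 + 0 }
         NR == 1 { min = t; max = t }
         t < min { min = t }
         t > max { max = t }
         END { if (NR) print max - min }'
}

# Usage: cpu_times | max_spread
```

A spread of a second or so can simply be the time the loop takes to run; a CPU with a misbehaving clock should stand out by a much larger margin.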

If that's the case then you should first try upgrading the kernel and if that doesn't work, fiddle with the clocksource boot parameter (assuming it's x86-64):

clocksource=    Override the default clocksource
                Format: <string>
                Override the default clocksource and use the clocksource
                with the name specified.
                Some clocksource names to choose from, depending on
                the platform:
                [all] jiffies (this is the base, fallback clocksource)
                [ACPI] acpi_pm
                ...
                [X86-64] hpet,tsc
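Before changing the boot parameter, it may help to see which clocksource is active and which ones the kernel offers. The sketch below reads the sysfs files that, as far as I know, expose this on current kernels; show_clocksource is a hypothetical helper name, and its optional argument is only there so the function can be pointed at a different directory.

```shell
# Sketch: show the active and the available clocksources.
# show_clocksource is a made-up helper name.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    printf 'current: %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}

# Usage: show_clocksource
# Switching at runtime (as root, assuming hpet is listed as available):
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
```

A runtime switch does not survive a reboot, so the clocksource= boot parameter is still needed to make the change permanent.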

Related Solutions

How to get time synced outputs (different timezones on servers)

You can use date to print a timestamp and store it with the log message.

$ TZ='Europe/Warsaw' date
wto, 23 kwi 2013, 17:11:48 CEST
$ TZ='America/Los_Angeles' date
wto, 23 kwi 2013, 08:11:56 PDT
$ date --universal
wto, 23 kwi 2013, 15:13:14 UTC

Use tzselect to find time zones.

Centos – Force time to stay put

For this answer, I'll assume that there may be several elements working hard to set your time straight. Since I don't really want to wild-guess about which one is working against you, I'll try and give you an answer which should help you find it yourself instead.

On a UNIX system, the clock can typically be set using the stime system call. As things evolved, it also became possible to set the clocks more accurately using the clock_settime call instead. You might also come across settimeofday. When running date --set on a CentOS machine, strace revealed that it used clock_settime.

Knowing this, a solution would be to trace these system calls. Good thing is, Linux has a mechanism for that: debugfs. On my system, calling mount, I can see that this is available at /sys/kernel/debug:

$ mount
none on /sys/kernel/debug type debugfs (rw)
...

However, on some systems (including RedHat and probably CentOS), it isn't mounted at boot time. You'll therefore need to run...

# mount -t debugfs nodev /sys/kernel/debug

Also note that if you were in that directory before mounting, you might have to go out and back in before files start to appear in it.
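To avoid mounting it twice, the mount table can be checked first. A minimal sketch, assuming the usual /proc/mounts layout where the third field is the filesystem type; debugfs_mounted is a hypothetical helper name and its optional argument selects the mount table to read.

```shell
# Sketch: succeed if a debugfs filesystem is listed in the mount table.
# debugfs_mounted is a made-up helper name.
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Usage (as root):
# debugfs_mounted || mount -t debugfs nodev /sys/kernel/debug
```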

Now we're ready to go. Let's enable the trace for our system calls. I'm tracing all of them because I don't really want to check which one is really being used. Tracing for system calls can be set in /sys/kernel/debug/tracing/events/syscalls. In this directory, you should find...

sys_enter_stime
sys_enter_clock_settime
sys_enter_settimeofday

... depending on what's available on your system.

These correspond to the events of entering our system calls, which is what we want to trace (you'll also find sys_exit_* directories). Within each directory, you'll find a file named enable, the contents of which should appear to be 0. To trace these calls, set that to 1 instead:

# echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_stime/enable
# echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_clock_settime/enable
# echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_enter_settimeofday/enable
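Since not every system has all three events, the same step can be written as a loop that skips the missing ones. This is just a sketch: set_time_tracing is a hypothetical helper name, and the optional second argument (an alternative events directory) is only there for experimenting away from the real tracing files.

```shell
# Sketch: write 1 (or 0) to the enable file of each time-setting syscall
# event that exists. set_time_tracing is a made-up helper name.
set_time_tracing() {
    value=$1
    dir=${2:-/sys/kernel/debug/tracing/events/syscalls}
    for call in stime clock_settime settimeofday; do
        file=$dir/sys_enter_$call/enable
        # Skip events this kernel does not provide.
        if [ -e "$file" ]; then
            echo "$value" > "$file"
        fi
    done
}

# Usage (as root): set_time_tracing 1
# When done:       set_time_tracing 0
```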

Now that we've set up our trap, just wait until something sets your time to its correct value. Once it has happened, run for the trace logs at...

# cat /sys/kernel/debug/tracing/trace

Now, unless something wrong occurred, you should see one of the following lines:

stime-xxxxx [xxx] .... x.x: sys_stime(...)
clock_settime-xxxxx [xxx] .... x.x: sys_clock_settime(...)
settimeofday-xxxxx [xxx] .... x.x: sys_settimeofday(...)

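The number right after the hyphen in the first field (written above as stime-xxxxx and so on; in real output the part before the hyphen is the name of the calling task) is the PID of the process which made the system call. Now go get it: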

# ps -fp xxxxx
UID        PID  PPID  C STIME TTY          TIME CMD
root     XXXXX XXXXX  0 hh:mm ?        hh:mm:ss time_warrior
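If the trace contains many events, the PIDs can be pulled out with a small filter rather than read off by hand. A sketch under a few assumptions: trace_pids is a hypothetical helper name, the first field of each event line has the form name-PID, and the task name contains no spaces.

```shell
# Sketch: list the distinct PIDs that called one of the three syscalls.
# trace_pids is a made-up helper name; it reads trace text on stdin.
trace_pids() {
    awk '/sys_(stime|clock_settime|settimeofday)\(/ {
             pid = $1
             sub(/.*-/, "", pid)    # keep what follows the last hyphen
             print pid
         }' | sort -un
}

# Usage (as root):
# for pid in $(trace_pids < /sys/kernel/debug/tracing/trace); do ps -fp "$pid"; done
```

Short-lived processes may already be gone by the time ps runs, in which case the task name in the trace line is the only clue left.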

You should now have everything you need to make sure your system stops getting the time right. The simplest thing would probably be to kill the process and make sure it isn't spawned at boot time. Of course, you'll have to make sure it doesn't serve a more important purpose before doing so: you don't want to completely crash your system...

Also remember to disable the trace when you're done by writing 0 to the files we edited earlier. A shortcut could be:

# echo 0 > /sys/kernel/debug/tracing/events/syscalls/enable

(this file acts as a master switch for all the others; it lets you switch tracing off for all system calls at once)

Note: as Mark Plotnick said in a comment, systemtap could be a slightly easier way to achieve similar results. I'll let him write a stap answer if he feels like it, since I'm not fluent with stap scripts at all.
