Using a script to monitor ntpd is not commonly done. Usually a monitoring tool like nagios or munin is used to monitor the daemon. The tool can send you an alert when things go wrong. I have munin emailing me if the offset exceeds 15 milliseconds.
Normally, you should use an odd number of servers so that the daemon can perform an election among the servers if one goes bad. Three is usually adequate, and more than five is excessive. Clients on your internal network should be able to get by with one internal server if you monitor it. Use legitimate servers or your ISP's NTP or DNS servers as clock sources. There are public pools as well as public servers.
ntpd is self-tuning and you should not need to adjust it once it is configured and started. With recent ntpd implementations you can drop the use of ntpdate entirely, as they can do the initial setting of the date.
The following script parses the offsets in the output of ntpq and reports an excessive offset. You could run it from cron to email you if there are problems. The script defaults to alerting on an offset of 100 milliseconds (0.1 seconds).
#!/bin/bash
limit=100 # Set your limit in milliseconds here
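# The offset (in milliseconds) is the 9th column of the peers listing
offsets=$(ntpq -nc peers | tail -n +3 | awk '{ print $9 }' | tr -d '-')
for offset in ${offsets}; do
whole=${offset%%.*} # integer part only; [ -ge ] cannot compare decimals
if [ "${whole:-0}" -ge "${limit:-100}" ]; then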
echo "An NTPD offset is excessive - Please investigate"
exit 1
fi
done
# EOF
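Because ntpq reports offsets with decimals and the shell's `[ -ge ]` test only handles integers, the comparison can also be done inside awk. The following is a minimal sketch, not a drop-in replacement. It assumes the offset is the ninth whitespace-separated column of the peers listing and is given in milliseconds. The function names `max_abs_offset` and `offset_exceeds` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read "ntpq -nc peers" output on stdin and print the
# largest absolute offset in milliseconds. Assumes offset is column 9 and
# that the first two lines are the header and its underline.
max_abs_offset() {
    awk 'NR > 2 {
             off = $9 + 0
             if (off < 0) off = -off
             if (off > max) max = off
         }
         END { printf "%.3f\n", max + 0 }'
}

# Hypothetical helper: exit status 0 when offset ($1) is greater than limit ($2).
offset_exceeds() {
    awk -v off="$1" -v limit="$2" 'BEGIN { exit !(off + 0 > limit + 0) }'
}

# Possible use from cron (limit of 100 ms is the same default as above):
#   worst=$(ntpq -nc peers | max_abs_offset)
#   if offset_exceeds "$worst" 100; then
#       echo "An NTPD offset is excessive (${worst} ms) - Please investigate"
#       exit 1
#   fi
```

Reporting the worst offset in the message makes the cron email more useful than a bare alert, at the cost of a second awk call.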
Note: Although NTP introduced the idea of a nanokernel that could be used to patch operating systems lacking NTP support, Linux in particular is not such a case. The NTP code is in the kernel itself, as you allude to in question 1.
0: How does this Nanokernel manage to deliver an accuracy less than the system clock tick (such as ns accuracy)?
Accuracy finer than the system clock tick is achieved by relying on the aggregated accuracy of other computers or devices. The system clock tick determines how often this computer's clock is updated. However, the number of digits of precision is defined by the particular software used, such as the OS, which often follows POSIX standards. Some POSIX time structures go down to nanosecond resolution, as you mention.
To see how we can get accuracy greater than system clock accuracy, suppose on my computer I have attached to it a GPS device or some sort of fancy atomic clock. Whenever someone asks me what time it is, I just consult that clock and give that out.
If ntp is in the kernel as it is for Linux, this GPS device time rather than the system clock time can be used in gettimeofday() calls.
As for the computer's clock, sure, I compare the time I get from the GPS or atomic clock with what the computer reports, and when it gets more than a tick away, I arrange to adjust it back using adjtime(), described in the answer to question 3.
1: If and when was this modification brought into the main line Linux kernel?
The NTP Nanokernel idea was introduced in ntp version 4.0, which goes back at least to 1998. I think it was in the Linux kernel in some form since at least 2.2.36. The Linux git logs on GitHub report Oct 1, 2006 as the date the ntp code was split out into its own file, ntp.c, in the kernel. But of course the code was there before that.
In sum, none of this is new.
2: How does it use the cycle counter, because as far as I'm aware it does not deliver an interrupt, so does the Nanokernel continuously read the processor registry value containing the current counter?
It uses this like any other program would read a program variable. When code using it runs and the value is needed, say because it has gotten new information back, it reads the variable and updates it. If someone needs to get the time, it uses it in that calculation too. So unless the code was written in a really stupid way (and I'm pretty sure it wasn't), no, it doesn't "continuously read the processor registry value" any more than is necessary.
3: Finally does NTPD ever modify the CPU clock frequency, or does it just maintain a software clock where calculated clock adjustment is applied?
It uses the system call adjtime() which goes back to even before 1998. What adjtime does is arrange periodically for the clock counter to miss an increment to slow down, or to increment by more than 1 to speed up.
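To get a feel for how gradual this kind of adjustment is, here is a small arithmetic sketch. It assumes a maximum slew rate of 500 ppm (500 microseconds of correction per second of real time), which is the figure commonly cited for ntpd; check your own system's documentation before relying on it. The function name `slew_seconds` is hypothetical.

```shell
# Hypothetical helper: seconds needed to slew away a clock offset.
# $1 = offset in microseconds (sign is ignored)
# $2 = slew rate in ppm, i.e. microseconds corrected per second (assumed 500)
slew_seconds() {
    local offset_us=${1#-}      # direction does not change the duration
    local rate_ppm=${2:-500}    # assumed cap, not read from the system
    echo $(( (offset_us + rate_ppm - 1) / rate_ppm ))   # round up
}

slew_seconds 100000 500   # a 100 ms offset; prints 200
```

Under that assumption a 100 millisecond offset takes a bit over three minutes to remove, which is why the clock appears to drift into place rather than jump.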
Best Answer
A similar question was asked on Serverfault.com. This was the answer.
From the hwclock man page on RHEL 4.6:
So by virtue of running hwclock --set you have likely turned it off. By the same token, you can check the output of adjtimex --print to confirm.
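As a rough sketch of that check: the hwclock man page describes the kernel's "11 minute mode" and says it is on when the 64 bit of the adjtimex status value is clear. The helper below assumes that is the setting being discussed here. The function name `eleven_minute_mode` is hypothetical, and the exact output layout of adjtimex --print is an assumption as well.

```shell
# Hypothetical helper: interpret the "status" value shown by adjtimex --print.
# Per the hwclock man page, 11 minute mode is on when the 64 bit is 0.
eleven_minute_mode() {
    local status=$1
    if (( status & 64 )); then
        echo off
    else
        echo on
    fi
}

# Possible use (assumes adjtimex --print emits a line of the form "status: N"):
#   eleven_minute_mode "$(adjtimex --print | awk '$1 == "status:" { print $2 }')"
```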