That's not a bad idea. To improve it a little, perhaps each of the clients could be configured to send a message at a regular interval (providing a "heartbeat" or a "mark"). With Rsyslog, the directive is $ActionWriteAllMarkMessages [on|off]. But heed the man page:
Note that this option auto-resets to "off", so if you intend to use it with multiple actions, it must be specified in front of all selector lines that should provide this functionality.
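The mark messages themselves come from rsyslog's immark module; $ActionWriteAllMarkMessages only controls whether a given action passes them through. A minimal client-side sketch, where the server name and the 600-second interval are example values, not prescriptions:

```
# /etc/rsyslog.conf on each client
$ModLoad immark                  # emits "-- MARK --" messages
$MarkMessagePeriod 600           # one mark every 10 minutes (example value)
$ActionWriteAllMarkMessages on   # auto-resets; repeat before each action
*.* @syslog-server.example.com:514
```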
What if there is an undiscovered configuration problem on the Syslog server that causes messages to be directed to an unexpected location, such as the file you monitor to show that the client is alive? Perhaps a more rigorous test might be to grep/awk for the "heartbeat" or "mark" messages provided by $ActionWriteAllMarkMessages, looking for the time of the most recent message from a specific host.
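A rough sketch of that grep/awk test. The log path, client hostname, and allowed age below are placeholders, and the timestamp parsing assumes GNU date (date -d):

```shell
#!/usr/bin/env bash
# check_mark LOGFILE CLIENT MAX_AGE_SECONDS
# Returns 0 if CLIENT's newest "-- MARK --" line is younger than MAX_AGE_SECONDS,
# 2 if no mark lines from CLIENT exist at all.
check_mark() {
    local logfile=$1 client=$2 max_age=$3
    local last_mark mark_epoch now
    # Newest mark line from that client, e.g. "Jun  1 12:00:00 client01 -- MARK --"
    last_mark=$(grep " ${client} -- MARK --" "$logfile" | tail -n 1)
    [ -n "$last_mark" ] || return 2
    # The first three fields are the syslog timestamp; GNU date can parse them.
    mark_epoch=$(date -d "$(printf '%s\n' "$last_mark" | awk '{print $1, $2, $3}')" +%s)
    now=$(date +%s)
    [ $((now - mark_epoch)) -le "$max_age" ]
}

# Example (path and hostname are placeholders):
# check_mark /var/log/remote.log client01 1200 && echo "client01 is alive"
```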
To verify that the remote Syslog server is running, you could use netcat (nc) with the -z and -u switches. From the manual:
-u Use UDP instead of the default option of TCP.
-z Specifies that nc should just scan for listening daemons, without sending any data to them. [...]
For example, with a five-second timeout (-w5):
#!/usr/bin/env bash
hostname="<FQDN or IP Address>"
port="514"
if nc -z -u -w5 "$hostname" "$port" > /dev/null 2>&1; then
echo "Syslog is up."
else
echo "Syslog could not be reached."
fi
unset hostname
unset port
Or, if the Syslog server also uses TCP, then you might omit the -u switch to take advantage of TCP's reliability, if only for this purpose. This checks that the Syslog server is listening and reachable from the host performing the connectivity check. There's another facet of equal importance: disk space. (If the Syslog server runs out of space to write messages, then it may as well be considered down.)
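A hedged sketch of a disk-space check to pair with the connectivity test; the /var/log path and the 90% threshold are assumptions, and df --output requires GNU coreutils:

```shell
#!/usr/bin/env bash
# Warn when the filesystem holding the logs is nearly full.
logdir="/var/log"      # wherever the Syslog server writes (assumption)
threshold=90           # percent used that triggers the warning

# df --output=pcent prints a header plus the usage, e.g. " 42%"; strip to digits.
usage=$(df --output=pcent "$logdir" | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: ${logdir} filesystem at ${usage}% - Syslog may stop writing."
else
    echo "OK: ${logdir} filesystem at ${usage}%."
fi
```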
Considering that UDP is "unreliable" by design, counting on the "mark" or "heartbeat" message might not always work with a congested network or Syslog server. Another approach might be to install a script on each client. The script (Bash, Python, or whatever works) could return whichever error code you devise (e.g., Syslog process not running? Return 1, or try to start the Syslog process first, or devise other tests such as connectivity checks). Use xinetd to start the script from /etc/xinetd.d/script_name (on RHEL/CentOS):
service check_syslog
{
type = UNLISTED
port = 6777
socket_type = stream
protocol = tcp
wait = no
user = root
server = /usr/local/sbin/script_name
only_from = 127.0.0.1 10.0.0.110
disable = no
}
On RHEL the restart command is service xinetd restart. Edit /etc/services to give a name to port 6777. I like to use iptables to augment the only_from stanza. By the way, the port 6777 was used just for illustration.
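For completeness, a sketch of what /usr/local/sbin/script_name might contain. The rsyslogd process name and the status-line format are assumptions (some hosts run syslogd or syslog-ng instead):

```shell
#!/usr/bin/env bash
# Served by xinetd: whatever this prints on stdout is what the remote
# checker reads over TCP port 6777.
check_proc() {
    # Prints a status line; returns 0 if the named process is running, 1 if not.
    if pgrep -x "$1" > /dev/null 2>&1; then
        echo "OK: $1 is running"
    else
        echo "CRITICAL: $1 is not running"
        return 1
    fi
}

# Example call (add restart logic or other tests as desired):
# check_proc rsyslogd
```

From the monitoring host, reading the status is then just nc <client> 6777.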
rsyslog includes a rate limiting option by default through the imuxsock module. It defaults to 200 messages per 5 seconds, but this can easily be changed by setting the following after the module is loaded:
$SystemLogRateLimitInterval 5
$SystemLogRateLimitBurst 200
$SystemLogRateLimitInterval is the interval in seconds (which you should increase) and $SystemLogRateLimitBurst is the maximum number of messages allowed from an application during that interval (which you should decrease).
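In newer rsyslog versions (v7+), the same knobs are also exposed as module parameters; a sketch with example values (use one form or the other, not both, since each loads imuxsock):

```
# Legacy form (directives must follow the module load):
$ModLoad imuxsock
$SystemLogRateLimitInterval 10
$SystemLogRateLimitBurst 100

# Equivalent newer-style form:
module(load="imuxsock"
       SysSock.RateLimit.Interval="10"
       SysSock.RateLimit.Burst="100")
```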
Update: Since, as your update notes, the errors flooding your syslog carry different process IDs, there is no practical way for the daemon to deduplicate them efficiently. Changing the log rotation rules to cap maximum file size would therefore be the only practical solution. Note that once compressed (as per the usual log rotation process), these large files will become minuscule because their contents are so repetitive.
Sounds like you're looking for the $RepeatedMsgReduction config switch, which turns such duplicate messages into one by logging "last message repeated n times". You can also discard unwanted messages.
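A sketch of how those two pieces might look in rsyslog.conf; the program name "noisyd" is a placeholder, and on older rsyslog versions the discard action is ~ rather than stop:

```
# Collapse consecutive duplicates into "last message repeated n times":
$RepeatedMsgReduction on

# Discard unwanted messages entirely (property-based filter):
:programname, isequal, "noisyd"    stop
```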