Shell – My /var/log/ is mysteriously filling up GBs in minutes! Any cure before I re-install Debian 7?

Tags: debian, disk-usage, process, shell, troubleshooting

Good morning, fellow *nix enthusiasts!

I have been using Debian 7 for a while now, and after a recent upgrade I noticed I constantly kept running out of space on my root partition. I mean to the point where I had '0' bytes left on disk! So, after a lot of searching, I was able to zero in on the /var/log folder. I used ls -s -S to sort the files in this folder by size and noticed that three files were GBs in size (13-15 GB each):

syslog
messages
kern.log

And yes, logrotate is working fine. It is rotating the logs. For example, I see kern.log.1 etc in /var/log. The problem is the logs are filling up so extremely fast that there's nothing logrotate can do.

Apparently, some logging process in the OS is writing a lot of data which could be because of constant errors or something(??). I don't know. All I know is my laptop is over-heating simply because there's so much processing going on all the time due to this constant write process. So, I'm losing CPU power, AND disk space.

My question is: how can I determine what process/daemon is creating this issue? How do I get to the root-cause of the problem so I could correct it? Reading these HUGE log files is not an option. Please. If I try to pull up a 15 GB log file in a text editor like leafpad or notepad on an already busy laptop, it just takes ages and ages to open. That is not practical.

I realize that this question is broad because there could be any process/daemon causing this, but I want to know if anyone has experienced this before, and if there are any usual suspects I could look at.

UPDATE:

Following Eric's advice, I arranged the files in /var/log by modification time, and 'syslog' was the last one. So, I tail'ed it. The result:

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]
Apr 10 00:53:37 MyMachine kernel: [11608.690742]  [<ffffffff81084f57>] ? process_one_work+0x147/0x3b0
Apr 10 00:53:37 MyMachine kernel: [11608.690750]  [<ffffffff81085764>] ? worker_thread+0x114/0x480
Apr 10 00:53:37 MyMachine kernel: [11608.690756]  [<ffffffff81556065>] ? __schedule+0x2e5/0x790
Apr 10 00:53:37 MyMachine kernel: [11608.690765]  [<ffffffff81085650>] ? create_worker+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690772]  [<ffffffff8108ae91>] ? kthread+0xc1/0xe0
Apr 10 00:53:37 MyMachine kernel: [11608.690780]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690788]  [<ffffffff8155a23c>] ? ret_from_fork+0x7c/0xb0
Apr 10 00:53:37 MyMachine kernel: [11608.690795]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690800] ---[ end trace 12dc8d8439345c1d ]

Unfortunately, it doesn't give me much of a hint.

Best Answer

There is actually a strong hint in the syslog snippet you posted. The end of the line

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]

shows that the stack trace comes from an unexpected error in a device driver named ath9k_htc. Fortunately, the kernel didn't panic, but the continuous repetition of these errors is what is filling your file system.
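Since opening a multi-gigabyte log in an editor is not practical, a quick summary of its last lines is usually enough to spot the culprit. The sketch below counts the bracketed module names that end stack-trace lines, such as [ath9k_htc]. The function name top_modules and the default of 10000 lines are my own choices for illustration, not something from the question.

```shell
#!/bin/sh
# Hypothetical helper: list the kernel module tags (e.g. "[ath9k_htc]")
# that end the most recent lines of a log file, most frequent first.
# $1 = log file, $2 = number of trailing lines to inspect (default 10000).
top_modules() {
    logfile=$1
    lines=${2:-10000}
    tail -n "$lines" "$logfile" \
        | grep -o '\[[A-Za-z0-9_]*\]$' \
        | sort | uniq -c | sort -rn
}
```

Calling top_modules /var/log/syslog (as root, or as a user allowed to read the log) prints one line per module with its count. tail only reads the end of the file, so this stays fast even on a 15 GB log.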

I would blacklist the ath9k_htc wifi driver with this command and then reboot:

echo "blacklist ath9k_htc" | sudo tee -a /etc/modprobe.d/blacklist.conf

Beware, though, that doing so might stop your wifi from working if the ath9k_htc driver was actually in use and functional despite the errors.
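One detail about the tee -a command above: it appends a new line every time it runs, so repeating it leaves duplicate entries. A minimal sketch of an idempotent variant follows. The function name blacklist_module is hypothetical, and the default path is simply the file used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: append "blacklist <module>" to a modprobe config
# file only if that exact line is not already present.
# $1 = module name, $2 = config file (defaults to the file used above).
blacklist_module() {
    module=$1
    conf=${2:-/etc/modprobe.d/blacklist.conf}
    line="blacklist $module"
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "$module is already blacklisted in $conf"
    else
        echo "$line" >> "$conf"
        echo "added '$line' to $conf"
    fi
}
```

Writing to /etc/modprobe.d needs root, so the function would have to be defined and called from a root shell. Deleting the line from the file and rebooting undoes the change if the wifi turns out to depend on the driver.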

You can check whether a wifi device supported by the ath9k_htc driver is present in your machine by running lsusb and seeing whether any device matches one in the list available here: https://wiki.debian.org/ath9k_htc

Related Solutions

Debian: what is “/var/log/apt/term.log” good for

If you need to figure out what happened during the installation of a package, the information is there.

This file is unlikely to contain any information that would affect your privacy. Maybe some edge cases such as which mirror you downloaded a few files from, which could reveal your broad geographical location. But other system logs have far more detailed information, so this is irrelevant unless you've done a lot of scrubbing already (in which case, just include this file in your scrubbing).

The size of the file is insignificant by today's standards (and even by yesterday's).

The location of the file is determined by the APT settings Dir::Log (default: /var/log/apt) and Dir::Log::Terminal (default: term.log). If you set this option to an empty string in /etc/apt/apt.conf (Dir::Log::Terminal ""), the log file won't be created. But again, that's pointless.

Rsyslog filling up /var/log puts the system down

rsyslog includes a rate limiting option by default through the imuxsock module.

It defaults to 200 messages per 5 seconds but can easily be changed by setting the following after the module is loaded:

$SystemLogRateLimitInterval 5
$SystemLogRateLimitBurst 200

$SystemLogRateLimitInterval is the length of the interval in seconds (which you should increase) and $SystemLogRateLimitBurst is the maximum number of messages a single process may log during that interval (which you should decrease).

Update: Based on your update regarding the fact that errors are flooding your syslog with different process IDs, there is no practical way for the daemon to deal with these efficiently.

Changing the log rotation rules on maximum file size would therefore be the only possible solution. Note that once compressed (as part of the usual log rotation process), these large files will become minuscule because their contents are so repetitive.
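Whatever size limit you settle on, it helps to be able to see quickly which logs are already over it. The following is a rough sketch, not part of the original answer. The function name big_logs and its arguments are hypothetical, and the threshold is given in bytes because the c suffix of find -size is the portable one.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under a directory that are
# larger than a given number of bytes.
# $1 = directory to scan, $2 = size threshold in bytes.
big_logs() {
    dir=$1
    min_bytes=$2
    find "$dir" -type f -size +"${min_bytes}c" -print
}
```

For example, big_logs /var/log 1073741824 lists every file under /var/log larger than 1 GiB. Some files there are readable only by root, so run it with sufficient privileges.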
