Shell – My /var/log/ is mysteriously filling up GBs in minutes! Any cure before I re-install Debian 7?

debian, disk-usage, process, shell, troubleshooting

Good morning, fellow *nix enthusiasts!

I have been using Debian 7 for a while now, and after a recent upgrade I noticed I kept running out of space on my root partition. I mean to the point where I had '0' bytes left on disk! So, after a lot of searching, I was able to zero in on the /var/log folder. I used ls -s -S to sort the files in this folder by size and noticed that three files were GBs in size (13-15 GB each):

  • syslog
  • messages
  • kern.log

And yes, logrotate is working fine. It is rotating the logs. For example, I see kern.log.1 etc in /var/log. The problem is the logs are filling up so extremely fast that there's nothing logrotate can do.

Apparently, some logging process in the OS is writing a lot of data which could be because of constant errors or something(??). I don't know. All I know is my laptop is over-heating simply because there's so much processing going on all the time due to this constant write process. So, I'm losing CPU power, AND disk space.

My question is: how can I determine what process/daemon is creating this issue? How do I get to the root-cause of the problem so I could correct it? Reading these HUGE log files is not an option. Please. If I try to pull up a 15 GB log file in a text editor like leafpad or notepad on an already busy laptop, it just takes ages and ages to open. That is not practical.

I realize that this question is broad because there could be any process/daemon causing this, but I want to know if anyone has experienced this before, and if there are any usual suspects I could look at.

UPDATE:

Following Eric's advice, I arranged the files in /var/log by modification time, and 'syslog' was the last one. So, I tail'ed it. The result:

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]
Apr 10 00:53:37 MyMachine kernel: [11608.690742]  [<ffffffff81084f57>] ? process_one_work+0x147/0x3b0
Apr 10 00:53:37 MyMachine kernel: [11608.690750]  [<ffffffff81085764>] ? worker_thread+0x114/0x480
Apr 10 00:53:37 MyMachine kernel: [11608.690756]  [<ffffffff81556065>] ? __schedule+0x2e5/0x790
Apr 10 00:53:37 MyMachine kernel: [11608.690765]  [<ffffffff81085650>] ? create_worker+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690772]  [<ffffffff8108ae91>] ? kthread+0xc1/0xe0
Apr 10 00:53:37 MyMachine kernel: [11608.690780]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690788]  [<ffffffff8155a23c>] ? ret_from_fork+0x7c/0xb0
Apr 10 00:53:37 MyMachine kernel: [11608.690795]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690800] ---[ end trace 12dc8d8439345c1d ]

Unfortunately, it doesn't give me much of a hint.

Best Answer

There is actually a strong hint in the syslog snippet you posted. The end of the line

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]

shows that the stack trace comes from an unexpected error in a device driver named ath9k_htc. Fortunately, the kernel didn't panic, but the continuous repetition of these errors is filling your file system.
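If the hint is less obvious than this, you don't need to open the whole file to find it. Here is a minimal sketch, assuming the trace lines end with the module name in square brackets as in your snippet. The function name count_modules is made up for illustration. It only reads the last lines of the log and counts which module tags show up most often:

```shell
# count_modules FILE [LINES]   (hypothetical helper, not a standard tool)
# Looks only at the last LINES lines (default 10000) of FILE, so a
# multi-GB log is never loaded whole. Prints the five most frequent
# trailing "[module]" tags, most frequent first.
count_modules() {
    tail -n "${2:-10000}" "$1" |
        grep -o '\[[A-Za-z0-9_]*\]$' |
        sort | uniq -c | sort -rn | head -n 5
}
```

You would call it as count_modules /var/log/syslog (probably with sudo, depending on the file permissions). A module that dominates the output is a good first suspect.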

I would blacklist the ath9k_htc wifi driver with the following command and then reboot:

echo "blacklist ath9k_htc" | sudo tee -a /etc/modprobe.d/blacklist.conf

Beware, though, that doing so might stop your wifi from working if the ath9k_htc driver was actually in use and functional despite the errors.
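After the reboot, you may want to confirm that the blacklist entry is in place and that the module is really gone. This is a rough sketch, and both function names are invented for this example. It relies on /proc/modules listing one loaded module per line with the name in the first field, and on the blacklist living in a .conf file under /etc/modprobe.d:

```shell
# module_loaded NAME [MODULES_FILE]   (hypothetical helper)
# Succeeds if NAME is the first field of some line in MODULES_FILE,
# which defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# blacklisted NAME [DIR]   (hypothetical helper)
# Succeeds if a *.conf file in DIR (default /etc/modprobe.d) contains
# a "blacklist NAME" line.
blacklisted() {
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+$1([[:space:]]|\$)" \
        "${2:-/etc/modprobe.d}"/*.conf
}
```

For example: blacklisted ath9k_htc && ! module_loaded ath9k_htc && echo "driver is blacklisted and not loaded". If the module still shows up as loaded, something else may be loading it explicitly.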

You can check whether your machine has a wifi device that the ath9k_htc driver supports by running lsusb and seeing whether any device matches one in the list available here: https://wiki.debian.org/ath9k_htc
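If you'd rather not compare the IDs by eye, something along these lines could do the matching. It is only a sketch. It assumes you have copied the vendor:product IDs from that page by hand into a text file, one per line (ids.txt below is a made-up name), and it assumes the usual lsusb line layout where the ID follows the word "ID":

```shell
# match_usb_ids IDS_FILE   (hypothetical helper)
# Reads lsusb-style lines on stdin and prints those whose vendor:product
# ID appears in IDS_FILE (one ID per line, compared case-insensitively).
match_usb_ids() {
    awk 'NR == FNR { ids[tolower($1)] = 1; next }
         {
             for (i = 1; i < NF; i++)
                 if ($i == "ID" && (tolower($(i + 1)) in ids)) print
         }' "$1" -
}
```

Usage would be lsusb | match_usb_ids ids.txt. No output means none of the listed devices is plugged in, in which case blacklisting the driver should cost you nothing.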
