What to look for in log files if I think limited memory or disk space is causing a crash

logs terminology

I'm troubleshooting a recent series of crashes using the files in /var/log. What should I look for in those files if I believe low memory or disk space is to blame? Is there a general term used in Linux error-throwing lingo for hardware faults of this kind? And which system processes, such as the kernel, would be affected by a critical shortage of memory?

Just as background, I was working on a Drupal site hosted on my Fedora 17 sandbox project laptop when I experienced these system crashes. I had recently downloaded some rather large files (which I've since moved to media) and was down to about 1.8G of free HD space.

I found some useful posts here about monitoring memory usage with top or current disk usage with du. This question, however, is specifically about log files. While searching for an explanation of FPrintObject, I found a similar post at Fedora Forums, which led me to run Memtest, but it reported nothing bad.

Best Answer

The information you are looking for is not usually found in the standard syslog logs. For viewing performance history from the command line, sysstat is an excellent tool.

With sysstat, sadc collects system information and writes it to a log file. The log file is in a binary format, but it can be viewed with the sar command.

Here is an example of sar output with no options:

$ sar
09:15:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:05:01 AM     all     77.49      0.37     22.13      0.00      0.00      0.00
10:15:01 AM     all     77.30      0.40     22.29      0.00      0.00      0.00
10:25:01 AM     all     77.19      0.38     22.42      0.00      0.00      0.00
10:35:01 AM     all     39.31      0.35     23.80      0.01      0.00     36.53
10:45:01 AM     all     32.22      0.34     24.26      0.03      0.00     43.15
10:55:01 AM     all     32.80      0.33     23.78      0.01      0.00     43.08
11:05:01 AM     all     32.70      0.33     23.76      0.00      0.00     43.20
Average:        all     63.90      0.39     22.79      0.00      0.00     12.91

This is the same information that top provides, but as historical data. You can also see detailed information about RAM, network, and disk utilization. Here is an example for RAM usage:

$ sar -r
09:15:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:15:01 PM    457076   1357116     74.81    277876    810948    205520      5.40
02:25:01 PM    456836   1357356     74.82    277876    811168    205384      5.40
02:35:01 PM    456976   1357216     74.81    277876    811256    204728      5.38
02:45:01 PM    457036   1357156     74.81    277876    811368    204840      5.38
02:55:01 PM    456588   1357604     74.83    277896    811492    204924      5.38
Average:       332452   1481740     81.67    277720    793953    416953     10.96
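If you save the `sar -r` output to a text file, it can also be checked programmatically. The following is only a rough sketch, assuming the column layout shown above. The function names and the 50% threshold are made up for illustration and are not part of sysstat. It flags samples where kbbuffers plus kbcached fall sharply compared with the previous sample, which is the symptom described in the update further down.

```python
def parse_sar_memory(text):
    """Parse `sar -r` text into a list of dicts, one per sample row."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "kbmemfree" in fields:
            # Header row: the column names start at kbmemfree.
            columns = fields[fields.index("kbmemfree"):]
            continue
        if columns is None or fields[0].startswith("Average"):
            continue
        if len(fields) < len(columns) + 1:
            # Not a data row (for example a restart marker).
            continue
        values = fields[-len(columns):]
        row = {"time": " ".join(fields[:-len(columns)])}
        row.update({name: float(v) for name, v in zip(columns, values)})
        rows.append(row)
    return rows


def cache_drops(rows, factor=0.5):
    """Return timestamps where buffers + cache fell below `factor`
    times the value of the previous sample (threshold is an assumption)."""
    drops = []
    for prev, cur in zip(rows, rows[1:]):
        before = prev["kbbuffers"] + prev["kbcached"]
        after = cur["kbbuffers"] + cur["kbcached"]
        if before > 0 and after < factor * before:
            drops.append(cur["time"])
    return drops
```

A typical use would be to run `sar -r > mem.txt`, read the file, and pass its contents to parse_sar_memory before calling cache_drops on the result.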

Besides running sar locally, there are many monitoring systems that show performance trending data. Munin, Cacti, and Zabbix are some examples. These have the benefit of graphing the data and keeping it for multiple servers in a centralized location.

Update to answer from comments:

The sar command will tell you whether you ran out of RAM before the crash. This will be obvious because kbbuffers and kbcached will drop dramatically. You can also check dmesg for messages from the OOM (out-of-memory) killer, but dmesg output is only written to the log files if klogd or an equivalent kernel logging service is running. You won't see any log entries about running out of disk space unless an application specifically reports its failure to write to disk. Bear in mind, though, that if the disk is full, syslog won't be able to write its logs to disk either.
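As a rough illustration of what to search for, here is a minimal sketch that scans kernel log text for OOM-killer messages and reports free disk space directly, since a full disk may leave no log entry at all. The search phrases are an assumption based on wording commonly seen in kernel messages, and the exact text varies between kernel versions. The function names are hypothetical.

```python
import re
import shutil

# Phrases commonly seen when the OOM killer runs. This is an assumption:
# the exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|oom_kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log_file(path):
    """Scan one log file (for example a saved copy of dmesg output)."""
    with open(path, errors="replace") as log:
        return find_oom_lines(log)


def free_bytes(path="/"):
    """Free space on the filesystem holding `path`, in bytes."""
    return shutil.disk_usage(path).free
```

For example, scan_log_file could be pointed at whichever file your syslog daemon writes kernel messages to (often /var/log/messages, but that depends on the configuration), and free_bytes("/var") shows how much room the logs have left.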

Related Solutions

Parsing log files for frequent IP’s

I've always used this:

tail -1000 /var/log/apache_access | awk '{print $1}' | sort -nk1 | uniq -c | sort -nk1

With tail I can limit how far back I really want to go, which is good if you don't use log rotation (for whatever reason). Second, I'm making use of awk: since most logs are space-delimited, I've left myself the ability to pull out additional information (possibly which URLs they were hitting, statuses, browsers, etc.) by adding the appropriate $ variable. Lastly, there is a limitation in uniq: it only collapses adjacent duplicate lines. For example:

A
A
A
A
B
A
A

Will produce:

4 A
1 B
2 A

That is not the desired output. So we sort the first column (in this case the IPs, but we could sort other columns), then uniq the result, and finally sort the counts in ascending order so that the highest offenders appear at the bottom.
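The adjacent-only behaviour of uniq, and why sorting first fixes it, can be reproduced in a few lines of Python. This is only a sketch of the same idea and not a replacement for the pipeline. The function names are made up, and it assumes the IP is the first whitespace-separated field of each log line.

```python
from collections import Counter
from itertools import groupby


def adjacent_counts(items):
    """Count runs of adjacent equal items, which is what `uniq -c` does."""
    return [(len(list(run)), key) for key, run in groupby(items)]


def total_counts(items):
    """Count every occurrence, lowest count first,
    like `sort | uniq -c | sort -n`."""
    return sorted((count, key) for key, count in Counter(items).items())


def top_ips(log_lines, last=1000):
    """Count the first field over the last `last` lines, like the
    tail | awk | sort | uniq -c | sort pipeline."""
    recent = list(log_lines)[-last:]
    ips = [line.split()[0] for line in recent if line.split()]
    return total_counts(ips)
```

Calling adjacent_counts on the A/B list above gives the same split result that uniq produces on unsorted input, while total_counts gives a single count per value.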

Debian: what is “/var/log/apt/term.log” good for

If you need to figure out what happened during the installation of a package, the information is there.

This file is unlikely to contain any information that would affect your privacy. There may be some edge cases, such as which mirror you downloaded a few files from, which could reveal your broad geographical location. But other system logs have far more detailed information, so this is irrelevant unless you've done a lot of scrubbing already (in which case, just include this file in your scrubbing).

The size of the file is insignificant by today's standards (and even by yesterday's).

The location of the file is determined by the APT settings Dir::Log (default: /var/log/apt) and Dir::Log::Terminal (default: term.log). If you set Dir::Log::Terminal to an empty string in /etc/apt/apt.conf (Dir::Log::Terminal ""), the log file won't be created. But again, that's pointless.
