What to look for in log files if I think limited memory or disk space is causing a crash

logs terminology

I'm troubleshooting a recent series of crashes using the files in /var/log. What should I look for in those files if I believe low memory or low disk space is to blame? Is there a general term used in the Linux error-throwing lingo for hardware faults of this kind? And which system processes, such as the kernel, would be affected by a critical shortage of memory?


Just as background, I was working on a Drupal site hosted on my Fedora 17 sandbox project laptop when I experienced these system crashes. I had recently downloaded some rather large files (which I've since moved to other media) and was down to about 1.8G of free HD space.

I found some useful posts here about monitoring memory usage with top or current disk usage with du. This question, however, is specifically about log files. While searching for an explanation of FPrintObject, I found a similar post at Fedora Forums, which led me to run Memtest, but it reported nothing bad.

Best Answer

The information you are looking for is not found in the usual syslog logs. For viewing performance history from the command line, sysstat is an excellent tool.

With sysstat, the sadc collector gathers system information and writes it to a log file. The log file is in a binary format, but it can be viewed with the sar command.

Here is an example of sar output with no options:

$ sar
09:15:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:05:01 AM     all     77.49      0.37     22.13      0.00      0.00      0.00
10:15:01 AM     all     77.30      0.40     22.29      0.00      0.00      0.00
10:25:01 AM     all     77.19      0.38     22.42      0.00      0.00      0.00
10:35:01 AM     all     39.31      0.35     23.80      0.01      0.00     36.53
10:45:01 AM     all     32.22      0.34     24.26      0.03      0.00     43.15
10:55:01 AM     all     32.80      0.33     23.78      0.01      0.00     43.08
11:05:01 AM     all     32.70      0.33     23.76      0.00      0.00     43.20
Average:        all     63.90      0.39     22.79      0.00      0.00     12.91

The information you see is the same information provided by top, but is historical data. You can also see detailed information about RAM, network, and disk utilization. Here is an example for RAM usage:

$ sar -r
09:15:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:15:01 PM    457076   1357116     74.81    277876    810948    205520      5.40
02:25:01 PM    456836   1357356     74.82    277876    811168    205384      5.40
02:35:01 PM    456976   1357216     74.81    277876    811256    204728      5.38
02:45:01 PM    457036   1357156     74.81    277876    811368    204840      5.38
02:55:01 PM    456588   1357604     74.83    277896    811492    204924      5.38
Average:       332452   1481740     81.67    277720    793953    416953     10.96
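
When scanning a long sar -r report for the period before a crash, it can help to filter out everything except the samples where memory use was high. The following is only a minimal sketch: high_mem_samples is a hypothetical helper name, the default threshold of 90 is an arbitrary assumption, and it relies on the report having a header row containing a %memused column as in the output above.

```shell
# Hypothetical helper: print only the sar -r samples whose %memused
# is at or above a threshold (default 90, an arbitrary choice).
# Usage: sar -r | high_mem_samples 90
high_mem_samples() {
    awk -v limit="${1:-90}" '
        # Header row: remember which column holds %memused.
        { for (i = 1; i <= NF; i++) if ($i == "%memused") { col = i; next } }
        # Skip the summary row and anything seen before the header.
        $1 == "Average:" || !col { next }
        # Data row: print it when memory use meets the threshold.
        NF >= col && ($col + 0) >= limit { print }
    '
}
```

To look at an earlier day, sar can read a specific data file with -f, for example sar -r -f /var/log/sa/sa15 | high_mem_samples 90 (the /var/log/sa/saDD location is the usual Fedora default, but it may differ on your system). Note that in the output above kbmemused appears to include buffers and cache, so a high %memused alone does not prove the system ran out of memory.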

Outside of running sar locally, there are many monitoring systems that show performance trending data. Munin, Cacti, and Zabbix are some examples. These have the benefit of graphing the data and keeping it for multiple servers in a centralized location.

Update to answer from comments:

The sar command will tell you if you ran out of RAM prior to the crash. This will be obvious because kbbuffers and kbcached will drop dramatically. You can also check dmesg for messages from the OOM (out of memory) killer, but dmesg output is only written to log files if a kernel log daemon such as klogd is installed and running. You won't see any log entries about running out of disk space unless an application specifically reports its failure to write to disk. Even then, if the disk is full, syslog won't be able to write that entry to disk either.
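
As a rough illustration of the dmesg check, the sketch below greps for phrases the kernel typically prints when the OOM killer runs. The oom_events name is hypothetical, and the search patterns are an assumption based on common kernel messages; the exact wording varies between kernel versions, so treat a lack of matches as inconclusive.

```shell
# Hypothetical helper: print kernel log lines that suggest the OOM killer ran.
# Reads the named files, or standard input when no file is given.
# Usage: dmesg | oom_events
#        oom_events /var/log/messages
oom_events() {
    grep -iE 'oom-killer|out of memory|killed process' "$@"
}
```

Reading /var/log/messages usually requires root, and that file only contains kernel messages if they were being logged to disk at the time.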