Linux Process Monitoring CPU Kill – Is There a Log of Past Threads That Are Now Closed?


Sometimes, I have a rogue Java process which takes up 100% of my CPU and makes it jump about 30C in temperature (usually resulting in a crash if not killed).

Problem is, I can never really identify it (it's got a long list of parameters and stuff) or analyze it, because I have to kill it so quickly.

Is there a sort of log I can look at to see the identity of past processes I have killed? If not, is there a way for me to catch that process next time it shows up?

If it matters, I'm on openSUSE 11.4.

Best Answer

No, not by default. There is such a thing as too much logging (especially when you start risking logging the action of writing a log entry…).

BSD process accounting, if it is installed and active, records the name of every command that is executed along with some basic statistics, but not the arguments. Run lastcomm to read what it has recorded.

The audit subsystem is more general and more flexible. Install the audit package and read the SuSE audit guide (mostly the part about rules), or try

auditctl -A exit,always -F path=/usr/bin/java -S execve
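To actually read back what that rule catches, it helps to tag the rule with a key and search the audit log for it afterwards. Here is a minimal sketch, assuming the audit daemon is running and you are root. The key name rogue-java and both function names are made up for illustration; -k on auditctl and -k / -i on ausearch are the standard options for tagging and interpreting records.

```shell
# Hypothetical key name used to tag the rule and find its records later.
AUDIT_KEY=rogue-java

# watch_java_exec: log every execve of /usr/bin/java (needs root).
watch_java_exec() {
    auditctl -A exit,always -F path=/usr/bin/java -S execve -k "$AUDIT_KEY"
}

# show_java_execs: print the recorded launches with fields decoded
# into readable text (needs root).
show_java_execs() {
    ausearch -k "$AUDIT_KEY" -i
}
```

Run watch_java_exec once, wait for the rogue process to show up, kill it as usual, and then run show_java_execs. The EXECVE records should list the arguments the process was started with, which is the part process accounting does not give you.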

Or: instead of killing it, kill -STOP it. The STOP suspends the process, no questions asked. You get the option to resume (kill -CONT) or terminate (kill -KILL) later. As long as the process is still around, you can inspect its command line (/proc/12345/cmdline), its memory map (/proc/12345/maps) and so on.
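The suspend-then-inspect approach can be wrapped in a small helper so it is quick to run when the fans spin up. This is only a sketch: the function name freeze_and_inspect and its output layout are my own, and it reads nothing beyond the standard /proc entries.

```shell
# freeze_and_inspect PID: suspend a process, then print what /proc knows about it.
# Illustrative helper, not a standard tool.
freeze_and_inspect() {
    pid=$1
    # SIGSTOP cannot be caught or ignored, so the process stops using CPU.
    kill -STOP "$pid" || return 1
    # Arguments in /proc/PID/cmdline are NUL-separated; turn them into spaces.
    printf 'cmdline: '
    tr '\0' ' ' < "/proc/$pid/cmdline"
    printf '\n'
    # Working directory and executable are exposed as symlinks.
    printf 'cwd:     %s\n' "$(readlink "/proc/$pid/cwd")"
    printf 'exe:     %s\n' "$(readlink "/proc/$pid/exe")"
    # The parent PID hints at what launched the process.
    printf 'ppid:    %s\n' "$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")"
}
```

For example, freeze_and_inspect 12345 > /tmp/rogue.txt saves the details for later. The process stays suspended afterwards, so follow up with kill -CONT 12345 or kill -KILL 12345 once you have what you need.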

Or: attach a debugger to the process and pause it. It's as simple as gdb --pid 12345 (there may be better options for a Java process); attaching a debugger immediately pauses the process (if you exit the debugger, the process receives a SIGCONT and resumes).

Note that all this only catches OS-level processes, not JVM threads. You need to turn to JVM features to debug threads.
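One common way to get from "this process is hot" to "this Java thread is hot" is to find the busiest OS-level thread ID and look it up in a JVM thread dump. The sketch below assumes procps ps and the JDK's jstack are installed, and that the thread dump prints native thread IDs in hexadecimal as nid=0x... (newer JDKs may format this differently, so adjust the pattern if nothing matches). All three function names are hypothetical.

```shell
# tid_to_nid TID: print a thread ID in hexadecimal, the form used
# after "nid=" in many JVM thread dumps.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# busiest_thread PID: print the ID of the thread in PID with the
# highest CPU share, as reported by ps.
busiest_thread() {
    ps -L -p "$1" -o tid= -o pcpu= | LC_ALL=C sort -k2,2nr | awk 'NR == 1 {print $1}'
}

# hot_java_stack PID: show the thread dump entry for the busiest thread.
# Requires jstack from the JDK, run as the user that owns the JVM.
hot_java_stack() {
    nid=$(tid_to_nid "$(busiest_thread "$1")")
    # Trailing space keeps 0x30 from also matching 0x3039.
    jstack "$1" | grep -A 20 "nid=$nid "
}
```

Note that ps reports CPU share averaged over the thread's lifetime, so for a thread that only recently went wild, top -H -p 12345 may point at the culprit more reliably. You can then feed the thread ID it shows to tid_to_nid by hand.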
