Linux – How to find the process(es) which are hogging the machine

linux performance-monitor resource-usage

Scenario: All of a sudden, my computer feels sluggish. The mouse moves, but windows take ages to open, etc. uptime says the load is 7.69 and rising.

What is the fastest way to find out which process(es) are the cause of the load?

Now, "top" and similar tools aren't the answer because they show either CPU or memory usage but not both at the same time. What I need is a single command that I can type as it happens, something that will figure out any of the following:

System is trying to swap 8GB of RAM to disk because process X …

or

process X seeks all over the disk

or

process X uses 400% CPU

So what I'm looking for is iostat, htop/atop and similar tools rolled into one, with an output like this:

 1235 cp - Disk thrashing
   87 chrome - Uses 2 GB of RAM
  137 nfs_bench - Uses 95% of the network bandwidth

I don't want a tool that gives me some numbers which I can analyze, but a tool that tells me exactly which process causes the current load. Assume that the user in front of the keyboard barely knows how to write "process" and is quickly overwhelmed when it comes to "resident size", "virtual memory" or "process life cycle".

My argument goes like this: A user notices a problem. There can be thousands of reasons … well, almost 🙂 The user wants to know the source of the problem.

The current solutions give me lots of numbers, and I need to know what these numbers mean. What I'm looking for is a meta tool. 99% of the data is irrelevant to the problem. So what the tool should do is look for processes which hog some resource and list only those along with "this process needs a lot of CPU, this produces many IRQs, this process allocates a lot of RAM (and it's still growing)".

This will be a relatively short list. It will be much simpler for someone new to this to locate the culprit from this list than from the output of, say, htop, which gives me about 5000 numbers but requires me to fold multi-threaded processes myself. (I have 50 lines which say VIRT 2750M but only 16 GB of RAM; the machine ought to swap itself to death, but of course this is a misinterpretation of the data, and one that is easy to make.)
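To make the idea concrete, here is a rough sketch of such a filter. It assumes a procps-style ps and only covers CPU and memory (disk, network and IRQs would need other data sources). The name hog_report and both thresholds are made up for illustration.

```shell
#!/bin/sh
# Hypothetical thresholds, in percent; tune them for your machine.
CPU_LIMIT=50
MEM_LIMIT=10

# hog_report (made-up name) reads "pid %cpu %mem command" lines on stdin
# and prints only the processes above a threshold, with the reason.
hog_report() {
    awk -v cpu="$CPU_LIMIT" -v mem="$MEM_LIMIT" '
        {
            # The command name may contain spaces, so rebuild it from field 4 on.
            name = $4
            for (i = 5; i <= NF; i++) name = name " " $i
        }
        $2 + 0 > cpu + 0 { printf "%6d %s - uses %.0f%% CPU\n", $1, name, $2 }
        $3 + 0 > mem + 0 { printf "%6d %s - uses %.0f%% of RAM\n", $1, name, $3 }
    '
}

# One line per process (threads are folded), no header line.
# Note: ps reports %CPU averaged over the lifetime of the process,
# so a short recent burst can be under-reported.
ps -eo pid=,pcpu=,pmem=,comm= | hog_report
```

Everything below the thresholds is simply dropped, which is the point: the list stays short and each line says why the process is on it.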

Best Answer

I do have to smile at the responses because each told you to run tool X. The only problem is that if what you're seeing is intermittent, there will be no way to correlate anything. A tool like sar can help if you run it at a high enough frequency, but I'd claim collectl is even better.

Like sar, you run it as a daemon by installing the RPM and doing /etc/init.d/collectl start.

Now when you see something sluggish, collectl -p /var/log/collectl/filename --top will play back the data and show you the top processes. You could also have just run collectl --top to see them in real time. BTW, anything you can do in real time you can play back as well.
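If you'd rather have one thing to type when it happens, the two invocations above can be wrapped in a small shell function. This is only a sketch: hogs is a made-up name, and the log file is whatever your collectl daemon actually writes under /var/log/collectl.

```shell
# hogs: hypothetical wrapper around the two collectl commands above.
#   hogs            -> live top-style view
#   hogs LOGFILE    -> play back a recorded log in the same view
hogs() {
    if ! command -v collectl >/dev/null 2>&1; then
        echo "hogs: collectl is not installed" >&2
        return 127
    fi
    if [ "$#" -eq 0 ]; then
        collectl --top
    else
        collectl -p "$1" --top
    fi
}
```

Put it in your shell startup file; then hogs shows what is happening right now and hogs followed by a log file shows what happened earlier.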

As for CPU load, what if you ARE getting overloaded with interrupts? collectl -sC will not only show the loads on individual CPUs (or use -sc for average load), it will show how they're spending their time. Include -j (-scj) and you'll see the number of interrupts/CPU. Use uppercase -J and you'll see the TYPES of each interrupt/CPU.
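As a memory aid, those switches can be tucked behind plain words. Again just a sketch: cpu_view and its keywords are invented here, and -scJ is my assumption by analogy with -scj, so check it against collectl's help on your version.

```shell
# cpu_view: hypothetical helper that maps a plain word to the collectl
# subsystem switches mentioned above and prints the switch.
cpu_view() {
    case "$1" in
        average)  echo "-sc"  ;;   # average load over all CPUs
        per-cpu)  echo "-sC"  ;;   # load and time breakdown per CPU
        irq)      echo "-scj" ;;   # plus number of interrupts per CPU
        irq-type) echo "-scJ" ;;   # plus interrupt types per CPU (assumed combination)
        *)
            echo "usage: cpu_view average|per-cpu|irq|irq-type" >&2
            return 2
            ;;
    esac
}
```

Usage would be along the lines of collectl $(cpu_view irq) for a live view, or the same with -p and a log file for playback.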

Of course, if you really like vmstat, you can always play back collectl data with --vmstat and it will show historical data in vmstat format.

There are far more switches than I have time to list, but you can check it out at SourceForge or just google it.
