Awk: Extracting a fixed number of rows where the last row number may vary

Tags: awk, text-processing

This question may look like a duplicate, but only at first glance.
I already know how to write a one-liner that extracts a fixed number of consecutive lines (e.g. 5 in this example) from a data source such as top:

$ top -b -n1 | awk 'BEGIN {printf "%23s %7s\n","cpu","mem"} NR==8,NR==12 {printf "%-16s %6s%% %6s%%\n",$12,$9,$10}'

It is a handy one-liner that shows the processes using the most CPU, with their memory usage printed in an additional column.

So far, so good. However, it's not quite that simple. Getting this list requires running top, which (under low system load) may show up as a process in its own list. I'd rather avoid that, since the command runs at intervals and would regularly spawn top, if only for a short while.
We know that we want to begin at line 8 (NR==8). But what if a second top, forgotten in a terminal on another virtual desktop, messes up the list as well? In that case, two top lines must be skipped, so the last line to process becomes 14.

So, to filter out every top line, a counter seems necessary (perhaps a for loop that is exited with break?).
Unfortunately, my attempts with a for loop and i = <number> have been fruitless so far: instead, every line gets printed as many times as i indicates.
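Part of why the loop approach misbehaves: awk already runs its main action once for every input line, so a for loop inside that action only repeats the current line. A counter works better as a plain variable that is incremented each time a line is printed. Below is a minimal sketch under stated assumptions: top's default batch layout (summary and column header on lines 1 to 7, %CPU in field 9, %MEM in field 10, COMMAND in field 12), and `top_procs` is a hypothetical helper name. It drops top by comparing the COMMAND field exactly, which replaces the grep.

```shell
#!/bin/sh
# top_procs (hypothetical helper): read `top -b -n1` output on stdin and
# print the 5 busiest processes, skipping every line whose COMMAND is "top".
# Assumes top's default batch layout: lines 1-7 are summary and header,
# field 9 = %CPU, field 10 = %MEM, field 12 = COMMAND.
top_procs() {
    awk 'BEGIN { printf "%23s %7s\n", "cpu", "mem" }
         NR < 8 { next }            # skip the summary area and the column header
         $12 == "top" { next }      # drop top itself; USER ($2) is not checked
         {
             printf "%-16s %6s%% %6s%%\n", $12, $9, $10
             if (++shown == 5) exit # stop once 5 processes have been printed
         }'
}
```

Usage: top -b -n1 | top_procs

Because only field 12 is compared, a user named top is not affected, and because awk exits after the fifth printed line, it does not matter how many top lines had to be skipped on the way.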

I've come up with a rather hackish solution, which works but may be unsuitable for more complex cases:

top -b -n1 | grep -v ' \btop\b$' | awk 'BEGIN {printf "%23s %7s\n","cpu","mem"} NR==8,NR==12 {printf "%-16s %6s%% %6s%%\n",$12,$9,$10}'

(Note: This may give unwanted results if the user name in the second column happens to be "top" as well)

Anyway, could someone give me a hint on how to do this in awk (and get rid of the grep)?
Thanks in advance.

Best Answer

This does not answer your question directly, but it solves the problem you are trying to solve in a completely different way:

The full command is this (see below for example output):

ps -o comm,%cpu,%mem --sort -%cpu -A | head -6

I will describe the parts of it:

  • using ps to get more control over the output
  • printing only the three columns we need with -o comm,%cpu,%mem
  • letting ps sort the data itself with --sort -%cpu (by CPU, descending)
  • listing all processes with -A
  • showing the header and the first 5 processes with | head -6

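The output is similar to that of your first command (note that ps computes %CPU as CPU time divided by the time the process has been running, so it is an average over the process's lifetime rather than top's current reading):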

$ ps -o comm,%cpu,%mem --sort -%cpu -A | head -6
COMMAND         %CPU %MEM
firefox          8.9 15.5
Xorg             1.3  5.6
parcellite       0.3  1.6
compiz           0.2  1.8
konsole          0.1  0.9

The ps process itself appears in the full list; one could exclude it based on its parent PID.

To also exclude top processes running elsewhere, we can match on the command name.

The -A option, which selects all processes, is replaced by -N ...:

ps ... -N --ppid $$ -C top

Because we now need to exclude processes, we use -N, which negates the selection: it selects all processes except the ones matched by the other selection options.

To exclude ps, we use the fact that its parent process is the current interactive shell, so its parent PID (PPID) is the shell's PID, which is available as $$.
So --ppid $$ matches all child processes of the current shell. In this pipeline those are ps and head; note that it also hides any background jobs started from that shell.

We also want to exclude top processes that may be running on other displays of the same machine. We do that by matching the command name with -C top.

The full command, excluding the ps process itself (but not other ps processes) and all top processes, is:

ps -o comm,%cpu,%mem --sort -%cpu -N --ppid $$ -C top | head -6
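If you still want the cpu/mem layout that the awk one-liner in the question produces, the ps output can be passed through a small formatter. A minimal sketch, assuming procps ps; `fmt_ps` is a hypothetical helper name. Each column gets its own -o option with an empty header, as in the ps manual's example `ps -o pid= -o comm=`, so that ps prints no header line and the awk BEGIN block prints its own. Everything before the last two fields is kept as the name, since a command name can contain spaces.

```shell
#!/bin/sh
# fmt_ps (hypothetical helper): reformat header-less
# `ps -o comm= -o %cpu= -o %mem=` output into the cpu/mem layout
# of the awk one-liner from the question.
# The last two fields are %CPU and %MEM; everything before them is the
# command name, which may itself contain spaces.
fmt_ps() {
    awk 'BEGIN { printf "%23s %7s\n", "cpu", "mem" }
         NF >= 3 {
             cpu = $(NF - 1); mem = $NF
             name = $0
             # strip the two trailing numeric columns to keep the full name
             sub(/ +[^ ]+ +[^ ]+ *$/, "", name)
             printf "%-16s %6s%% %6s%%\n", name, cpu, mem
         }'
}
```

Usage, in an interactive shell:

ps -o comm= -o %cpu= -o %mem= --sort -%cpu -N --ppid $$ -C top | head -n 5 | fmt_ps

Here head takes 5 lines instead of 6 because ps no longer prints a header line.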