Awk: Extracting a fixed number of rows where the last row number may vary

awk, text-processing

This question may look like a duplicate, but only at first glance.
I don't need help coding a one-liner that extracts a fixed number of consecutive lines (e.g. 5 in this example) from a data source such as top:

$ top -b -n1 | awk 'BEGIN {printf "%23s %7s\n","cpu","mem"} NR==8,NR==12 {printf "%-16s %6s%% %6s%%\n",$12,$9,$10}'

This is actually a very handy one-liner: it shows the processes that use the most CPU, with their memory usage printed in an additional column.

So far, so good … however, it's not that trivial. Producing this list requires top, which (under low system load) may show up as a process in its own list. I'd rather not have that, since these calls are made at intervals and would regularly spawn top (if only for a short while).
We know that we want to begin at line 8 (NR==8). However, what if a second top, forgotten in a terminal on another virtual desktop, shows up in the list as well? In this case, two top processes must be omitted, so the last line to process will be 14.

So to improve this output and to filter out every top line that is in there, a counter seems mandatory (perhaps a for loop that we exit with a break?).
Unfortunately my attempts with a for loop and i = <number> have been fruitless so far, because it would rather print every line as many times as i indicates.

I've come up with a rather hackish solution, which works but may be unsuitable for more complex cases:

top -b -n1 | grep -v ' \btop\b$' | awk 'BEGIN {printf "%23s %7s\n","cpu","mem"} NR==8,NR==12 {printf "%-16s %6s%% %6s%%\n",$12,$9,$10}'

(Note: This may give unwanted results if the user name in the second column happens to be "top" as well)

Anyways, could I get a clue how to do that in awk please (and get rid of the grep)?
Thanks in advance.

Best Answer

This is not answering your question,
but it solves the problem you are trying to solve, in a completely different way:

The full command is this (see below for example output):

ps -o comm,%cpu,%mem --sort -%cpu -A | head -6

I will describe the parts of it:

Use ps instead of top to have more control over the output.
Print only the three columns we need with -o comm,%cpu,%mem.
Let ps sort the data itself with --sort -%cpu, that is, by CPU usage in descending order.
List all processes with -A.
Show the header and the first 5 lines of the result with | head -6.

The output is similar to the output of your first command:

$ ps -o comm,%cpu,%mem --sort -%cpu -A | head -6
COMMAND         %CPU %MEM
firefox          8.9 15.5
Xorg             1.3  5.6
parcellite       0.3  1.6
compiz           0.2  1.8
konsole          0.1  0.9

The process ps is listed in the full list - one could exclude it based on the parent PID.

If we want to exclude top processes elsewhere, we can do that based on the command name.

The -A selecting all processes would be replaced by -N ...:

ps ... -N --ppid $$ -C top

As we now need to exclude processes, we use -N to select all processes other than the ones we match.

To exclude ps, we use the fact that its parent process is the current interactive shell, so its parent PID (PPID) is the PID of that shell, which is available as $$.
So --ppid $$ matches all child processes of the current shell. In this pipeline those are ps itself and head, plus any background jobs started from the same shell.

We want to also exclude the top processes that may run on other displays on the same machine. We do that by matching the command name with -C top.

The full command, excluding the ps process itself and all top processes, would be:

ps -o comm,%cpu,%mem --sort -%cpu -N --ppid $$ -C top | head -6
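For completeness, the counter the question asks about can be sketched in awk alone, without grep. This is only a sketch: it assumes the top layout from the question (data rows start at line 8, COMMAND is field 12), and the function name top_rows and the variables first and want are made up for illustration. Instead of a fixed end line, it counts the rows it actually prints and stops once it has enough:

```shell
# top_rows N: read `top -b -n1` output on stdin and print the first N
# data rows whose command name is not exactly "top".
# Assumes data rows start at line 8 and COMMAND is field 12.
top_rows() {
    awk -v first=8 -v want="${1:-5}" '
        BEGIN { printf "%23s %7s\n", "cpu", "mem" }
        NR >= first && $12 != "top" {
            printf "%-16s %6s%% %6s%%\n", $12, $9, $10
            if (++n == want) exit    # stop after N printed rows
        }'
}

# usage: top -b -n1 | top_rows 5
```

Because the comparison is against field 12 only, a user named "top" in the second column does not affect the result.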

Related Solutions

How to show CPU time for processes via top without ‘root’ procs

Your ps command should work if you sort it properly. From man ps:

   --sort spec
          Specify sorting order.  Sorting syntax is
          [+|-]key[,[+|-]key[,...]].  Choose a multi-letter key from the
          STANDARD FORMAT SPECIFIERS section.  The "+" is optional since
          default direction is increasing numerical or lexicographic
          order.  Identical to k.  For example: ps jax --sort=uid,-ppid,
          +pid

I'm not sure which time you want to sort by but here are the relevant choices:

STANDARD FORMAT SPECIFIERS
   bsdtime     TIME      accumulated cpu time, user + system.  The display
                         format is usually "MMM:SS", but can be shifted to
                         the right if the process used more than 999
                         minutes of cpu time.

   cputime     TIME      cumulative CPU time, "[DD-]hh:mm:ss" format.
                         (alias time).
   etime       ELAPSED   elapsed time since the process was started, in
                         the form [[DD-]hh:]mm:ss.

   etimes      ELAPSED   elapsed time since the process was started, in
                         seconds.

I think from your question that you want cputime. If so, this should give you your desired output:

ps -eo pid,user,args,etime,time,%cpu --sort cputime | grep -v root
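Note that grep -v root drops every line containing "root" anywhere, including in the args column. A hedged alternative is to compare only the user column with awk; the helper name not_user below is made up for illustration, and it assumes USER is the second column, as in the ps call above:

```shell
# not_user NAME: keep the header line and every row whose 2nd column
# (USER in the ps call above) is not exactly NAME.
not_user() {
    awk -v u="$1" 'NR == 1 || $2 != u'
}

# usage: ps -eo pid,user,args,etime,time,%cpu --sort cputime | not_user root
```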

Awk from different lines

awk solution:

awk 'v && NR==n{ print $6,v > "result.txt" }/^!/{ v=$5; n=NR+1 }' file

<condition1> { <statement> ... }<condition2>{ <statement> ... } - conditions with respective statements will be evaluated consecutively
/^!/{ v=$5; n=NR+1 } - on encountering line starting with ! - capture the 5th field value $5 and plan the next line number NR+1 (assigning to variable n)
v && NR==n - if we have the 1st crucial number v and the current record number NR is the needed "next line number" n - print the values into file result.txt
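The same pattern can be tried on a small input. The lines below are invented purely for illustration (they are not the original data), the function name pair_lines is hypothetical, and the result goes to standard output instead of result.txt:

```shell
# pair_lines: for each line starting with "!", remember its 5th field,
# then print the 6th field of the following line next to it.
pair_lines() {
    awk 'v && NR == n { print $6, v }
         /^!/         { v = $5; n = NR + 1 }'
}

# Made-up sample input, only to show the mechanics.
printf '%s\n' \
    '! a b c -1.5 x' \
    'p q r s t 10 u' \
    'unrelated line' \
    '! a b c -2.5 x' \
    'p q r s t 20 u' | pair_lines
```

The order of the two rules matters: the print rule comes first, so a line starting with ! is first checked as a possible "next line" and only then recorded.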

The result.txt file contents:

188 -9744.24963670
140 -9744.30001681
155 -9744.33953891
164 -9744.36584201
154 -9744.37925372
153 -9744.39185493
160 -9744.39836617
