Bash – command substitution inside awk

Tags: awk, bash

Is there a way to perform command substitution inside AWK and be able to reference the fields inside the substituted command using the $n notation of AWK?

E.g.

find | awk '/txt$/ {nl = $(wc -l $NF); print nl}'

I was hoping that the above would print the number of lines in each .txt file.
Instead, it effectively returns the same output as:

find | awk '/txt$/ {print}'

Q1: is there a way to perform command substitution inside awk?

Q2: why is the first incantation above silently failing and is simply printing the filenames instead?

Please note that the above is offered as an example only. I am not asking how to print the number of lines of each file by some other means, e.g. with for f in $(find -iname \*.txt); do wc -l $f; done

The question is specifically about how to leverage command substitution in AWK programs.

Best Answer

First, a disclaimer: Please don't parse the output of find. The code below is for illustration only, of how to incorporate command substitution into an Awk script in such a way that the commands can act upon pieces of Awk's input.

To actually do a line count (wc -l) on each file found with find (which is the example use case), just use:

 find . -type f -name '*txt' -exec wc -l {} +

However, to answer your questions as asked:

Q1

To answer your Q1:

Q1: is there a way to perform command substitution inside awk?

Yes, there is. From man awk:

command | getline [var] Run command piping the output either into $0 or var, as above, and RT.

So (watch the quoting!):

find . | awk '/txt$/{"wc -l <\"" $NF "\"|cut -f1" | getline(nl); print(nl)}'

Note that the command string that gets built, and therefore executed, has the form

wc -l <"file" | cut -f1

Feeding the file to wc through a redirection keeps wc from printing the file name after the count.

The one-liner above skips the close() that the command needs (harmless for a couple of files, but technically incorrect). What you actually need is:

find . | awk '/txt$/{
                       comm="wc -l <\"" $NF "\" | cut -f1"
                       comm | getline nl;
                       close (comm);
                       print nl 
                    }'

That also works in older awk versions.
Keep find . from feeding the bare dot (.) to the command: the dot is a directory, and wc cannot count lines in it. (The /txt$/ pattern used here already excludes it, but a looser pattern would not.)

Alternatively, skip the dot entry explicitly:

find . | awk '/txt$/ && $NF!="." {  comm="wc -l <\"" $NF "\" | cut -f1"
                                    comm | getline nl;
                                    close (comm);
                                    print nl 
                                 }'

You can convert that to a one-liner, but it will look quite ugly, methinks.
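Since the quoting is the fragile part, here is a minimal sketch of one way to harden it. It is not part of the original answer: the names count_txt_lines and shquote are made up for illustration, and it uses $0 instead of $NF on the assumption that each input line is one whole path (so names containing spaces survive). File names containing newlines are still not handled.

```shell
# Hypothetical helper: reads paths on stdin, prints "<lines> <path>" for each *txt path.
count_txt_lines() {
  awk '
    # shquote (illustrative name): wrap s in single quotes for sh.
    # \047 is the octal escape for a single quote; each embedded quote
    # is rewritten as: close quote, backslash-quote, reopen quote.
    function shquote(s) {
      gsub(/\047/, "\047\\\047\047", s)
      return "\047" s "\047"
    }
    /txt$/ {
      comm = "wc -l < " shquote($0)
      # Only print when getline really read a line (return value 1).
      if ((comm | getline nl) > 0)
        print nl + 0, $0      # nl + 0 strips padding some wc versions add
      close(comm)
    }
  '
}
```

Usage would look like find . -type f | count_txt_lines. Checking the return value of getline avoids printing a stale nl when the command produces no output, for example when the file cannot be read.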

Q2

As for your second question:

Q2: why is the first incantation above silently failing and is simply printing the filenames instead?

Because awk does not parse shell syntax. It understands the statement as:

nl = $(wc -l $NF)
nl --> variable
$ --> pointer to a field
wc --> variable (that has zero value here)
-  --> minus sign
l  --> variable (that has a null string)
$  --> Pointer to a field
NF --> Last field

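In awk, subtraction binds tighter than string concatenation, so wc - l is evaluated first and gives 0 (both variables are unset). That 0 is then concatenated with the text of the last field (the name of a file), giving a string such as 0./file.txt. Used as a field index, that string is converted to a number, and its numeric value is 0.

For awk, it becomes:

nl = $( wc -l $NF)
nl = $( (0 - 0) "./file.txt" )
nl = $( "0./file.txt" )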

That is just $0, the whole input line, which (for the plain find above) is only the file name.

So, all the above will only print the filename (well, technically, the whole line).
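The parse can be checked directly. The sketch below is only an illustration: demo_parse is a made-up name, and ./notes.txt stands in for one line of find output. It prints the intermediate values awk computes, relying on subtraction being evaluated before concatenation.

```shell
# Hypothetical demo: feed one fake "find" line to awk and show each step.
demo_parse() {
  printf '%s\n' './notes.txt' | awk '{
    # wc and l are unset variables, so wc - l is 0 - 0.
    print "difference: " (wc - l)
    # The 0 is then concatenated with the last field.
    print "concatenated: " (wc -l $NF)
    # As a number, that string is 0 (only its leading numeric part counts).
    print "numeric: " ((wc -l $NF) + 0)
    # So the field reference is $0, the whole input line.
    nl = $(wc -l $NF)
    print "field: " nl
  }'
}
```

Running demo_parse should show 0, 0./notes.txt, 0 and ./notes.txt in turn. A line that did not start with ./ (say 5.txt) would index field 5 instead, so the "prints the file name" behaviour depends on the shape of find's output.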
