Print columns in awk by header name

awktext processing

I have a text file like so

foo bar baz
1   a   alpha
2   b   beta
3   c   gamma

I can use awk to print certain columns, like 1 and 3, with {print $1, $3}, but I want to specify the columns to print by specifying the header of the column instead, something like {print $foo, $baz}. This is useful so I don't have to open the file and count the columns manually to see which column is which, and I don't have to update the script if the column number or order changes. Can I do this with awk (or another shell tool)?

Best Answer

awk '
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
{ print $(f["foo"]), $(f["baz"]) }
' file
foo baz
1 alpha
2 beta
3 gamma

That is an immensely useful idiom. I have a lot of data in spreadsheets and different spreadsheets might have a common subset of columns I'm interested in but not necessarily in the same order across all spreadsheets or with the same numbers of other columns before/between them so being able to export them as CSV or similar and then simply run an awk script using the column names instead of column numbers is absolutely invaluable.

Related Question