Print columns in awk by header name

awktext processing

I have a text file like so

foo bar baz
1   a   alpha
2   b   beta
3   c   gamma

I can use awk to print certain columns, like 1 and 3, with {print $1, $3}, but I want to specify the columns to print by specifying the header of the column instead, something like {print $foo, $baz}. This is useful so I don't have to open the file and count the columns manually to see which column is which, and I don't have to update the script if the column number or order changes. Can I do this with awk (or another shell tool)?

Best Answer

awk '
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
{ print $(f["foo"]), $(f["baz"]) }
' file
foo baz
1 alpha
2 beta
3 gamma

That is an immensely useful idiom. I have a lot of data in spreadsheets and different spreadsheets might have a common subset of columns I'm interested in but not necessarily in the same order across all spreadsheets or with the same numbers of other columns before/between them so being able to export them as CSV or similar and then simply run an awk script using the column names instead of column numbers is absolutely invaluable.

Related Solutions

Print columns that start with a specific string

With awk:

awk '{for(i=5;i<=NF;i++){if($i~/^ANC=/){a=$i}} print $1,$2,$3,$4,a}' file

for(...) loops through all fields, starting with field 5 (i=5).
- if($i~/^ANC=/) checks if the field starts with ANC=
- a=$i if yes, set variable a to that value
print $1,$2,$3,$4,a print fields 1-4 followed by whatever is stored in a.

Can be combined with BEGIN {OFS="\t"} of course.

How to print a specific column condition using awk

Just replace print with printf to print the value on the same line, and insert new line after finishing the iteration on the the line:

awk '{{for(i=1;i<=NF;i++)if($i == "name:") printf $(i+1)" "$(i+2)" "} print ""; }' yourfile

The output:

1 AAA 
2 AAA 3 BBB 
1 AAA 2 BBB 5 CCC

Best Answer

Related Solutions

Print columns that start with a specific string

How to print a specific column condition using awk

Related Question