Lum – awk – dynamically format tab-separated columns

awkcolumnscsvgawk

I have a file with dynamic length columns (four) separated with tabs (a column can have have spaces)

COL1    COL2 COL2 COL2  COL3 COL3       COL4
COL1    COL2 COL2       COL3    COL4 COL4
COL1 COL1       COL2 COL2       COL3    COL4 COL4 COL4

I 'd like to format it dynamically with printf in awk? I can format it with fixed adjustments:

$ awk 'BEGIN {FS="\t"}; {printf "%-10s %-10s %-15s %-15s\n", $1,$3,$4,$2}' test
COL1       COL3 COL3  COL4            COL2 COL2 COL2
COL1       COL3       COL4 COL4       COL2 COL2
COL1 COL1  COL3       COL4 COL4 COL4  COL2 COL2

Best Answer

in bash, using column

$ column -s $'\t' -t file.tsv
col1       col2 col2 col2  col3 col3  col4
col1       col2 col2       col3       col4 col4
col1 col1  col2 col2       col3       col4 col4 col4

column -t uses 2 spaces to separate the columns


With awk, I'd write

awk -F '\t' -v cols=4 '
    NR == FNR {
        for (i=1; i<=cols; i++) 
            if (NR == 1 || length($i) > w[i]) 
                w[i] = length($i)
        next
    }
    {
        for (i=1; i<=cols; i++) 
            printf "%-*s%s", w[i], $i, (i == cols ? ORS : FS) 
    }
' file.tsv file.tsv

Where I"m processing the file twice: first to find the max wideths for each column, then again to reformat the file. I use a tab to separate the columns in the output.

col1            col2 col2 col2  col3 col3       col4
col1            col2 col2       col3            col4 col4
col1 col1       col2 col2       col3            col4 col4 col4
Related Question