Lum – How to display columns in tab separated files nicely

columns, text-formatting

I have some tab-separated files with a header row that are quite unreadable because the table entries differ in length. Essentially, they look something like this:

c1    c2    c3    c4
A    0    1.0231321321213    92
BBBBB    12321.00002131    19912132.    0
CC    0.0999813221321    0    0

Is there a way to make this more readable, with wider spacing and the entries nicely aligned to form readable columns, like this:

c1       c2                 c3                 c4
A        0                  1.0231321321213    92
BBBBB    12321.00002131     19912132.          0
CC       0.0999813221321    0                  0

Best Answer

If the input columns are separated by plain white space (ASCII space 0x20 and/or tab 0x09) and there is no blank column, it is as simple as:

<infile column -t
c1     c2               c3               c4
A      0                1.0231321321213  92
BBBBB  12321.00002131   19912132.        0
CC     0.0999813221321  0                0

However, column does not treat Carriage Return (ASCII 0x0d or \r), Form Feed (ASCII 0x0c or \f) or Vertical Tab (ASCII 0x0b or \v) as delimiters.
If the columns may be separated by any "whitespace" character ([[:space:]], roughly [ \t\r\f\v]: space, horizontal tab, carriage return, form feed or vertical tab, but not newline), you will need to collapse every run of white space into a single delimiter (a space here). The newline character cannot be used both as a line delimiter and as a column delimiter.

Except for the newline character, this works (note that \+ is a GNU sed extension):

<infile sed 's/[[:space:]]\+/ /g' | column -t

The set of characters treated as delimiters can be narrowed by listing only the wanted ones inside the […] bracket expression instead of [:space:].
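As a small illustration of the collapsing step, here is a sketch that runs the same substitution on a made-up line containing tabs, a carriage return, a form feed and a vertical tab. It uses sed -E with + instead of \+, on the assumption that the local sed supports -E (GNU and BSD sed both do); the sample data and the variable name are invented for the example.

```shell
# Made-up sample: fields separated by a mix of tab, CR, FF and VT.
# printf expands the backslash escapes into the real control characters.
collapsed=$(printf 'A\t\t0\r1.02\f\v92\n' |
    sed -E 's/[[:space:]]+/ /g')   # every run of white space -> one space

# Show the result: the four fields are now separated by single spaces.
printf '%s\n' "$collapsed"
```

The collapsed text can then be piped into column -t as in the command above.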

If the columns in the source file are separated by a single character (such as a tab), you can use the shell's ANSI-C quoting ($'…'), if the running shell supports it, to specify the character used as delimiter.
Then, using column:

<infile column -s $'\t' -t
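For shells that lack $'…', a tab can be produced with printf instead. The following is a minimal sketch under that assumption; the variable names and the two-line sample are invented for the example, and awk is used only to show that the fields are split on tabs and not on spaces.

```shell
# POSIX sh alternative to $'\t': let printf generate the tab character.
tab=$(printf '\t')

# Hypothetical sample: two tab-separated fields per line, some containing spaces.
sample=$(printf 'a b\tc\nd\te f\n')

# Splitting on the tab only: each line has exactly 2 fields.
printf '%s\n' "$sample" | awk -F "$tab" '{ print NF }'

# The same variable can be passed to column, if it is installed.
if command -v column >/dev/null 2>&1; then
    printf '%s\n' "$sample" | column -s "$tab" -t
fi
```

Because only the tab is given to -s, a field such as "a b" stays in one column instead of being split at the space.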

By default, column separates the output columns with spaces (two between adjacent columns).

If several consecutive delimiters must each count as a separator (useful when there are blank columns), the (GNU) option -n disables the merging of multiple adjacent input delimiters into a single delimiter. This option is not available in every implementation of column, so check the local man page.

<infile column -s $'\t' -tn

If the fields are separated not by one fixed character but by any of several characters, you can list all of them within $'...', without an extra sed call to convert them to a single character first; for example, tab or space:

<infile column -s $'\t ' -tn

Related Solutions

Lum – awk – dynamically format tab-separated columns

In bash, using column:

$ column -s $'\t' -t file.tsv
col1       col2 col2 col2  col3 col3  col4
col1       col2 col2       col3       col4 col4
col1 col1  col2 col2       col3       col4 col4 col4

Note that column -t uses 2 spaces to separate the columns.

With awk, I'd write

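awk -F '\t' -v cols=4 '
    # First pass (NR == FNR): record the maximum width of each column
    NR == FNR {
        for (i=1; i<=cols; i++)
            if (NR == 1 || length($i) > w[i])
                w[i] = length($i)
        next
    }
    # Second pass: print each field left-justified and padded to that width
    {
        for (i=1; i<=cols; i++)
            printf "%-*s%s", w[i], $i, (i == cols ? ORS : FS)
    }
' file.tsv file.tsv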

Here I'm processing the file twice: first to find the maximum width of each column, then again to reformat the file. I use a tab to separate the columns in the output.

col1            col2 col2 col2  col3 col3       col4
col1            col2 col2       col3            col4 col4
col1 col1       col2 col2       col3            col4 col4 col4
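A variation on the same two-pass idea is sketched below for cases where the number of columns is not known in advance and blank fields must be kept. The function name align_tsv and the two-space gutter are choices made for this sketch, not part of the original answer; it relies on the same %-*s dynamic-width printf used above, which gawk and mawk accept.

```shell
# align_tsv FILE: pad every tab-separated field to its column's maximum width.
# Hypothetical helper; it reads FILE twice, so FILE must be a regular file.
align_tsv() {
    awk -F '\t' '
        # Pass 1: track the column count and the widest field per column
        NR == FNR {
            if (NF > cols) cols = NF
            for (i = 1; i <= NF; i++)
                if (length($i) > w[i]) w[i] = length($i)
            next
        }
        # Pass 2: left-justify all but the last column, two spaces between
        {
            for (i = 1; i < cols; i++)
                printf "%-*s  ", w[i], $i
            print $cols
        }
    ' "$1" "$1"
}
```

It would be called as align_tsv file.tsv. Since the split is done on every single tab, an empty field keeps its place and is simply padded with spaces.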
