Lum – How to display columns in tab separated files nicely

columns, text-formatting

I have some tab-separated files with a header row that are quite unreadable because the table entries differ in length. Essentially, they look something like this:

c1    c2    c3    c4
A    0    1.0231321321213    92
BBBBB    12321.00002131    19912132.    0
CC    0.0999813221321    0    0

Is there a way to make this more readable, with wider spacing and nicely aligned columns, like this?

c1       c2                 c3                 c4
A        0                  1.0231321321213    92
BBBBB    12321.00002131     19912132.          0
CC       0.0999813221321    0                  0

Best Answer

If the input columns are separated by plain blank characters (ASCII space 0x20 and/or tab 0x09) and there are no empty columns, it is as simple as:

<infile column -t
c1     c2               c3               c4
A      0                1.0231321321213  92
BBBBB  12321.00002131   19912132.        0
CC     0.0999813221321  0                0

However, column does not treat Carriage Return (ASCII 0x0d or \r), Form Feed (ASCII 0x0c or \f) or Vertical Tab (ASCII 0x0b or \v) as delimiters.
If the columns may be separated by any kind of whitespace ([[:space:]], roughly [ \t\r\f\v]: space, horizontal tab, carriage return, form feed or vertical tab, but not newline), you will need to collapse every run of whitespace into a single delimiter (a space here). The newline character cannot serve both as a line delimiter and as a column delimiter.

Except for the newline character, this works:

<infile sed 's/[[:space:]]\+/ /g' | column -t

To treat fewer characters as delimiters, replace [[:space:]] with a bracket expression […] that lists only the characters you want.
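As a minimal sketch of narrowing the bracket expression, the snippet below collapses only spaces and tabs and leaves \r, \f and \v untouched. The helper name collapse_blanks and the sample line are made up for illustration. The tab is obtained with printf so the snippet does not depend on $'…' support, and the pattern avoids the GNU-only \+.

```shell
# Get a literal tab portably (no reliance on $'\t').
tab=$(printf '\t')

# collapse_blanks is a hypothetical helper: it squeezes every run of
# spaces and tabs into one space; \r, \f and \v are left as they are.
collapse_blanks() {
    sed "s/[ $tab][ $tab]*/ /g"
}

# Made-up sample line mixing spaces and tabs between fields.
printf 'c1 \t c2\t\tc3\n' | collapse_blanks
# prints: c1 c2 c3
```

The result can then be piped into column -t, as in the command above.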

If the columns in the source file are separated by a single character (such as a tab), you can use the shell's ANSI-C quoting ($'…'), if your shell supports it, to specify the delimiter character.
Then, using column:

<infile column -s $'\t' -t
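One case where an explicit tab delimiter matters is a field that itself contains a space. The sketch below assumes a column implementation that supports -s and -t, and uses a made-up two-line sample (the function name sample is illustrative only). It runs column with default splitting and then with tab-only splitting so the two results can be compared.

```shell
# Hypothetical tab-separated sample; the first field of row 2 contains a space.
sample() {
    printf 'name\tvalue\nfoo bar\t1\n'
}

# Default splitting: any blank separates cells, so "foo bar" becomes two cells.
sample | column -t

# Tab-only splitting: "foo bar" stays together in a single cell.
tab=$(printf '\t')        # same character as $'\t', but also works in plain sh
sample | column -s "$tab" -t
```

With the tab-only form, the header row and the data row both have exactly two cells.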

By default, column separates the output columns with spaces.

If consecutive delimiters must each count as a separate delimiter (useful when there are empty columns), the -n option disables the merging of adjacent input delimiters into a single delimiter. This option is a Debian GNU/Linux extension, so check that your version of column supports it.

<infile column -s $'\t' -tn

If the fields may be separated by more than one kind of character, you can list all of them within $'...' instead of first converting them to a single character with sed; for example, for spaces or tabs:

<infile column -s $'\t ' -tn
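The same idea applies to any set of delimiter characters, since -s takes a set rather than a multi-character string. As a small sketch, the made-up sample below uses both a comma and a semicolon as separators. The -n flag is left out because not every column implementation has it, and the sample contains no adjacent delimiters.

```shell
# Made-up sample: fields are separated by either ',' or ';'.
# Each character listed after -s acts as a delimiter on its own.
printf 'k1,v1;x\nlonger,v2;y\n' | column -s ',;' -t
```

Both rows come out as three aligned columns.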