Shell – How to expand tabs based on content

I've got some tab-delimited data coming out of a Unix pipe. I'd like to format this data into a compact human-readable table.

How can I expand these tabs into spaces, and automatically set the tab stops based on the width of the field data?

For example, my hypothetical generate_tsv script might produce the following (I'm using \t to represent actual tab characters):

alpha\t1\t0.21085026\tok
beta\t4096\t0.0\tok
gamma\t\t-1.0\tinvalid

Using | expand -t 12, I'd get:

alpha       1           0.21085026  ok
beta        4096        0.0         ok
gamma                   -1.0        invalid

But I'd like something more compact (so there are exactly two spaces separating columns), like this:

alpha  1     0.21085026  ok
beta   4096  0.0         ok
gamma        -1.0        invalid

As suggested by @jw013, | column -t -s $'\t' is close, but not quite correct, as it collapses the empty cell:

alpha  1     0.21085026  ok
beta   4096  0.0         ok
gamma  -1.0  invalid

Best Answer

If you have column(1), an old BSD tool, try column -t to pretty-print tables.

To make sure empty cells are displayed, you could insert a single space into each empty cell (recognizable by two consecutive tabs). The idea is that column(1) then treats the space as a cell of its own, and since it is only one character wide it should neither change the table dimensions nor be visible in the output.

generate_tsv | 
   awk '/\t\t/ { for (i = 0; i < 2; i++) gsub(/\t\t/, "\t \t") } 1' | 
   column -t -s $'\t'

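The awk step added to the pipeline inserts a space into each empty cell, as described. Two passes are needed because gsub matches do not overlap: in a run such as \t\t\t (two consecutive empty cells), the first pass fills only the first cell and the second pass fills the other. Note that /\t\t/ does not match an empty cell at the very start or end of a line, since such a cell is bounded by only one tab.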
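If column(1) is not available, or its handling of empty cells varies between systems, the widths can also be computed directly in awk. The following is only a sketch of that alternative, not the approach above: align_tsv is a hypothetical helper name, and it assumes the input is small enough to hold in memory and that every character occupies one display column.

```shell
# align_tsv: hypothetical helper, not part of the original answer.
# Reads tab-separated text on stdin and pads every column to the width
# of its widest cell plus two spaces. Empty cells are kept as blanks.
align_tsv() {
  awk -F '\t' '
    {
      # Store every cell and track the widest cell seen in each column.
      if (NF > ncols) ncols = NF
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > width[i]) width[i] = length($i)
      }
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= ncols; i++) {
          # Build the format by hand (e.g. "%-7s") for portability.
          line = line sprintf("%-" (width[i] + 2) "s", cell[r, i])
        }
        sub(/ +$/, "", line)   # drop the padding after the last cell
        print line
      }
    }
  '
}
```

It would be used as generate_tsv | align_tsv. Because a single-character field separator such as a tab does not merge adjacent delimiters in awk, the empty cell in the gamma row keeps its place, so the sample input from the question should come out as the compact table asked for.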
