CSV – Visually Align Columns in File

columnscsvtext formatting

Using sed or awk, is it possible to visually align columns in a CSV file?

For example:

e.g. from:

a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00
y,x,z,t,cool,thing,2022-04-12 21:44:00

to:

a, b, c, some stuff,"some, other, stuff",     d, 2023-03-10 18:37:00<EOL>
x, y, z,          t,                cool, thing, 2022-04-12 21:44:00<EOL>

There are some double quoted fields containing text and having commas.

I gave column from bsdmainutils a try but it's apparently not able to deal with such data.

Best Answer

A CSV file of this type:

a, b, c, some stuff,"some, other, stuff",     d, 2023-03-10 18:37:00<EOL>
x, y, z,          t,                cool, thing, 2022-04-12 21:44:00<EOL>

is not really the same data file anymore since you are modifying the fields. When parsed, what was originally "t" will now parse out as " t" because of the width of "some stuff" above (unless you use a regex to parse the non-standard ,[variable space] delimiters.)

You can force quotes on all fields to get a more csv file that shows these new fields clearly. Here is a Ruby to do that:

ruby -r csv -e '
cols={}
data=CSV.parse($<.read)
data.transpose.each_with_index{|sa,i| 
    cols[i]=sa.max_by{|e| e.length}; cols[i]=cols[i].length 
}
puts CSV.generate(force_quotes:true){|csv|
    data.each{|row|
        csv<<row.map.with_index{|e, i| e.rjust(cols[i] ) }
    }
}
' file

Prints:

"a","b","c","some stuff","some, other, stuff","    d","2023-03-10 18:37:00"
"y","x","z","         t","              cool","thing","2022-04-12 21:44:00"

Or, if you really want quoted and unquoted fields, you can do:

ruby -r csv -e '
lcl_csv_opt={:row_sep=>nil}
data=CSV.parse($<.read)
cols=data.transpose.map.with_index{|sa,i| 
    x=sa.max_by{|e| [e].to_csv(**lcl_csv_opt).length}
    [i,"#{[x].to_csv(**lcl_csv_opt)}"]
}.to_h
puts CSV.generate(){|csv|
    data.each{|row|
        csv<<row.map.with_index{|e, i| 
            [e].to_csv(**lcl_csv_opt)==cols[i] ? e : e.rjust(cols[i].length ) 
        }
    }
}
' file

Prints:

a,b,c,some stuff,"some, other, stuff",    d,2023-03-10 18:37:00
y,x,z,         t,                cool,thing,2022-04-12 21:44:00

Which also handles nasty escaped quotes within fields. Given:

$ cat file
a,b,c,some stuff,"some, other, stuff",d,2023-03-10 18:37:00
y,x,z,t,cool,"""thing"", quoted",2022-04-12 21:44:00

The second version prints:

a,b,c,some stuff,"some, other, stuff",                  d,2023-03-10 18:37:00
y,x,z,         t,                cool,"""thing"", quoted",2022-04-12 21:44:00
Related Question