Ubuntu – How to normalize irregularly spaced columns using awk/sed

Tags: awk, command-line, sed

I have two columns of data separated by varying amounts of whitespace, like:

one      two
one two
one    two

How can I convert it so that the columns are separated by a single space:

one two
one two
one two

Awk - rebuild each record so the fields are joined by the default output field separator (a single space):

$ awk '{NF+=0} 1' data
one two
one two
one two
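
Assigning to any field has the same effect as touching NF: awk rebuilds $0 from the fields using OFS. The sketch below uses the commonly seen `$1 = $1` variant; the function name squeeze_awk is only for illustration. Because default field splitting ignores leading and trailing blanks and treats tabs like spaces, it also normalizes those cases.

```shell
#!/bin/sh
# Collapse runs of blanks (spaces/tabs) between fields to a single space.
# Assigning to a field makes awk rebuild $0 using OFS (a single space by default);
# leading and trailing blanks disappear because default field splitting ignores them.
squeeze_awk() {
    awk '{ $1 = $1 } 1'
}

printf 'one      two\none two\none    two\n' | squeeze_awk
```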

Sed - replace each run of one or more spaces with a single space (the g flag applies this to every run on the line, not just the first):

$ sed 's/  */ /g' data
one two
one two
one two
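
The g flag matters as soon as a line has more than two columns. Here is a small comparison on a hypothetical three-column line; the [[:blank:]] bracket expression in the last command is an assumed extension of the original for input that mixes tabs and spaces.

```shell
#!/bin/sh
# A hypothetical three-column line to show why the g flag matters.
line='a   b    c'

# Without g, only the first run of spaces on the line is collapsed.
first_only=$(printf '%s\n' "$line" | sed 's/  */ /')

# With g, every run of spaces is collapsed.
all_runs=$(printf '%s\n' "$line" | sed 's/  */ /g')

# [[:blank:]] matches both spaces and tabs.
tabbed=$(printf 'a\t\tb  c\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')

printf '%s\n' "$first_only"   # a b    c
printf '%s\n' "$all_runs"     # a b c
printf '%s\n' "$tabbed"       # a b c
```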

tr - squeeze (-s) spaces:

$ tr -s ' ' < data
one two
one two
one two
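
tr -s ' ' squeezes only spaces, and unlike awk it keeps a single leading space when a line starts with blanks. A sketch for input that mixes tabs and spaces, assuming you want tabs treated as spaces: translate both to a space and squeeze the result.

```shell
#!/bin/sh
# Map space->space and tab->space (both sets are two characters long),
# then -s squeezes repeated characters from the second set (spaces).
printf 'one\t \ttwo\n' | tr -s ' \t' '  '

# tr does not trim: a line starting with blanks keeps one leading space.
printf '   one   two\n' | tr -s ' '
```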

column - format the input as an aligned table (-t):

$ column -t < data
one  two
one  two
one  two

rs (reshape) - reshape the data into two columns (0 lets rs compute the number of rows):

$ rs 0 2 < data
one  two
one  two
one  two

A GNU awk (gawk) solution for a tab-separated file, using its arrays of arrays:

gawk -F $'\t' '{a[$1][$3]++} END {for (i in a) for (j in a[i]) print i, j, a[i][j]}' foo.txt

a[$1][$3]++ : for each combination of first name (field 1) and company name (field 3), increment the count.
Then, in the END block, loop through the first names and, for each one, through its associated company names, printing each pair with its count.

Another way, which works in other awks, uses the older form of multidimensional arrays:

awk -F $'\t' '{a[$1, $3]++} END{for (i in a) {split (i, sep, SUBSEP); print sep[1], sep[2], a[i]}}' foo.txt

The old method actually stores each key as a single string, the indices joined by SUBSEP, so we have to split the key on SUBSEP to recover the original indices.
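
A runnable sketch of the SUBSEP approach, assuming a hypothetical tab-separated input with first name, surname and company columns (the data and the count_pairs name are made up). It sets FS in a BEGIN block because $'\t' quoting is not supported by every /bin/sh, and since the order of for (k in a) is unspecified, the output is piped through sort.

```shell
#!/bin/sh
# Count each (first name, company) pair using a SUBSEP-joined key,
# which works in POSIX awks without gawk's arrays of arrays.
count_pairs() {
    awk 'BEGIN { FS = "\t" }
         { a[$1, $3]++ }
         END {
             for (k in a) {
                 split(k, idx, SUBSEP)   # recover the two original indices
                 print idx[1], idx[2], a[k]
             }
         }' | LC_ALL=C sort   # for-in order is unspecified, so sort for stable output
}

printf 'Ann\tLee\tAcme\nBob\tKim\tInitech\nAnn\tPark\tAcme\n' | count_pairs
```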

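You can do this by combining the column values into the hash key. !h[key]++ is true only the first time a key is seen, so each distinct combination is printed once, in order of first appearance, in a single pass; the input does not need to be sorted. For columns 1-3 (NF-- drops the last field, Col4, before the line is printed):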

awk '!h[$1,$2,$3]++ { NF--; print }' FS=, OFS=, data.csv

Output:

Col1,Col2,Col3
A,10,50
A,10,05
B,20,30
B,20,03
C,30,100
C,30,111
C,40,111
C,30,123
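
A sketch of the same idea that prints the fields explicitly instead of using NF--. Some older awk implementations did not rebuild $0 when NF was decremented, so this form avoids relying on that. The function name and the sample data are hypothetical.

```shell
#!/bin/sh
# Keep the first line for each distinct (col1, col2, col3) combination.
# seen[...]++ returns the old count, so !seen[...]++ is true only on first sight.
dedupe_first3() {
    awk 'BEGIN { FS = OFS = "," }
         !seen[$1, $2, $3]++ { print $1, $2, $3 }'
}

# Hypothetical, deliberately unsorted four-column input.
printf 'B,20,30,2017\nA,10,50,2017\nB,20,30,2016\nA,10,50,2015\n' | dedupe_first3
```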

For columns 1 and 4, key on just those two fields and print only them:

awk '!h[$1,$4]++ { print $1, $4 }' FS=, OFS=, data.csv

Output:

Col1,Col4
A,2017
B,2017
C,2017
C,2016
C,2015
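
If you need this for different column pairs, the column numbers can be passed in with -v. A minimal sketch, where uniq_pair, c1 and c2 are hypothetical names and the input is made-up comma-separated data:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct pair of columns c1 and c2
# (1-based) from comma-separated input, in order of first appearance.
uniq_pair() {
    awk -v c1="$1" -v c2="$2" 'BEGIN { FS = OFS = "," }
        !seen[$c1, $c2]++ { print $c1, $c2 }'
}

printf 'Col1,Col2,Col3,Col4\nA,10,50,2017\nA,10,05,2017\nC,30,100,2016\n' | uniq_pair 1 4
```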