A GNU awk solution using true multidimensional arrays (arrays of arrays, available in gawk 4.0 and later):
gawk -F $'\t' '{a[$1][$3]++} END {for (i in a) for (j in a[i]) print i, j, a[i][j]}' foo.txt
- a[$1][$3]++: for each combination of first-column and third-column values, increment the count.
- Then, in the END block, loop through the first-column values and, for each one, through the third-column values associated with it.
Another way, which will work in other awks, uses the older form of multidimensional arrays:
awk -F $'\t' '{a[$1, $3]++} END{for (i in a) {split (i, sep, SUBSEP); print sep[1], sep[2], a[i]}}' foo.txt
- Since the old method actually uses a concatenation of the indices separated by SUBSEP, we have to split on SUBSEP to get back the original indices.
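To see the SUBSEP round trip on concrete data, here is a minimal sketch. The function name count_pairs and the sample rows are made up for illustration; the output is piped through sort because the order of `for (key in count)` is unspecified.

```shell
# Count (column 1, column 3) pairs with a portable, SUBSEP-joined key.
# count_pairs is a hypothetical helper name, not part of any tool.
count_pairs() {
  awk -F '\t' '
    { count[$1, $3]++ }               # the key is really $1 SUBSEP $3
    END {
      for (key in count) {
        split(key, part, SUBSEP)      # recover the two original indices
        print part[1], part[2], count[key]
      }
    }' | sort
}

# Hypothetical tab-separated input: name, ignored column, company.
printf 'ann\tx\tacme\nbob\ty\tacme\nann\tz\tacme\nann\tw\tinitech\n' | count_pairs
# ann acme 2
# ann initech 1
# bob acme 1
```

If a field could itself contain the SUBSEP character (by default "\034"), the split would yield too many pieces, which is one reason to prefer the gawk form when it is available.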
You can do this by combining the column values in the hash key. For example, assuming your input is sorted, this one-pass solution works for columns 1-3 (NF-- drops the last column before printing):
awk '!h[$1,$2,$3]++ { NF--; print }' FS=, OFS=, data.csv
Output:
Col1,Col2,Col3
A,10,50
A,10,05
B,20,30
B,20,03
C,30,100
C,30,111
C,40,111
C,30,123
For columns 1 and 4, do something like this:
awk '!h[$1,$4]++ { print $1, $4 }' FS=, OFS=, data.csv
Output:
Col1,Col4
A,2017
B,2017
C,2017
C,2016
C,2015
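The `!h[...]++` idiom used in both commands can be sketched on a small input. The function name first_per_key and the sample rows below are assumptions for illustration, not taken from data.csv.

```shell
# Print columns 1 and 4 the first time each (column 1, column 4) pair appears.
# seen[key]++ evaluates to 0 on the first occurrence, so !seen[key]++ is true
# exactly once per key; later occurrences are skipped.
first_per_key() {
  awk -F, -v OFS=, '!seen[$1, $4]++ { print $1, $4 }'
}

# Hypothetical four-column CSV rows.
printf 'A,10,50,2017\nA,10,05,2017\nC,30,100,2016\nA,10,50,2017\n' | first_per_key
# A,2017
# C,2016
```

Because the array remembers every key it has seen, rows are printed in order of first appearance, and a repeated key is still suppressed when its duplicates are not adjacent.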
Best Answer
Awk - re-write the fields with the default (single space) output field separator:
Sed - substitute multiple spaces with single space:
tr - squeeze (-s) spaces:
column:
rs (reshape) to two columns:
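A minimal sketch of the awk, sed and tr approaches listed above, using common forms of each command. These are assumptions about how each idea is usually written, not necessarily the exact commands the answer had in mind, and the squeeze_* function names are made up for illustration.

```shell
# awk: assigning to any field makes awk rebuild $0 using OFS (a single space).
squeeze_awk() { awk '{ $1 = $1; print }'; }

# sed: replace every run of one or more spaces with a single space.
squeeze_sed() { sed 's/  */ /g'; }

# tr: -s squeezes each run of the listed character down to one.
squeeze_tr() { tr -s ' '; }

printf 'a   b    c\n' | squeeze_awk   # a b c
printf 'a   b    c\n' | squeeze_sed   # a b c
printf 'a   b    c\n' | squeeze_tr    # a b c
```

The three are not fully equivalent: the awk form also removes leading and trailing whitespace and treats tabs as separators, while the sed and tr forms shown here only touch runs of spaces.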