I have tab-delimited columnar text like the following:
A B1 B1 C1
B B2 D2
C C12 C13 C13
D D3 D5 D9
G F2 F2
How can I convert the table above into the following?
A B1 C1
B B2 D2
C C12 C13
D D3 D5 D9
G F2
I have extracted part of my real data. It is a tab-delimited file, and I have tried the command line you (Stéphane Chazelas?) posted. It works fine, but it does not remove duplicates that occur in the last column:
A CD274 PDCD1LG2 CD276 PDCD1LG2 CD274
B NEK2 NEK6 NEK10 NEK10 NEKL-4
C TNFAIP3 OTUD7B OTUD7B TNFAIP3 TNFAIP3
D DUSP16 DUSP4 DUSP8 VHP-1 DUSP8
E AGO2 AGO2 AGO2 AGO2 AGO2
The output needs to be as follows:
A CD274 CD276 PDCD1LG2
B NEK2 NEK6 NEK10 NEKL-4
C TNFAIP3 OTUD7B
D DUSP16 DUSP4 DUSP8 VHP-1
E AGO2
Best Answer
First set of example data:
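A minimal sketch of an awk command that behaves as this answer describes, reconstructed from the explanation in the answer and not necessarily the exact original one-liner. It assumes the input is saved in a file named file, and the printf line that creates that file is only there to make the example self-contained. It relies on awk's default field splitting (any run of blanks), which works here because no value contains a space.

```shell
# Create a tab-delimited sample file named "file" holding the first data set
printf 'A\tB1\tB1\tC1\nB\tB2\tD2\nC\tC12\tC13\tC13\nD\tD3\tD5\tD9\nG\tF2\tF2\n' > file

# Keep only the first occurrence of each value on every line.
# t is a lookup table of values already used on the current line,
# r is the output line being built, and OFS is a tab.
awk -vOFS='\t' '{ split("", t); r = $1; t[$1] = 1
                  for (i = 2; i <= NF; i++)
                      if (!($i in t)) { t[$i] = 1; r = r OFS $i }
                  print r }' file

# Output (tab-separated):
# A  B1   C1
# B  B2   D2
# C  C12  C13
# D  D3   D5  D9
# G  F2
```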
Second set of example data (same awk script):
The script reads the input file (named file) line by line, and for each line it goes through each field, building up the output line in the variable r. If the value in a field has already been added to the output line (determined by a lookup table, t, of values already used), the field is ignored; otherwise it is added. When all the fields of an input line have been processed, the constructed line is output.
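Applied to the second data set from the question, a sketch of the same kind of command (again assuming the input is in a file named file, created here with printf only for illustration) also drops repeated values that sit in the last column, such as CD274 on line A, TNFAIP3 on line C and DUSP8 on line D.

```shell
# Second data set; the format string is reused for every group of six arguments,
# so each group becomes one tab-delimited line
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  A CD274   PDCD1LG2 CD276  PDCD1LG2 CD274 \
  B NEK2    NEK6     NEK10  NEK10    NEKL-4 \
  C TNFAIP3 OTUD7B   OTUD7B TNFAIP3  TNFAIP3 \
  D DUSP16  DUSP4    DUSP8  VHP-1    DUSP8 \
  E AGO2    AGO2     AGO2   AGO2     AGO2 > file

# Same deduplication: a value is appended to r only if it is not yet in t
awk -vOFS='\t' '{ split("", t); r = $1; t[$1] = 1
                  for (i = 2; i <= NF; i++)
                      if (!($i in t)) { t[$i] = 1; r = r OFS $i }
                  print r }' file

# Output (tab-separated):
# A  CD274    PDCD1LG2  CD276
# B  NEK2     NEK6      NEK10  NEKL-4
# C  TNFAIP3  OTUD7B
# D  DUSP16   DUSP4     DUSP8  VHP-1
# E  AGO2
```

Values are kept in order of first appearance, so line A comes out as CD274, PDCD1LG2, CD276 rather than in the order written in the question; sorting the values would need an extra step.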
The output field delimiter is set to a tab through -vOFS='\t' on the command line.
The awk script, unravelled:
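A sketch of how the unravelled script could look, following the steps described in this answer (build the output line r, track used values in the lookup table t). It is a reconstruction, not necessarily the original script. It assumes the input is in a file named file (created here from the first data set only to keep the example self-contained) and clears t at the start of each line with split("", t), a portable idiom; many awk implementations also accept delete t.

```shell
# Sample input (first data set), tab-delimited
printf 'A\tB1\tB1\tC1\nB\tB2\tD2\nC\tC12\tC13\tC13\nD\tD3\tD5\tD9\nG\tF2\tF2\n' > file

awk -vOFS='\t' '
{
    # Start a fresh lookup table t of values used on this line
    split("", t)

    # The first field always starts the output line r
    r = $1
    t[$1] = 1

    # Append each remaining field only if its value has not been used yet
    for (i = 2; i <= NF; i++) {
        if (!($i in t)) {
            t[$i] = 1
            r = r OFS $i
        }
    }

    # All fields of this line processed: output the constructed line
    print r
}' file
```

Testing membership with ($i in t), rather than !t[$i], avoids creating empty entries in t for values that are only looked up.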