Take two columns in a tab delimited file and merge into one

command line, text processing

I was wondering how I would take data that was in this format as a tab-delimited file:

A  red     green  
B  yellow  orange  
C  blue    purple

And to use commands like grep, paste, cut, cat, etc. to turn it into the following:

A red
B yellow
C blue
A green
B orange
C purple

Best Answer

Similar to cut, you can also do it with awk:

$ awk '{print $1,$2}' aa.txt && awk '{print $1,$3}' aa.txt
A red
B yellow
C blue
A green
B orange
C purple
# OR to send the output in a new file:
$ (awk '{print $1,$2}' aa.txt && awk '{print $1,$3}' aa.txt) >aaa.txt

The difference is that awk handles whitespace better than cut: by default it treats any run of blanks as a single field separator. This is useful if the fields in each line are separated by more than one space.

For example, if a line is "A red" with the fields separated by one space, then the cut solution as advised can also handle it successfully; but if the fields are separated by three spaces, cut will fail, while awk will still get fields 1 and 2 or fields 1 and 3.
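
A minimal sketch of that difference, assuming a hypothetical sample line whose fields are separated by three spaces (the variable names are illustrative and not part of the original answer):

```shell
# Hypothetical sample line: fields separated by three spaces.
line='A   red   green'

# cut treats every single space as a delimiter, so "field 2" is the
# empty string between the first and second space.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# awk's default field splitting collapses runs of blanks into one separator.
awk_field=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
# prints:
# cut: []
# awk: [red]
```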

Update:
As advised in comments (thanks don_crissti) this can also be done in pure awk:

awk 'BEGIN{FS=OFS=" "}{z[NR]=$1FS$3; print $1,$2}END{for (i=1; i<=NR; i++){print z[i]}}' aa.txt

Explanation:

FS           : Input Field Separator
OFS          : Output Field Separator
FS=OFS=" "   : input & output field separator is set to "space"
z[NR]        : Creating an array with name 'z' and index the record number: 
             z[1] for first line, z[2] for second line , z[3] for third line
z[NR]=$1FS$3 : assigns to each array element field 1, the field separator (FS = space) and field 3
So for the first line, field 1 (A) and field 3 (green) are stored in z[1], i.e. z[1]="A green"

print $1,$2  : Just prints the 1st field (A) and the 2nd field (red) of the current line, separated by OFS

When the input is finished (END), a for loop prints every entry of the z array => print z[i]
For i=1 => print z[1] => prints "A green"
For i=2 => print z[2] => prints "B orange"
For i=3 => print z[3] => prints "C purple"

PS: If the fields are separated by tabs rather than spaces, the BEGIN block of this awk one-liner must be changed to `awk 'BEGIN {FS=OFS="\t"}....`
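
As a sketch of that tab-delimited case, the following builds the question's sample data in a temporary file and runs the same buffering approach on it. One assumption differs from the PS: FS is set to a tab but OFS is left as a space, because the desired output in the question appears to be space-separated; set OFS to "\t" as well if tabs are wanted in the output. The temporary file and the variable names are hypothetical.

```shell
# Build a tab-delimited sample matching the question's data.
tmp=$(mktemp)
printf 'A\tred\tgreen\nB\tyellow\torange\nC\tblue\tpurple\n' > "$tmp"

# Split input on tabs, join output with spaces (an assumption, see above).
# Column 3 is buffered in z[] and printed only after the whole file is read.
result=$(awk 'BEGIN { FS = "\t"; OFS = " " }
              { z[NR] = $1 OFS $3; print $1, $2 }
              END { for (i = 1; i <= NR; i++) print z[i] }' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
# prints:
# A red
# B yellow
# C blue
# A green
# B orange
# C purple
```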

Related Solutions

Shell – How to expand tabs based on content

If you have column(1), an old BSD tool, try column -t, for pretty-printing tables.

To ensure empty cells are displayed, you could try the approach of inserting a single space in each empty cell (recognizable by two consecutive tabs). The idea is column(1) should give the space character its own column but being a single character in width it should not affect the table dimensions or be visible in the output to humans.

generate_tsv | 
   awk '/\t\t/ { for (i = 0; i < 2; i++) gsub(/\t\t/, "\t \t") } 1' | 
   column -t -s $'\t'

The extra awk inserted in the pipeline does the inserting of spaces into each empty cell, as described. 2 passes are necessary to handle 2 consecutive empty cells (\t\t\t).
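
To see why a single pass is not enough, here is a small sketch on a hypothetical row with two adjacent empty cells. Tabs are shown as | so the result is visible. gsub does not rescan text it has already replaced, so the first pass leaves one pair of adjacent tabs behind.

```shell
# Hypothetical row: "a", two empty cells, "d" (three consecutive tabs).
row=$(printf 'a\t\t\td')

# One gsub pass, then make tabs visible as "|".
one_pass=$(printf '%s\n' "$row" |
  awk '{ gsub(/\t\t/, "\t \t") } 1' | tr '\t' '|')

# Two passes, as in the pipeline above.
two_pass=$(printf '%s\n' "$row" |
  awk '{ for (i = 0; i < 2; i++) gsub(/\t\t/, "\t \t") } 1' | tr '\t' '|')

printf 'one pass:   %s\ntwo passes: %s\n' "$one_pass" "$two_pass"
# prints:
# one pass:   a| ||d
# two passes: a| | |d
```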

Shell Script – Cut Tab-Delimited Text File Lines to 80 Characters

I think you're looking for expand and/or unexpand. It seems you're trying to ensure a \tab counts for the columns it occupies on screen (up to 8) rather than as a single character. fold will do that as well, but it will wrap its input onto the next line rather than truncating it. I think you want:

expand < input | cut -c -80

expand and unexpand are both POSIX specified:

The expand utility shall write files or the standard input to the standard output with \tab characters replaced with one or more space characters needed to pad to the next tab stop. Any backspace characters shall be copied to the output and cause the column position count for tab stop calculations to be decremented; the column position count shall not be decremented below zero.

Pretty simple. So, here's a look at what this does:

unset c i; set --;                                                             
until [ "$((i+=1))" -gt 10 ]; do set -- "$@" "$i" "$i"; done                      
for c in 'tr \\t \ ' expand;  do eval '                                           
    { printf "%*s\t" "$@"; echo; } | 
      tee /dev/fd/2 |'"$c"'| { 
      tee /dev/fd/3 | wc -c >&2; } 3>&1 |
      tee /dev/fd/2 | cut -c -80'
done

The until loop at top gets a set of data like...

1 1 2 2 3 3 ...

It printfs this with the %*s format, which takes the field width from one argument and the string from the next, so each number is right-aligned in a field as wide as its own value. To each one it appends a \tab character.

All of the tees are used to show the effects of each filter as it is applied.

And the effects are these:

1        2        3        4        5        6        7        8                9               10
1  2   3    4     5      6       7        8         9         10 
1  2   3    4     5      6       7        8         9         10 
66
1        2        3        4        5        6        7        8                9               10
1        2        3        4        5        6        7        8                9               10 
1        2        3        4        5        6        7        8                
105

Those rows are lined up in two sets like...

output of printf ...; echo
output of tr ... or expand
output of cut
output of wc

The top four rows are the results of the tr filter - in which each \tab is converted to a single space.

And the bottom four the results of the expand chain.
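
A smaller sketch of the same comparison, assuming a hypothetical three-field line and a 10-character limit instead of 80 so the effect fits on one line:

```shell
# Hypothetical input: three short fields separated by tabs.
line=$(printf 'ab\tcd\tef')

# Without expand, cut -c counts each tab as one character, so all 8
# characters survive the 10-character limit (tabs shown as ">").
raw=$(printf '%s\n' "$line" | cut -c -10 | tr '\t' '>')

# With expand, each tab is padded to the next multiple of 8 columns
# first, so the line is cut where it would visually reach column 10.
exp=$(printf '%s\n' "$line" | expand | cut -c -10)

printf 'raw: [%s]\nexp: [%s]\n' "$raw" "$exp"
# prints:
# raw: [ab>cd>ef]
# exp: [ab      cd]
```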
