Shell – How to add words to an existing column

shell-scripttext processing

I have a tab delimited file with 10 columns and in one of the columns (with about 40 million rows), I would like to add a word before the existing entry in each row (same word in each row!) and a ; after the entry.

e.g. two rows before

1 2 3 4 5 6 7 8 text still more text in this column 10
1 2 3 4 5 6 7 8 text2 still more text in this column 10

to

1 2 3 4 5 6 7 8 test=text; still more text in this column 10
1 2 3 4 5 6 7 8 test=text2; still more text in this column 10

At the end of the day it's the basic "concatenate" function in excel, but I can't use excel for such large files and also need to move to linux anyways.

I looked into concatenate questions here on the forum but I only found topics dealing with merging two strings, eg.

foo="Hello"
foo="$foo World"
echo $foo  

but not using variables.

Best Answer

This is exactly what awk is good at:

awk -F'\t' -vOFS='\t' '{ $9 = "test=" $9 ";" }1'

-F'\t' tells it to use tab-separated fields. -vOFS='\t' tells it to use tabs in the output too. The actual body of it is the last argument: it's a little program that says for every line to change the value of $9 (the ninth field) to the concatenation of "test=", its original value, and ";". We leave all the other fields intact, The final 1 is to tell awk to print the new line out even though we did something to it.

If I give it your input (with tabs inserted):

$ cat data
1   2   3   4   5   6   7   8   text    still more text in this column 10
1   2   3   4   5   6   7   8   text2   still more text in this column 10

then I can run the above command:

$ awk -F'\t' -vOFS='\t' '{ $9="test=" $9 ";" }1' < data
1   2   3   4   5   6   7   8   test=text;  still more text in this column 10
1   2   3   4   5   6   7   8   test=text2; still more text in this column 10

and get the output you want. You can save that to a file with > redirection:

$ awk -F'\t' -vOFS='\t' '{ $9="test=" $9 ";" }1' < data > processed-data
Related Question