I have a tab delimited file with 10 columns and in one of the columns (with about 40 million rows), I would like to add a word before the existing entry in each row (same word in each row!) and a ;
after the entry.
e.g. two rows before
1 2 3 4 5 6 7 8 text still more text in this column 10
1 2 3 4 5 6 7 8 text2 still more text in this column 10
to
1 2 3 4 5 6 7 8 test=text; still more text in this column 10
1 2 3 4 5 6 7 8 test=text2; still more text in this column 10
At the end of the day it's the basic "concatenate" function in excel, but I can't use excel for such large files and also need to move to linux anyways.
I looked into concatenate questions here on the forum but I only found topics dealing with merging two strings, eg.
foo="Hello"
foo="$foo World"
echo $foo
but not using variables.
Best Answer
This is exactly what
awk
is good at:-F'\t'
tells it to use tab-separated fields.-vOFS='\t'
tells it to use tabs in the output too. The actual body of it is the last argument: it's a little program that says for every line to change the value of$9
(the ninth field) to the concatenation of "test=", its original value, and ";". We leave all the other fields intact, The final1
is to tellawk
to print the new line out even though we did something to it.If I give it your input (with tabs inserted):
then I can run the above command:
and get the output you want. You can save that to a file with
>
redirection: