Transforming short repeated words in columns into numbers

terminal, text-processing

I want to transform the short repeated words in columns into numbers.

In the following example I want to replace the words in column 3 (ONLY those with 2 letters) with numbers, so that AA becomes 2, AB or BA becomes 1, and BB becomes 0.

The first and second columns may also contain AA, BB, AB and BA. These should not be changed.

Columns are separated by " ".

Id_animal Id_SNP Allele
ID01 rs01 AB
ID02 rs01 BA
ID03 rs01 AA
ID04 rs01 BB

The wanted output is:

Id_animal Id_SNP Allele
ID01 rs01 1
ID02 rs01 1
ID03 rs01 2
ID04 rs01 0

Best Answer

sed -i.bak -r 's/ AA$/ 2/;s/ (AB|BA)$/ 1/;s/ BB$/ 0/' input
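  • -i.bak edits the file in place and keeps a backup of the original as input.bak
  • -r enables extended regex syntax (GNU sed; -E is the equivalent option accepted by both GNU and BSD sed)
  • s/ AA$/ 2/ replaces ' AA' at the end of a line with ' 2'; the $ anchor limits the match to the last column, so columns 1 and 2 are left alone
  • (AB|BA) matches either AB or BA
  • ; separates the individual substitute commands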
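
As an alternative sketch (not part of the answer above), awk can address column 3 by number rather than by its position at the end of the line. This assumes the columns are separated by single spaces, since awk rebuilds a modified line with single spaces between fields. The function name recode_alleles is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: recode only column 3, reading from standard input.
# Assumes single-space separators; awk rejoins modified lines with one space.
recode_alleles() {
  awk '
    $3 == "AA"               { $3 = 2 }   # homozygous A -> 2
    $3 == "AB" || $3 == "BA" { $3 = 1 }   # heterozygous -> 1
    $3 == "BB"               { $3 = 0 }   # homozygous B -> 0
    { print }                             # other lines (e.g. the header) pass through unchanged
  '
}
```

It could be used as recode_alleles < input > output, which writes to a new file instead of editing input in place. Because the comparison is an exact match on field 3, values such as AA or BB in columns 1 and 2 are never touched.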