I am struggling to make a case-sensitive replacement in a text file. Below is the command I run, followed by a segment of my sed script (file.sed):
sed -f file.sed < input.txt > output.txt
s/\<code_229633_13\>/R77_08349T0/
s/\<code_229633_138\>/R77_09738T0/
s/\<code_230519_10\>/R77_04813T0/
s/\<code_230519_1\>/R77_13591T0/
s/\<code_230519_13\>/R77_05463T0/
... and so on, up to line 14521.
The script works well, but there are also cases where two or more TARGET ids (e.g. code_010512_23 and code_299097_0) map to the same REPLACEMENT id (R77_14520T0). In those cases I would like the output to be something like R77_14520T0.a and R77_14520T0.b, as in the two lines below:
s/code_010512_23/R77_14520T0/ --> R77_14520T0.a
s/code_299097_0/R77_14520T0/ --> R77_14520T0.b
Furthermore, a more complex but similar case arises with the following input file (input2.txt):
ID=gene09464;Name=code_229633_13;isoforms=1
ID=mRNA10661;Parent=gene09464;Name=code_229633_13
ID=exon26192;Parent=mRNA10661;Name=code_229633_13;Target=R77_08349T0 1 1093 +
ID=exon26193;Parent=mRNA10661;Name=code_229633_13;Target=R77_08349T0 1094 1873 +
ID=gene09491;Name=code_229633_138;isoforms=1
ID=mRNA10690;Parent=gene09491;Name=code_229633_138
ID=exon26252;Parent=mRNA10690;Name=code_229633_138;Target=R77_09738T0 1 411 +
ID=gene09513;Name=code_230519_10;isoforms=1
ID=mRNA10715;Parent=gene09513;Name=code_230519_10
ID=exon26311;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 1 59 +
ID=exon26312;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 60 186 +
ID=gene09511;Name=code_230519_1;isoforms=1
ID=mRNA10713;Parent=gene09511;Name=code_230519_1
ID=exon26308;Parent=mRNA10713;Name=code_230519_1;Target=R77_13591T0 1 1075 +
ID=exon26309;Parent=mRNA10713;Name=code_230519_1;Target=R77_13591T0 1076 1128 +
ID=gene09514;Name=code_230519_13;isoforms=1
ID=mRNA10716;Parent=gene09514;Name=code_230519_13
ID=exon26316;Parent=mRNA10716;Name=code_230519_13;Target=R77_05463T0 1 219 +
ID=gene00865;Name=code_010512_23;isoforms=1
ID=mRNA00979;Parent=gene00865;Name=code_010512_23
ID=exon02477;Parent=mRNA00979;Name=code_010512_23;Target=R77_14520T0 1 143 +
ID=gene14561;Name=code_299097_0;isoforms=2
ID=mRNA16419;Parent=gene14561;Name=code_299097_0
ID=exon39828;Parent=mRNA16419;Name=code_299097_0;Target=R77_14520T0 144 193 +
ID=mRNA16420;Parent=gene14561;Name=code_299097_0
ID=exon39828;Parent=mRNA16420;Name=code_299097_0;Target=R77_15554T0 408 457 +
Here I need to apply the replacements in the same way as before, but only on the lines that contain the word "isoforms", in other words on lines 1, 6, 10, 15, 20, 24 and 28, and nowhere else in the text. The format of the input file is exactly as shown, with blank lines between the groups that start with an "isoforms" line.
My desired output:
ID=gene09464;Name=R77_08349T0;isoforms=1
ID=mRNA10661;Parent=gene09464;Name=code_229633_13
ID=exon26192;Parent=mRNA10661;Name=code_229633_13;Target=R77_08349T0 1 1093 +
ID=exon26193;Parent=mRNA10661;Name=code_229633_13;Target=R77_08349T0 1094 1873 +
ID=exon26194;Parent=mRNA10661;Name=code_229633_13;Target=R77_08349T0 1874 4065 +
ID=gene09491;Name=R77_09738T0;isoforms=1
ID=mRNA10690;Parent=gene09491;Name=code_229633_138
ID=exon26252;Parent=mRNA10690;Name=code_229633_138;Target=R77_09738T0 1 411 +
ID=gene09513;Name=R77_04813T0;isoforms=1
ID=mRNA10715;Parent=gene09513;Name=code_230519_10
ID=exon26311;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 1 59 +
ID=exon26312;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 60 186 +
ID=exon26313;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 187 678 +
ID=exon26314;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 679 1399 +
ID=exon26315;Parent=mRNA10715;Name=code_230519_10;Target=R77_04813T0 1400 1402 +
ID=gene09511;Name=R77_13591T0;isoforms=1
ID=mRNA10713;Parent=gene09511;Name=code_230519_1
ID=exon26308;Parent=mRNA10713;Name=code_230519_1;Target=R77_13591T0 1 1075 +
ID=exon26309;Parent=mRNA10713;Name=code_230519_1;Target=R77_13591T0 1076 1128 +
ID=gene09514;Name=R77_05463T0;isoforms=1
ID=mRNA10716;Parent=gene09514;Name=code_230519_13
ID=exon26316;Parent=mRNA10716;Name=code_230519_13;Target=R77_05463T0 1 219 +
ID=gene00865;Name=R77_14520T0.a;isoforms=1
ID=mRNA00979;Parent=gene00865;Name=code_010512_23
ID=exon02477;Parent=mRNA00979;Name=code_010512_23;Target=R77_14520T0 1 143 +
ID=gene14561;Name=R77_14520T0.b;isoforms=2
ID=mRNA16419;Parent=gene14561;Name=code_299097_0
ID=exon39828;Parent=mRNA16419;Name=code_299097_0;Target=R77_14520T0 144 193 +
ID=mRNA16420;Parent=gene14561;Name=code_299097_0
ID=exon39828;Parent=mRNA16420;Name=code_299097_0;Target=R77_15554T0 408 457 +
Best Answer
You can't really do this kind of thing with sed; it's just a text stream editor. Try this Perl scriptlet:
Save the script above as foo.pl, make it executable (chmod a+x foo.pl) and run it on your input file:
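
For illustration only, the same idea can be sketched in Python (not the Perl script referred to above). The sketch assumes every rule in file.sed has the form s/TARGET/REPLACEMENT/ (with or without the \< \> word-boundary markers), that ids consist only of letters, digits and underscores, and that the .a, .b, ... suffixes are handed out in the order the rules appear in the sed file. The function names load_rules, add_suffixes and rename_gene_lines are hypothetical.

```python
import re
from collections import defaultdict
from string import ascii_lowercase

# Matches rule lines such as: s/\<code_229633_13\>/R77_08349T0/
# (the \< and \> word-boundary markers are optional).
RULE = re.compile(r"^s/(?:\\<)?(\w+)(?:\\>)?/(\w+)/")

# Matches the value that follows "Name=" up to the next non-word character.
NAME = re.compile(r"(?<=Name=)\w+")


def load_rules(sed_lines):
    """Return {target_id: replacement_id}, keeping the order of the sed file."""
    rules = {}
    for line in sed_lines:
        m = RULE.match(line.strip())
        if m:
            rules[m.group(1)] = m.group(2)
    return rules


def add_suffixes(rules):
    """Append .a, .b, ... when several targets share one replacement id."""
    groups = defaultdict(list)
    for target, repl in rules.items():
        groups[repl].append(target)

    final = {}
    for repl, targets in groups.items():
        if len(targets) == 1:
            final[targets[0]] = repl
            continue
        if len(targets) > len(ascii_lowercase):
            # Assumption: single-letter suffixes are enough.
            raise ValueError(f"more than 26 targets share {repl}")
        for letter, target in zip(ascii_lowercase, targets):
            final[target] = f"{repl}.{letter}"
    return final


def rename_gene_lines(lines, mapping):
    """Rewrite Name=... only on lines that contain the word 'isoforms'."""
    out = []
    for line in lines:
        if "isoforms" in line:
            # Whole-id lookup, so code_230519_1 never matches code_230519_13.
            line = NAME.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)
        out.append(line)
    return out


# Small demonstration using ids taken from the question.
sed_rules = [
    r"s/\<code_230519_1\>/R77_13591T0/",
    r"s/\<code_010512_23\>/R77_14520T0/",
    r"s/\<code_299097_0\>/R77_14520T0/",
]
gff_lines = [
    "ID=gene00865;Name=code_010512_23;isoforms=1",
    "ID=mRNA00979;Parent=gene00865;Name=code_010512_23",
    "ID=gene14561;Name=code_299097_0;isoforms=2",
]
mapping = add_suffixes(load_rules(sed_rules))
result = rename_gene_lines(gff_lines, mapping)
print("\n".join(result))
```

Because each Name value is looked up as a whole id in a dictionary, the sketch avoids running 14521 separate substitutions per line, and lines without "isoforms" pass through unchanged. To use it on real files, the two demonstration lists would be replaced by the lines read from file.sed and input2.txt.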