I have a file which contains a gene sequence such as:
ATGTGGATGGTGGGTTACAATGAAGGTGGTGAGTTCAACATGGCTGATTATCCATTCAGTGGAAGGAAACTAAGGCCTCTCATTCCAAGACCAGTCCCAGTCCCTACTACTTCTCCTAACAGCACTTCAACTATAACTCCTTCCTTAAACCGCATTCATGGTGGCAATGATTTATTTTCACAATATCATCACAATCTGCAGCAGCAAGCATCAGTAGGAGATCATAGCAAGAGATCAGAGTTGAATAATAATAATAATCCATCTGCAGCAGTTGTGGTGAGTTCAAGATGGAATCCAACACCAGAACAGTTAAGAGCACTGGAAGAATTGTATAGAAGAGGAACAAGAACACCTTCTGCTGAGCAAATCCAACAAATAACTGCCCAGCTTAGAAAATTTGGAAAAATTGAAGGCAAAAATGTTTTCTATTGGTTTCAGAATCACAAAGCCAGAGAAAGGCAAAAACGACGGCGTCAAATGGAATCAGCAGCTGCTGAGTTTGATTCTGCTATTGAAAAGAAAGACTTAGGCGCAAGTAGG
ACAGTGTTTGAAGTTGAACACACTAAAAACTGGCTACCATCTACAAATTCCAGTACCAGTACTCTTCATCTTGCAGAGGAATCTGTTTCAATTCAAAGGTCAGCAGCAGCAAAAGCAGATGGATGGCTCCAATTCGATGAAGCAGAATTACAGCAAAGAAGAAACTTTATGGAAAGGAATGCCACGTGGCATATGATGCAGTTAACTTCTTCTTGTCCTACAGCTAGCATGTCCACCACAACCACAGTAACAACTAGACTTATGGACCCAAAACTCATCAAGACCCATGAACTCAACTTATTCATTTCACCTCACACATACAAAGAAAGAGAAAACGCTTTTATCCACTTAAATACTAGTAGTACTCATCAAAATGAATCTGATCAAACCCTTCAACTTTTCCCAATAAGGAATGGAGATCATGGATGCACTGATCATCATCATCATCATCATAACATTATCAAAGAGACACAGATATCAGCTTCAGCAATCAATGCACCCAACCAGTTTATTGAGTTTCTTCCCTTGAAAAACTGA
I am trying to count the number of occurrence of "ATG" substring in the above string (which is only one line without line breaks.) My file contains tens (10s) of these sequences and I want to be able to count how many "ATG" in each sequence. Each sequence is separated from others by an empty line.
I tried grep but did not know which options I should use (if at all grep can solve the problem) and I googled for any awk example but I did not find any.
Best Answer
Returns the number of occurrences of
ATG
in each line:This works for files with one or many lines.
Example 1
Consider this test file:
The code correctly counts the occurrences of ATG:
Example 2
Using the example in the current version of the question:
This results in:
How it works
awk implicitly loops through every line of a file. Each line is divided into fields.
-F'ATG'
This tells awk to use
ATG
as the field separator.NF{print NF-1}
For each non-empty line, this tells awk to print the number of fields minus 1.
(On empty lines, the number of fields,
NF
, is zero. So, the conditionNF
evaluates to false on these lines, effectively skipping over them.)