I want to find where a word appears in a text file (its position, counted in words from the start of the text) for every instance of that word, but I'm not sure where to even start. I imagine I'll need a loop and some combination of grep and wc.
As an example, here is an article about the iPhone 11 (saved as temp.txt):
On Tuesday, in a sign that Apple is paying attention to consumers who aren’t racing to buy more expensive phones, the company said the iPhone 11, its entry-level phone, would start at $700, compared with $750 for the comparable model last year.
Apple kept the starting prices of its more advanced models, the iPhone 11 Pro and iPhone 11 Pro Max, at $1,000 and $1,100. The company unveiled the new phones at a 90-minute press event at its Silicon Valley campus.
There are 81 words in the text.
jaireaux@macbook:~$ wc -w temp.txt
81 temp.txt
The word 'iPhone' appears three times.
jaireaux@macbook:~$ grep -o -i iphone temp.txt | wc -w
3
The output I want would be like this:
jaireaux@macbook:~$ whereword iPhone temp.txt
24
54
57
What would I do to get that output?
Best Answer
Here's one way, using GNU tools:
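A sketch of a pipeline that matches the description below, wrapped in a shell function so it can be called the way the question asks. The name `whereword` is borrowed from the question and is hypothetical, not an existing command:

```shell
# whereword WORD FILE  (hypothetical helper, not an existing command)
# Prints "N:WORD" for each occurrence of WORD in FILE, where N is the
# word's position counting from 1.
#   tr ' ' '\n'        put each space-separated token on its own line
#   tr -d '[:punct:]'  strip punctuation, so "iPhone," becomes "iPhone"
#   grep .             drop empty lines so they are not counted
#   grep -nFx          -n: prefix line number, -F: fixed string, -x: whole line
whereword() {
    tr ' ' '\n' < "$2" | tr -d '[:punct:]' | grep . | grep -nFx -- "$1"
}
```

On the example text, `whereword iPhone temp.txt` should print `25:iPhone`, `54:iPhone` and `58:iPhone`. The match is case-sensitive; adding `-i` to the last `grep` would make it ignore case, like the `grep -i` in the question.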
The first `tr` replaces all spaces with newlines, so that each word sits on its own line, and the second deletes all punctuation (so that `iPhone,` can be found as a word). The `grep .` skips any blank lines (we don't want to count those), and `grep -n` prefixes each matching line with its line number, which, with one word per line, is the word's position in the text. The `-F` tells `grep` to treat the search pattern as a fixed string rather than a regular expression, and the `-x` that it should only report matches that span the entire line (so that `job` will not count as a match for `jobs`). Note that some of the numbers in your question are off by one: counting the way `wc -w` does, iPhone is the 25th, 54th and 58th word.

If you only want the numbers, you could add another step:
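A sketch of that extra step, assuming it is a `cut` that keeps only the number before the `:` that `grep -n` inserts (again using the question's hypothetical `whereword` name):

```shell
# whereword WORD FILE  (hypothetical helper, not an existing command)
# The pipeline described above, plus cut, which keeps only the text
# before the first ":" so the output is one word position per line.
whereword() {
    tr ' ' '\n' < "$2" | tr -d '[:punct:]' | grep . | grep -nFx -- "$1" | cut -d: -f1
}
```

With the example text, `whereword iPhone temp.txt` should print 25, 54 and 58, one per line, which is the output format asked for in the question.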
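As has been pointed out in the comments, this will still have problems with "words" such as `aren't` or `double-barreled`: deleting all punctuation turns them into `arent` and `doublebarreled`, so searching for the original spelling finds nothing. You can improve on that using: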
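Here is a sketch of one such improvement, assuming it is enough to trim punctuation from the start and end of each word instead of deleting it everywhere (`whereword` is still just the question's hypothetical command name):

```shell
# whereword WORD FILE  (hypothetical helper, not an existing command)
# Punctuation is only trimmed from the ends of each token, so internal
# apostrophes and hyphens ("aren't", "double-barreled") are kept.
#   tr -s '[:space:]' '\n'  split on any whitespace, squeezing runs of it
#   sed ...                 trim leading and trailing punctuation only
#   grep .                  drop tokens that were pure punctuation
#   grep -nFx | cut         print the position of each exact match
whereword() {
    tr -s '[:space:]' '\n' < "$2" |
        sed -e 's/^[[:punct:]]*//' -e 's/[[:punct:]]*$//' |
        grep . |
        grep -nFx -- "$1" |
        cut -d: -f1
}
```

For example, on a file containing `They aren't here; double-barreled names, aren't they?`, `whereword "aren't" file` should print 2 and 6, and on the iPhone text it still gives 25, 54 and 58.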