Grep – Position by Word Count of All Repetitions of a Word in a Text File


I want to find where a word appears in a text file, that is, how many words into the text each occurrence falls, for every instance of that word. I'm not sure where to start; I imagine I'll need a loop and some combination of grep and wc.

As an example, here is an article about the iPhone 11:

On Tuesday, in a sign that Apple is paying attention to consumers who aren’t racing to buy more expensive phones, the company said the iPhone 11, its entry-level phone, would start at $700, compared with $750 for the comparable model last year.

Apple kept the starting prices of its more advanced models, the iPhone 11 Pro and iPhone 11 Pro Max, at $1,000 and $1,100. The company unveiled the new phones at a 90-minute press event at its Silicon Valley campus.

There are 81 words in the text.

jaireaux@macbook:~$ wc -w temp.txt 
      81 temp.txt

The word 'iPhone' appears three times.

jaireaux@macbook:~$ grep -o -i iphone temp.txt | wc -w
       3

The output I want would be like this:

jaireaux@macbook:~$ whereword iPhone temp.txt 
      24
      54
      57

What would I do to get that output?

Best Answer

Here's one way, using GNU tools:

$ tr ' ' '\n' < file | tr -d '[:punct:]' | grep . | grep -nFx iPhone
25:iPhone
54:iPhone
58:iPhone

The first tr replaces all spaces with newlines, so each word ends up on its own line, and the second deletes all punctuation (so that "iPhone," can be found as the word iPhone). The grep . skips any blank lines (we don't want to count those), and the -n in the final grep prefixes each match with its line number, which at this point is the word's position. The -F tells grep to treat the pattern as a fixed string rather than a regular expression, and the -x makes it match only when the pattern spans the entire line (so that job will not count as a match for jobs). Note that two of the numbers you gave in your question (24 and 57) were off by one.
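To see what each stage contributes, it can help to run the same pipeline on a short string and number every word instead of only the matches. The sample text below is made up for illustration; swapping the last grep for grep -n . simply numbers each remaining line.

```shell
# Made-up sample text, used only to show the intermediate result.
sample='the iPhone 11, its entry-level phone'

# One word per line, punctuation removed, blank lines dropped, every line numbered.
printf '%s\n' "$sample" | tr ' ' '\n' | tr -d '[:punct:]' | grep . | grep -n .
# 1:the
# 2:iPhone
# 3:11
# 4:its
# 5:entrylevel
# 6:phone

# With -x only the whole-line match is reported; without it, jobs would match too.
printf 'job\njobs\n' | grep -nFx job
# 1:job
```

Note how entry-level has become entrylevel: deleting punctuation keeps the word count intact but changes the word itself.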

If you only want the numbers, you could add another step:

$ tr ' ' '\n' < file | tr -d '[:punct:]' | grep . | grep -nFx iPhone | cut -d: -f1
25
54
58
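Since the question asked for a whereword command, here is a minimal sketch that wraps the pipeline above in a shell function. The name whereword and the argument order (word first, then file) are taken from the example invocation in the question; everything else is just the pipeline already shown.

```shell
# Sketch of the whereword command from the question (name and argument order assumed from there).
# Prints the 1-based word position of every exact, case-sensitive match.
whereword() {
    # $1 = word to look for, $2 = file to search
    tr ' ' '\n' < "$2" |
        tr -d '[:punct:]' |      # strip punctuation so "iPhone," becomes "iPhone"
        grep . |                 # drop blank lines so they are not counted
        grep -nFx -e "$1" |      # fixed-string, whole-line match, prefixed with line number
        cut -d: -f1              # keep only the number
}
```

You could put this in your ~/.bashrc and call it as whereword iPhone temp.txt. Adding -i to the last grep should make the match case-insensitive, as in the grep -o -i from the question.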

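As has been pointed out in the comments, this will still have problems with "words" such as aren't or double-barreled, because deleting the punctuation turns them into arent and doublebarreled. You can handle those by splitting on punctuation as well as whitespace, at the cost that each piece of a split word is then counted separately, so positions can drift from what wc -w reports: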

$ tr '[:space:][:punct:]' '\n' < file | grep . | grep -nFx iPhone
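If the positions need to line up with what wc -w counts, one alternative (not part of the answer above, just a sketch) is to let awk walk the whitespace-separated fields and trim punctuation only from the ends of each one before comparing. The function name whereword_awk is made up, and the [[:punct:]] class assumes an awk that supports POSIX character classes (gawk and the awk shipped with macOS do; some older mawk builds may not).

```shell
# Hypothetical helper: like the tr/grep pipeline, but numbers tokens the way wc -w does.
whereword_awk() {
    # $1 = word to look for, $2 = file to search
    awk -v word="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n++                     # running count of whitespace-separated tokens
                token = $i
                # Trim punctuation at the start and end only; inner characters stay.
                gsub(/^[[:punct:]]+|[[:punct:]]+$/, "", token)
                if (token == word) print n
            }
        }
    ' "$2"
}
```

Because inner punctuation is left alone, hyphenated words and words with an ASCII apostrophe stay intact and still count as one word each.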