Bash – How to find the most frequent word of each file in a directory

bash, files, text processing

I need to find the most frequent word in each file in a directory and print it like this:

12 my /home/test/file1.txt
5 you /home/test/file3.txt
7 hello /home/test/file4.txt

I tried:

for tmp in <path>
do
    tr -c '[:alnum:]' '[\n*]' < "$tmp" | sort | uniq -c | sort -nr | head -1
done

It doesn't work.

Best Answer

I would use grep with -o, which prints only the matched text, to extract the words:

$ for file in *; do 
    printf '%s : %s\n' "$(grep -Eo '[[:alnum:]]+' "$file" | sort | uniq -c | 
        sort -rn | head -n1)" "$file" 
done
      8 no : file1
     10 so : file2
     12 in : file3
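
The output above keeps the padding that uniq -c adds and uses " : " as a separator, which is not quite the "count word path" layout from the question. Here is a minimal sketch of one way to get that layout, assuming you also want to skip subdirectories. The function name most_frequent and its directory argument are my own illustration, not part of the original commands:

```shell
# most_frequent DIR   (hypothetical helper; DIR defaults to the current directory)
# Prints "COUNT WORD PATH" for every regular file directly inside DIR.
most_frequent() {
    local file
    for file in "${1:-.}"/*; do
        [ -f "$file" ] || continue    # skip directories and other non-regular files
        grep -Eo '[[:alnum:]]+' "$file" | sort | uniq -c | sort -rn | head -n 1 |
            # read splits on whitespace, which drops the padding from uniq -c;
            # a file with no words produces no input, so nothing is printed for it
            while read -r count word; do
                printf '%s %s %s\n' "$count" "$word" "$file"
            done
    done
}
```

Running most_frequent /home/test should then print one line per file in the layout shown in the question. When two words tie for first place, which one is reported depends on how sort orders the tied lines.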

Alternatively, if your grep doesn't support -o, you can use tr to replace all whitespace and punctuation characters with \n, filter through grep . to skip blank lines and then count:

$ for file in *; do
    printf '%s : %s\n' "$(tr '[:punct:][:space:]' '\n' < "$file" | grep . |
      sort | uniq -c | sort -rn | head -n1)" "$file"
done
  8 no : file1
 10 so : file2
 12 in : file3