How to sort file by character occurrences per line

sort text-processing

I'm quite new to Linux, and I've found quite a bit of useful information on how to do character counts in a file, but is there a way in Linux/terminal to sort a text file by the number of times a specific character occurs per line?

E.g. given:

baseball
aardvark
a man a plan a canal panama
cat
bat
bill

Sort by the number of occurrences of the letter "a" yielding:

a man a plan a canal panama
aardvark
baseball
cat
bat
bill

Regarding "cat" and "bat", which have one occurrence of "a" each: I don't care if the order of lines with equal counts gets reversed; I'm just interested in a general sort of lines by character frequency.

Best Answer

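The general approach for this kind of task is to use awk, perl, or a similar tool to compute the metric you're interested in and prepend it to each line, then feed that to sort and strip the metric from the sorted output. Here gsub("a","a") replaces every "a" with itself and returns the number of replacements, which is the count we want: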

awk '{print gsub("a","a"), $0}' < file | sort -rn | cut -d' ' -f2-
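The same decorate, sort, undecorate idea can be wrapped in a small function. This is only a sketch: sort_by_char is a hypothetical name, it assumes the character is a single character other than a backslash, and it relies on the -s (stable) option, which GNU and BSD sort provide but POSIX does not require. It counts by literal comparison instead of gsub, since gsub treats its first argument as a regular expression and would miscount characters such as ".".

```shell
# sort_by_char CHAR FILE  (hypothetical helper, not part of the original answer)
# Prints FILE sorted by how often the single character CHAR occurs per line,
# most occurrences first. Lines with equal counts keep their input order.
sort_by_char() {
    # Step 1: prepend the count, comparing character by character so that
    #         CHAR is never interpreted as a regular expression.
    # Step 2: sort on the count only, numerically, descending, stable (-s).
    # Step 3: strip the count again.
    awk -v c="$1" '{
        n = 0
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) == c) n++
        print n, $0
    }' "$2" | sort -s -k1,1nr | cut -d" " -f2-
}
```

Called as sort_by_char a file, this should reproduce the ordering asked for in the question, with "cat" staying ahead of "bat" because ties keep their input order.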

Related Solutions

Sort part of a file

Method #1: using head & tail

$ (head -n 2 sample.txt; tail -n +3 sample.txt | sort -t' ' -nk2) > a.tmp && mv a.tmp sample.txt

Nome     Note
------------
Mehdi    0
Shnou    5
Others   10
Sunday   20

This takes the first two lines of the text file (the header and the separator line), then tails everything from line 3 onward, which is then sorted.
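For repeated use, the head & tail approach could be wrapped in a function along these lines. It is a sketch under some assumptions: sort_keep_header is a hypothetical name, mktemp is assumed to be available (it is common but not specified by POSIX), and because the original file is replaced by the temporary one, its permissions and ownership may change.

```shell
# sort_keep_header FILE [N]  (hypothetical helper)
# Sorts FILE numerically on its second column, leaving the first N lines
# (default 2) in place. The result goes through an intermediate file.
sort_keep_header() {
    file=$1
    n=${2:-2}
    # The temporary file is created next to FILE so that the final mv
    # stays on the same filesystem.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    {
        head -n "$n" "$file"                          # header lines, unchanged
        tail -n "+$((n + 1))" "$file" | sort -k2,2n   # everything after them, sorted
    } > "$tmp" && mv "$tmp" "$file"
}
```

For the sample above, sort_keep_header sample.txt would be the equivalent call.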

Method #2: just using head

$ (head -n 2; sort -t' ' -nk2) < sample.txt > a.tmp && mv a.tmp sample.txt

Nome     Note
------------
Mehdi    0
Shnou    5
Others   10
Sunday   20

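This takes the text file as standard input, has head print just the first two lines, and lets sort process the rest. It relies on head leaving the read position right after the second line, which generally holds for a regular file but not for a pipe.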
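If the input may come from a pipe, where head can consume more than the header lines, one possible variant reads the header with the shell's read builtin, which does not read past the end of a line. This is a sketch; with_header is a hypothetical name, not something from the original answers.

```shell
# with_header N CMD [ARGS...]  (hypothetical helper)
# Copies the first N lines of standard input through unchanged, then runs
# CMD on whatever input remains. The function body is a subshell, so its
# variables do not leak into the caller.
with_header() (
    n=$1
    shift
    while [ "$n" -gt 0 ] && IFS= read -r line; do
        printf '%s\n' "$line"
        n=$((n - 1))
    done
    "$@"
)
```

It would be used as with_header 2 sort -k2,2n < sample.txt, or at the end of a pipeline such as some_command | with_header 2 sort -k2,2n.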

It's typically not a good idea to edit files in place. It's possible, but better to use an intermediate file.

Method #3: Doing #2 without an intermediate file

Borrowing an idea from @StephaneChazelas, you can use the "1<>" notation, which opens the file on standard output for both reading and writing without truncating it, together with the improvements he suggested for the sort command.

$ (head -n 2; sort -nk2) < sample.txt 1<> sample.txt

Nome     Note
------------
Mehdi    0
Shnou    5
Others   10
Sunday   20

Sort lines by number of words per line

You could do something like:

awk '{print NF,$0}' file | sort -nr | cut -d' ' -f 2-

We use awk to prefix the number of fields to each line. We then sort by that number and remove it with cut.
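With a plain sort -nr, lines that have the same number of words are ordered by comparing the whole decorated line. If ties should instead keep their original order, a variant like the following might do. It is a sketch: sort_by_words is a hypothetical name, and -s is a GNU/BSD extension to sort.

```shell
# sort_by_words FILE  (hypothetical helper)
# Prints FILE with the lines that have the most words first; lines with the
# same word count stay in their input order.
sort_by_words() {
    # Prefix each line with its field count, sort on that count only
    # (numeric, descending, stable), then cut the count off again.
    awk '{ print NF, $0 }' "$1" | sort -s -k1,1nr | cut -d" " -f2-
}
```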
