Bash – Sorting STDIN by length and number of nonblanks in a Bash Script

bash, shell-script, sort

I am working on learning Bash scripting but I am struggling with this problem. Given a bunch of lines from STDIN, sort them first by the length of the line in increasing order. Then, if there are any lines with the same number of characters, sort them by the number of nonblank characters contained in the lines (also in increasing order).

I've tried this a couple of different ways but I usually get caught up in some of the idiosyncrasies of Bash.

Here's what I've got so far:

#!/bin/bash

sorted=()
while IFS='' read -r line; do
    length=${#line}
    if [[ ${sorted[$length]} == "" ]] ; then
        sorted[$length]="$line"
    else
        #non unique length
        #sorted[$length]="${sorted[$length]}\n$line"
        IFS=$'\n' arr=("${sorted[$length]}")
        arr+=("$line")

        spaces=()

        for ((i=0 ; i < ${#arr[@]} ; ++i )) ; do
            spaces[$i]=$(echo "${arr[$i]}" | sed "s: : \n:g" | grep -c " ")
        done

        arr_sorted=()

        for ((i =0 ; i < ${#spaces[@]} ; i++ )) ; do
                for ((j=0 ; j < ${#arr[@]} ; i++ )) ; do

                        this_line_length=$(echo "${arr[$j]}" | sed "s: : \n:g" | grep -c " ")
                        if [[ "$this_line_length" == "${spaces[$i]}" ]] ; then
                            arr_sorted+=("${arr[$j]}")
                            unset arr[$j]
                        fi
                done
        done


    sorted[$length]="${arr_sorted[@]}"


    fi
done

I'm going to go ahead and guess this is nowhere near the best way to do it. I thought I would try to implement everything without relying too heavily on bash builtins but now it seems pretty pointless.

Best Answer

If you're allowed to use evil external contraptions such as sort and cut:

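#! /bin/bash
# Prefix each line with its length and its nonblank count, sort numerically
# on those two tab-separated fields, then cut the prefixes off again.
while IFS= read -r line; do
    squeezed=$( tr -d '[:blank:]' <<<"$line" )
    printf '%d\t%d\t%s\n' "${#line}" "${#squeezed}" "$line"
done | sort -t $'\t' -k 1,1n -k 2,2n | cut -f 3-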
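
To see what each stage contributes, the same pipeline can be wrapped in a function and run on a small sample. This is only a sketch: the function name `sort_by_length` and the sample lines are made up for illustration. It also makes two choices of its own: each sort key is restricted to a single tab-separated field, and the `|| [[ -n $line ]]` guard keeps a final line that has no trailing newline from being dropped.

```shell
#!/bin/bash
# Hypothetical wrapper around the pipeline above; the name is illustrative only.
sort_by_length() {
    local line squeezed
    # Decorate: emit "<length> TAB <nonblank count> TAB <line>",
    # sort numerically on the two keys, then cut the keys off again.
    while IFS= read -r line || [[ -n $line ]]; do
        squeezed=$( tr -d '[:blank:]' <<<"$line" )
        printf '%d\t%d\t%s\n' "${#line}" "${#squeezed}" "$line"
    done | sort -t $'\t' -k 1,1n -k 2,2n | cut -f 3-
}

# Sample input (made up): three 4-character lines and one 2-character line.
printf '%s\n' 'abcd' 'a  d' 'ab d' 'xy' | sort_by_length
```

With this input, `xy` comes first because it is shortest; the three 4-character lines then follow in order of their nonblank counts (2, 3, 4): `a  d`, `ab d`, `abcd`.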

Edit: Since everybody's doing it, here's a solution with perl:

perl -e 'print sort { length $a <=> length $b || $a =~ y/ \t//c <=> $b =~ y/ \t//c } <>'
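
For readers who don't read Perl: `y/ \t//c` counts the characters that are neither a space nor a tab, so the comparator orders by total length first and by that count second. A rough pure-Bash equivalent of the two keys, using parameter expansion instead of `tr`, might look like the sketch below. The function name `line_keys` is hypothetical, and `[[:blank:]]` is assumed to mean space and tab, as it does in the usual locales.

```shell
#!/bin/bash
# Hypothetical helper: print "<length> <nonblank count>" for one line.
line_keys() {
    local line=$1
    # Delete every blank (space or tab); what remains are the nonblank characters.
    local nonblank=${line//[[:blank:]]/}
    printf '%d %d\n' "${#line}" "${#nonblank}"
}

line_keys $'a b\tc'   # prints "5 3"
line_keys 'abcde'     # prints "5 5"
```

The Perl version reads each line with its trailing newline, so for newline-terminated input both of its keys are one larger than these. That shifts every line equally and should not change the resulting order.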