How to Find the Line with the Fewest Characters in Shell


I am writing a shell script and can use any general UNIX commands. I need to retrieve the line that has the fewest characters (whitespace included). There can be up to around 20 lines.

I know I can use head -$L | tail -1 | wc -m to find the character count of line L. The problem is, the only method I can think of, using that, would be to manually write a mess of if statements, comparing the values.
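The comparison the question describes does not need a chain of if statements: one loop that keeps a running minimum is enough, and the shell can measure each line itself with ${#line} instead of running head, tail and wc per line. This is a minimal sketch in POSIX sh; `shortest_line` is a hypothetical helper name, and ${#line} counts characters in bash or zsh under a UTF-8 locale but may count bytes in shells without multibyte support (for example dash).

```shell
#!/bin/sh
# shortest_line (hypothetical helper): print the first shortest line read from stdin.
shortest_line() {
    min=-1
    best=
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    # '|| [ -n "$line" ]' also processes a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        len=${#line}
        # Strict -lt keeps the first line among several of equal minimum length.
        if [ "$min" -lt 0 ] || [ "$len" -lt "$min" ]; then
            min=$len
            best=$line
        fi
    done
    # Print only if at least one line was read.
    if [ "$min" -ge 0 ]; then
        printf '%s\n' "$best"
    fi
}

printf 'seven/7\n4for\n8 eight?\nfive!\n' | shortest_line
```

Used as `shortest_line < file`, it reads the file once rather than once per line.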

Example data:

seven/7
4for
8 eight?
five!

This should return 4for, since that line has the fewest characters.

In my case, if several lines share the shortest length, a single one should be returned; it does not matter which, as long as it has the minimum length. Still, I don't see the harm in showing both variants for other users in other situations.

Best Answer

A Perl way. Note that if several lines share the shortest length, this approach prints only one of them (the first one encountered):

perl -lne '$m//=$_; $m=$_ if length()<length($m); END{print $m if $.}' file

Explanation

perl -lne : -n means "read the input file line by line", -l causes trailing newlines to be removed from each input line and a newline to be added to each print call; and -e is the script that will be applied to each line.
$m//=$_ : set $m to the current line ($_) unless $m is defined. The //= operator is available since Perl 5.10.0.
$m=$_ if length()<length($m) : if the length of the current value of $m is greater than the length of the current line, save the current line ($_) as $m.
END{print $m if $.} : once all lines have been processed, print the current value of $m, the shortest line. The if $. ensures that this only happens if at least one line was read ($. holds the number of the last line read), so empty input does not produce a blank line.

Alternatively, since your file is small enough to fit in memory, you can do:

perl -e '@K=sort{length($a) <=> length($b)}<>; print "$K[0]"' file

Explanation

@K=sort{length($a) <=> length($b)}<> : in list context, <> returns all lines of the input (including their trailing newlines) as a list. sort orders them by length, shortest first, and the sorted lines are stored in the array @K.
print "$K[0]" : print the first element of array @K: the shortest line.
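The same sort-by-length idea can be expressed with standard tools if Perl is not available: awk prefixes each line with its length, sort -n orders by that number, head keeps the first line, and cut strips the prefix again. A sketch; `shortest_by_sort` is a hypothetical name, and awk's length() counts characters in gawk under a UTF-8 locale but may count bytes in other awk implementations.

```shell
#!/bin/sh
# shortest_by_sort (hypothetical helper): decorate with length, sort, undecorate.
# Reads the files given as arguments, or stdin if there are none.
shortest_by_sort() {
    awk '{ print length($0) " " $0 }' "$@" |  # "7 seven/7", "4 4for", ...
        sort -n |                             # shortest length first
        head -n 1 |                           # keep only that line
        cut -d ' ' -f 2-                      # drop the "N " prefix, keep the rest intact
}

printf 'seven/7\n4for\n8 eight?\nfive!\n' | shortest_by_sort
```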

To print all of the shortest lines, you can use:

perl -e '@K=sort{length($a) <=> length($b)}<>; 
         print grep {length($_)==length($K[0])}@K; ' file
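Collecting all of the shortest lines can also be done in one pass without sorting, for example in awk: whenever a strictly shorter line appears, the collected list is reset, and every line equal to the current minimum is appended. A sketch with a hypothetical function name `all_shortest`; it prints the matching lines in input order.

```shell
#!/bin/sh
# all_shortest (hypothetical helper): print every line of minimum length, single pass.
all_shortest() {
    awk '
        NR == 1 || length($0) < min { min = length($0); n = 0 }  # new minimum: reset the list
        length($0) == min           { keep[++n] = $0 }            # collect lines of minimum length
        END { for (i = 1; i <= n; i++) print keep[i] }
    ' "$@"
}

printf 'abc\nde\nfg\nhij\nkl\n' | all_shortest
```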

Related Solutions

Bash – List files with line count and date

Here is an approach using find, wc and date:

find . -maxdepth 1 -exec sh -c '[ -f "$0" ] && \
  printf "%6s\t\t%s\t%s\n" "$(wc -l<"$0")" "$(date -r "$0")" "$0"' {} \;

Instead of date -r, you can also use, for example, stat -c%y (GNU stat).

The output looks like this:

   394      Thu Oct 16 22:38:14 UTC 2014    ./.zshrc
     7      Thu Oct 30 11:19:01 UTC 2014    ./tmp.txt
     2      Thu Oct 30 06:02:00 UTC 2014    ./tmp2.txt
    40      Thu Oct 30 04:16:30 UTC 2014    ./pp.txt
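A possible variant, sketched under the assumption of GNU find and a date that accepts -r FILE (as GNU date does; check your platform's date): find's -type f replaces the [ -f ] test, and the "{} +" form hands many file names to a single sh instead of starting a new sh per file as "{} \;" does. The function name `ls_lines` is hypothetical.

```shell
#!/bin/sh
# ls_lines (hypothetical helper): line count, modification date and name of each
# regular file directly inside a directory (default: current directory).
ls_lines() {
    find "${1:-.}" -maxdepth 1 -type f -exec sh -c '
        for f in "$@"; do
            printf "%6s\t\t%s\t%s\n" "$(wc -l < "$f")" "$(date -r "$f")" "$f"
        done
    ' sh {} +
}

ls_lines .
```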

Using this as a starting point, you can create a function that accepts a directory and a pattern as parameters:

myls () { find "$1" -maxdepth 1 -name "$2" -exec sh -c '[ -f "$0" ] && \
  printf "%6s \t\t%s\t%s\n" "$(wc -l<"$0")" "$(date -r "$0")" "$0"' {} \;; }

After that, myls /tmp '*.png' lists only the PNG images in /tmp (note the single quotes around the pattern, which prevent the shell from expanding the * glob before find sees it).

Shell – Extra space with counted line number

POSIX specifies that the output of wc shall contain an entry for each input file of the form:

"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

However, this differs from the output format of the System V version of wc, which, written as a pseudo printf() string, is:

"%7d%7d%7d %s\n"

POSIX does not require leading spaces, so implementations are free to add them. Several implementations of wc do so, including the one on OS X and the wc from the Heirloom Toolchest:

$ /usr/5bin/wc -l /tmp/file
      3  /tmp/file

GNU wc also adds leading spaces when reading from standard input with no options:

$ cat file | wc
  5       5      65

To remove the leading spaces in a POSIX shell, assuming the count is stored in the variable nl:

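set -f      # disable globbing so the unquoted $nl below is not expanded as a pattern
set -- $nl  # split $nl on IFS (whitespace by default), dropping leading/trailing blanks; overwrites "$@"
nl=$1       # the first (and only) field is the bare number
set +f      # re-enable globbing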

Note that this approach assumes the variable contains only leading or trailing spaces, with no spaces in the middle (as in a b).
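When the value is known to be a plain integer count, arithmetic expansion is another POSIX way to drop the padding: the shell evaluates the text as a number, so surrounding blanks disappear, and unlike the set -- trick it leaves globbing and the positional parameters untouched. A minimal sketch:

```shell
#!/bin/sh
# $(( ... )) evaluates its contents as an integer, so any blanks that
# wc prints around the count are discarded. Only valid for numeric values.
nl=$(( $(printf 'one\ntwo\nthree\n' | wc -l) ))
printf '[%s]\n' "$nl"    # the brackets show that no padding is left
```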
