When using shell variables, you can preserve space characters (more precisely, prevent values from being split into words on the field-separator characters listed in the $IFS shell variable) by surrounding the variable expansions with double quotes.
for w in "${WORDS[@]}"
do
echo -n "$f [$w]:"
grep -aci "$w" $f 2>/dev/null
done
(It wouldn't hurt to surround $f
with quotes, too, in case you encounter filenames with spaces.)
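A quick way to see the difference word splitting makes (the variable name w and its value here are just for illustration):

```shell
# Unquoted expansion is split on $IFS; quoted expansion is not.
w="two words"
set -- $w        # unquoted: splits into two positional parameters
echo "$#"        # prints 2
set -- "$w"      # quoted: stays a single parameter
echo "$#"        # prints 1
```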
As the second loop (2) ends up outputting every unique occurrence of a word in a log, how can its scope be restricted, or how should I discard:
the output consisting of single chars?
Add grep .. to the pipeline to include only lines with 2 or more characters.
the output consisting of single occurrences?
Add -d to the uniq in the pipeline, so that it will only show duplicate lines.
cat "$f" 2>/dev/null | tr -c '[:alnum:]' '[\n*]' | tr -d '[:digit:]' | sort -f | grep .. | uniq -dci | sort -fnr
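To see what the two added filters do, here is a toy run on made-up input (the sample words are hypothetical):

```shell
# 'grep ..' drops the 1-character line; 'uniq -dci' keeps only
# case-insensitive duplicates, prefixed with their count.
printf 'a\nerror\nError\nwarn\n' | grep .. | sort -f | uniq -dci
# 'a' (too short) and 'warn' (not repeated) are discarded;
# only the duplicated 'error' remains, counted as 2.
```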
Are there any recommendations for visually presenting the output, or is there a tool which provides further functionality for either searching or formatting count and word data?
There are a bunch of applications out there that will scan and summarize interesting occurrences in log files, some free, some commercial. I'm not sure we're allowed to give broad recommendations, but if you can give examples of queries you'd like to make or output formats you'd like to see, maybe we can answer those types of questions.
Your system should have GNU grep, which has a -P option to use Perl regular expressions; you can use that, combined with -c (so there is no need for wc -l):
grep -Pvc '\S' somefile
The '\S' hands the pattern \S to grep and matches all lines containing anything that is not whitespace; -v selects all the other lines (those containing only whitespace, or nothing at all), and -c counts them.
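If -P is not available (it is a GNU extension, and marked experimental), the same count can be obtained with a POSIX bracket expression instead of PCRE:

```shell
# POSIX-portable equivalent of grep -Pvc '\S': count the lines that
# contain no non-whitespace character (empty or whitespace-only lines).
grep -cv '[^[:space:]]' somefile
```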
From the man page for grep:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see
below). This is highly experimental and grep -P may warn of
unimplemented features.
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v
is specified by POSIX.)
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
Best Answer
wc counts over the whole file; you can use awk to process the input line by line (not counting the line delimiter), or, since awk is mostly a superset of grep, use an awk pattern much as you would a grep pattern. (Note that some awk implementations report the number of bytes (like wc -c) as opposed to the number of characters (like wc -m), and others will count bytes that don't form part of valid characters in addition to the characters, whereas wc -m would ignore them in most implementations.)
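The awk examples referred to here appear to have been lost in formatting; a plausible sketch of both (the filename somefile is assumed, matching the grep example above) is:

```shell
# Hypothetical reconstruction: print the length of each line in turn
# (length defaults to $0, and excludes the line delimiter).
awk '{print length}' somefile

# And, using an awk pattern the way grep uses a regexp, count the
# whitespace-only lines as the grep -Pvc example does.
awk '!/[^[:space:]]/ {n++} END {print n+0}' somefile
```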