Way to count the number of lines of text in a file including non-delimited ones


The POSIX wc command counts the POSIX lines in a file. POSIX defines a line as a sequence of zero or more non-newline characters plus a terminating \n, so a trailing string of text without \n does not count as a line.

To me, though, it's more natural to count the lines of text in a file, including a last one that has no trailing \n. Is there an easy way to do that?

root:[~]# printf "aa\nbb" | wc -l
1
root:[~]# printf "aa\nbb\n" | wc -l
2
root:[~]#
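Because wc -l counts newline characters rather than lines of text, the two results above can be reproduced by deleting every byte except \n and counting the bytes that remain. This is only a sketch to illustrate the POSIX definition; count_newlines is a hypothetical helper name.

```shell
# Hypothetical helper: count the newline bytes on stdin.
# LC_ALL=C makes tr work byte by byte regardless of the current locale.
count_newlines() {
    LC_ALL=C tr -cd '\n' | wc -c
}

printf 'aa\nbb' | count_newlines     # 1, same as wc -l
printf 'aa\nbb\n' | count_newlines   # 2, same as wc -l
```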

Best Answer

With GNU sed, you can use:

sed '$=;d'

GNU sed treats any characters after the last newline as an extra line. Like most GNU utilities, GNU sed also supports NUL characters in its input and has no limit on line length (the two other criteria that make an input non-text as per POSIX).
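Wrapped as a small function, the GNU sed approach can be applied to the inputs from the question. This assumes GNU sed is the sed on the PATH; count_lines_gnu_sed is a hypothetical name.

```shell
# Assumes GNU sed. '$=' prints the line number of the last input line,
# and 'd' deletes every line so nothing else is written to stdout.
# With no file arguments, "$@" is empty and sed reads stdin.
count_lines_gnu_sed() {
    sed '$=;d' "$@"
}

printf 'aa\nbb' | count_lines_gnu_sed     # 2
printf 'aa\nbb\n' | count_lines_gnu_sed   # 2
```

Note that for empty input there is no last line, so this prints nothing rather than 0.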

POSIXly, building on @Inian's answer to also support overly long lines and NUL bytes:

LC_ALL=C tr -cs '\n' '[x*]' | awk 'END {print NR}'

That tr command translates every sequence of one or more characters other than newline into a single x character (each byte is interpreted as a character in the C locale, to avoid decoding issues). As a result, awk's input contains only x and newline characters, and each input record is either 0 or 1 byte long.
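As a sketch, the same pipeline can be wrapped in a function that reads stdin (count_lines_posix is a hypothetical name). Since tr -s only squeezes the x characters of the second set, empty lines survive as empty records and are still counted.

```shell
# Count lines on stdin, including a final line with no trailing newline.
# tr: -c complements '\n' (every byte except newline), each is mapped to x,
#     and -s squeezes runs of x into one, so records are "" or "x".
# awk: NR at END is the number of records read.
count_lines_posix() {
    LC_ALL=C tr -cs '\n' '[x*]' | awk 'END { print NR }'
}

printf 'aa\nbb' | count_lines_posix            # 2
printf 'aa\nbb\n' | count_lines_posix          # 2
printf 'a\000b\nc\n\nd' | count_lines_posix    # 4 (NUL byte and empty line)
```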

$ printf '%10000s\na\0b\nc\nd' | wc -l
3

$ printf '%10000s\na\0b\nc\nd' | mawk 'END{print NR}'
2
$ printf '%10000s\na\0b\nc\nd' | busybox awk 'END{print NR}'
5
$ printf '%10000s\na\0b\nc\nd' | gawk 'END{print NR}'
4

$ printf '%10000s\na\0b\nc\nd' | LC_ALL=C tr -cs '\n' '[x*]' | mawk 'END{print NR}'
4
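A different technique, not part of the answer above, works for regular files only (not pipes, because the file is read twice): take the newline count from wc -l and add one when the file is non-empty and its last byte is not a newline. It relies only on wc, tail -c and test; count_lines_file is a hypothetical name.

```shell
# Regular files only: the file is read once by wc and once by tail.
count_lines_file() {
    n=$(wc -l < "$1")    # number of newline characters
    # tail -c 1 | wc -l gives 1 if the last byte is a newline, 0 otherwise
    if [ -s "$1" ] && [ "$(tail -c 1 "$1" | wc -l)" -eq 0 ]; then
        n=$((n + 1))     # count the final unterminated line
    fi
    echo $((n))          # $((...)) drops any padding wc may add
}
```

Piping the last byte through wc -l, rather than testing "$(tail -c 1 file)" directly, avoids depending on how the shell handles a trailing NUL byte in command substitution.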