Way to count the number of lines of text in a file including non-delimited ones


The POSIX wc command counts the number of POSIX lines in a file. The POSIX standard defines a line as a sequence of zero or more non-newline characters plus a terminating \n. Without the trailing \n, a string of text does not qualify as a line.

But to me, it's more natural to count how many lines of text there are in a file, whether or not the last one is terminated. Is there an easy way to do that?

root:[~]# printf "aa\nbb" | wc -l
1
root:[~]# printf "aa\nbb\n" | wc -l
2
root:[~]#

Best Answer

With GNU sed, you can use:

sed '$=;d'

This works because GNU sed treats any characters after the last newline as an extra line. Like most GNU utilities, GNU sed also supports NUL characters in its input and places no limit on the length of lines (the two other criteria that make an input non-text as per POSIX).

POSIXly, building on @Inian's answer to also support overly long lines and NUL bytes:
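
As a minimal sketch of the sed approach (assuming GNU sed; the function name count_lines_sed is made up for illustration), sed -n '$=' is an equivalent spelling of sed '$=;d'. sed prints nothing at all for empty input, so the wrapper substitutes 0 in that case:

```shell
# Hypothetical helper: count lines on stdin, including an unterminated last line.
# Assumes GNU sed, which treats bytes after the last newline as one more line.
count_lines_sed() {
    n=$(sed -n '$=')   # number of the last line, or nothing if the input is empty
    echo "${n:-0}"
}

printf 'aa\nbb'   | count_lines_sed   # 2
printf 'aa\nbb\n' | count_lines_sed   # 2
printf ''         | count_lines_sed   # 0
```

To count the lines of a file, redirect it: count_lines_sed < file.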

LC_ALL=C tr -cs '\n' '[x*]' | awk 'END {print NR}'

That tr command translates every sequence of one or more characters other than newline (each byte being interpreted as a character in the C locale to avoid decoding issues) to a single x character, so awk's input records are each 0 or 1 byte long and its input contains only x and newline characters.

$ printf '%10000s\na\0b\nc\nd' | wc -l
3

$ printf '%10000s\na\0b\nc\nd' | mawk 'END{print NR}'
2
$ printf '%10000s\na\0b\nc\nd' | busybox awk 'END{print NR}'
5
$ printf '%10000s\na\0b\nc\nd' | gawk 'END{print NR}'
4

$ printf '%10000s\na\0b\nc\nd' | LC_ALL=C tr -cs '\n' '[x*]' | mawk 'END{print NR}'
4
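
For reuse, the tr and awk pipeline above can be wrapped in a small function. This is only a sketch: the name count_lines_posix is an assumption, and it relies on awk counting a final unterminated record, as the awk implementations tested above do once the input has been sanitised by tr.

```shell
# Hypothetical wrapper around the tr | awk pipeline from the answer.
count_lines_posix() {
    # Collapse every run of non-newline bytes to a single "x",
    # then let awk count the resulting records.
    LC_ALL=C tr -cs '\n' '[x*]' | awk 'END {print NR}'
}

printf 'aa\nbb'   | count_lines_posix   # 2: the unterminated last line is counted
printf 'aa\nbb\n' | count_lines_posix   # 2
printf 'a\n\nb'   | count_lines_posix   # 3: the empty line in the middle still counts
```

Only runs of x are squeezed, never newlines, which is why empty lines keep contributing to the count.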

Related Solutions

The simplest method to count lines matching specific patterns, including ‘0’ if line is not found

How about feeding the pattern file back in as a data file, so that each pattern finds at least one match, and then subtracting one from the final count reported for each match:

grep -f patterns.in logfile.txt patterns.in | cut -f2 -d':' | sort | uniq -c | awk '{print($1 - 1" "$2)}'
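
A small worked variant of the same idea, with made-up sample data. It differs from the command above in ways that are assumptions of this sketch: it adds -o so that counts are grouped by the matched text rather than by whole line, -h to drop the file name prefix (so cut is not needed), and -F to treat patterns as fixed strings. These options are available in GNU, BSD and busybox grep but -o and -h are not in POSIX. It also assumes each pattern is a single word, and it counts every occurrence, so a pattern appearing twice on one line counts twice.

```shell
# Hypothetical demo data in a throwaway directory.
dir=$(mktemp -d)
printf 'foo\nbar\nbaz\n' > "$dir/patterns.in"
printf 'foo here\nfoo again\nbar once\n' > "$dir/logfile.txt"

# Hypothetical helper: print "<count> <pattern>" for every pattern, including 0.
# $1 is the pattern file, $2 is the file to search.
count_pattern_hits() {
    # Searching the pattern file as well guarantees one hit per pattern,
    # which awk subtracts again.
    grep -ohF -f "$1" "$2" "$1" | sort | uniq -c | awk '{print $1 - 1, $2}'
}

count_pattern_hits "$dir/patterns.in" "$dir/logfile.txt"
# 1 bar
# 0 baz
# 2 foo

rm -rf "$dir"
```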

Bash – Count lines of non-terminating input

Typing Ctrl+C from the terminal sends SIGINT to the foreground process group. If you want wc to survive this event and produce output, you need to have it ignore the signal.

The solution is to run wc in a subshell and have its parent shell set SIGINT to be ignored before running wc. wc will inherit this setting and not die when SIGINT is sent to the process group. The rest of the pipeline will die, leaving wc reading from a pipe that has no process on the other end. This will cause wc to see EOF on the pipe and it will then output its counts and exit.

ngrep -W byline port 80 and dst host 1.2.3.4 | grep ":80" | (trap '' INT ; wc)
