Get line number from byte offset


I have a byte offset into a file.

Is there a tool that gives the line number for this byte?

  • Byte offsets are zero-based: the first byte is 0, not 1.
  • Line numbers are one-based.
  • The file can contain plain text, "binary" blobs, multibyte characters, etc., but the section I am interested in, the end of the file, contains only ASCII.

Example file:

001
002
003  <<-- first zero on this line is byte 8
004

Given byte offset 8, that would give me line 3.

I guess I could use something like this to find the line number:

 a. tail -c +$((offset + 1)) file | wc -l, where the +1 is needed because tail counts from 1.
 b. wc -l file
 c. Then tail -n +num, where num is b - a + 1.
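
The steps above can be sketched as a short script. This is only an illustration: the temporary file and the variable names (after, total, num) are assumptions, and num is computed as the total line count minus the count after the offset, plus one.

```shell
# Hypothetical sample file reproducing the example from the question.
file=$(mktemp)
printf '001\n002\n003\n004\n' > "$file"

offset=8    # zero-based byte offset

# a. newlines from the offset to the end of the file (tail -c counts from 1)
after=$(tail -c +"$((offset + 1))" "$file" | wc -l)
# b. newlines in the whole file
total=$(wc -l < "$file")
# c. one-based number of the line that contains the offset
num=$((total - after + 1))

echo "$num"                 # prints 3
tail -n +"$num" "$file"     # prints 003 and 004
```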

But is there a fairly common tool that can give me num directly?


Edit: or, more obviously:

head -c "$offset" file | wc -l

and then add 1 to the result.

Best Answer

In your example,

001
002
003
004

counting from 1, byte number 8 is the second newline, not the 0 on the next line. That 0 is byte 8 only when counting from zero, as you do.
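
One way to check which byte sits at a given position is to dump it in hex. The sketch below assumes a temporary copy of the example file; byte_at is a hypothetical helper, not a standard tool.

```shell
# Hypothetical sample file reproducing the example.
file=$(mktemp)
printf '001\n002\n003\n004\n' > "$file"

# byte_at FILE N: print the hex value of the byte at zero-based offset N.
byte_at() {
    dd if="$1" bs=1 skip="$2" count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

byte_at "$file" 8; echo    # prints 30, the ASCII code of "0"
byte_at "$file" 7; echo    # prints 0a, a newline (the eighth byte counting from 1)
```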

The following will give you the number of complete lines within the first $b bytes:

$ dd if=data.in bs=1 count="$b" | wc -l

It will report 2 with b set to 8 and it will report 1 with b set to 7.

The dd utility, the way it's used here, will read from the file data.in, and will read $b blocks of size 1 byte.

As "icarus" rightly points out in the comments below, using bs=1 is inefficient. It's more efficient, in this particular case, to swap bs and count:

$ dd if=data.in bs="$b" count=1 | wc -l

This will have the same effect as the first dd command, but will read only one block of $b bytes.

The wc utility counts newlines, and a "line" in Unix is always terminated by a newline. So the above command will still say 2 if you set b to anything from 8 up to, but not including, 12 (the 12th byte is the following newline). The result you are looking for is therefore whatever number the above pipeline reports, plus 1.
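
Putting the pipeline and the "plus 1" together, a minimal sketch could look like this. The function name line_of_offset is hypothetical, and the special case for offset 0 is an assumption added because bs=0 is not a valid block size. Note that dd buffers the whole block, so very large offsets need a correspondingly large amount of memory.

```shell
# line_of_offset FILE OFFSET
# Print the one-based line number containing the zero-based byte OFFSET.
line_of_offset() {
    if [ "$2" -eq 0 ]; then
        echo 1          # offset 0 is always on the first line
        return
    fi
    # Count the newlines in the first OFFSET bytes, then add 1.
    newlines=$(dd if="$1" bs="$2" count=1 2>/dev/null | wc -l)
    echo "$((newlines + 1))"
}

# Hypothetical sample file reproducing the example.
file=$(mktemp)
printf '001\n002\n003\n004\n' > "$file"

line_of_offset "$file" 8     # prints 3
line_of_offset "$file" 11    # prints 3 (the newline that ends line 3)
line_of_offset "$file" 12    # prints 4
```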

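This will obviously also count any stray newlines in the binary blob part of your file that precedes the ASCII text. If you knew where the ASCII part starts, you could add skip="$offset" to the dd command to skip that far into the file. Note that skip counts input blocks of bs bytes, so $offset is a number of bytes only with bs=1 (GNU dd also accepts iflag=skip_bytes to make skip a byte count). The count must then be reduced by the same amount, and the resulting line number is relative to the start of the ASCII part.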
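
As a rough illustration of skipping a binary prefix, the sketch below uses made-up data: a 5-byte "blob" containing two stray newlines, followed by the ASCII example. The variable names start, offset and rel are assumptions, and bs=1 is used so that skip and count are both in bytes.

```shell
# Hypothetical data: 5 bytes of "blob" with stray newlines, then ASCII text.
file=$(mktemp)
printf 'a\nb\nc001\n002\n003\n004\n' > "$file"

start=5      # assumed zero-based offset where the ASCII part begins
offset=13    # zero-based offset of the first 0 in "003"

# Count newlines only between the start of the ASCII part and the offset.
rel=$(dd if="$file" bs=1 skip="$start" count="$((offset - start))" 2>/dev/null | wc -l)
echo "$((rel + 1))"    # prints 3, the line number within the ASCII part

# Without skip, the stray newlines in the blob are counted as well.
all=$(dd if="$file" bs=1 count="$offset" 2>/dev/null | wc -l)
echo "$((all + 1))"    # prints 5
```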
