I have a byte offset into a file.
Is there a tool that gives the line number containing that byte?
- Byte count starts at zero, as in: the first byte is 0, not 1.
- Line number starts at 1.
- The file can contain plain text, "binary" blobs, multibyte characters, etc., but the section I am interested in, the end of the file, has only ASCII.
Example, file:
001
002
003 <<-- first zero on this line is byte 8
004
Given byte offset 8, that would give me line 3.
I guess I could use something like this to find the line number:
a. tail -c+(offset + 1) file | wc -l, here +1 as tail counts from 1.
b. wc -l file
c. Then tail -n+num, where num is b - a + 1.
But is there a fairly common tool that can give me num directly?
Edit, err: or the more obvious:
head -c+offset file | wc -l
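Steps a to c above can be sketched as a small shell function. This is only an illustration: the name line_of_offset_tail is made up here, and it assumes a tail that accepts the POSIX form -c +N (byte positions counted from 1).

```shell
#!/bin/sh
# Hypothetical helper, not a standard tool.
# Usage: line_of_offset_tail FILE OFFSET
# OFFSET is zero-based; the printed line number is one-based.
line_of_offset_tail() {
    file=$1
    offset=$2
    # a: newlines from the offset to the end of the file
    #    (tail -c +N counts bytes from 1, hence offset + 1)
    rest=$(tail -c +"$((offset + 1))" "$file" | wc -l)
    # b: newlines in the whole file
    total=$(wc -l < "$file")
    # c: newlines before the offset, plus one, is the line holding the offset
    echo "$(( $total - $rest + 1 ))"
}
```

With the four-line example file, calling line_of_offset_tail file 8 should print 3.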
Best Answer
In your example, counting bytes from 1, byte number 8 is the second newline, not the 0 on the next line.

The following will give you the number of full lines within the first $b bytes:
dd if=data.in bs=1 count="$b" | wc -l

It will report 2 with b set to 8 and it will report 1 with b set to 7.

The dd utility, the way it's used here, will read from the file data.in, and will read $b blocks of size 1 byte.

As "icarus" rightly points out in the comments below, using bs=1 is inefficient. It's more efficient, in this particular case, to swap bs and count:
dd if=data.in bs="$b" count=1 | wc -l

This will have the same effect as the first dd command, but will read only one block of $b bytes.

The wc utility counts newlines, and a "line" in Unix is always terminated by a newline. So the above command will still say 2 if you set b to anything lower than 12 (the following newline). The result you are looking for is therefore whatever number the above pipeline reports, plus 1.

This will obviously also count the random newlines in the binary blob part of your file that precedes the ASCII text. If you knew where the ASCII bit starts, you could add skip="$offset" to the dd command, where $offset is the number of bytes to skip into the file. Note that skip counts input blocks of bs bytes, so it is a byte count only when bs=1.
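Putting the pieces together, here is a minimal sketch of the dd approach wrapped in a function, including the "plus 1" step and the optional skip. The function name line_of_offset and its third argument START are assumptions made for this illustration. It uses bs=1 so that skip and count are both plain byte counts, which trades speed for simplicity.

```shell
#!/bin/sh
# Hypothetical helper, not a standard tool.
# Usage: line_of_offset FILE OFFSET [START]
# OFFSET and START are zero-based byte offsets; START defaults to 0.
# Prints the one-based line number of OFFSET, counted from START.
line_of_offset() {
    file=$1
    offset=$2
    start=${3:-0}
    count=$((offset - start))
    # Nothing to read: the offset is on the first counted line.
    if [ "$count" -le 0 ]; then
        echo 1
        return
    fi
    # Count the newlines in the bytes between START and OFFSET;
    # dd's transfer statistics go to stderr and are discarded.
    newlines=$(dd if="$file" bs=1 skip="$start" count="$count" 2>/dev/null | wc -l)
    # wc counts completed lines, so the offset sits on the next one.
    echo "$(( $newlines + 1 ))"
}
```

For the example file, line_of_offset data.in 8 should print 3. When START is given, the result is relative to the line that starts at START, not to the beginning of the file.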