The question indicates my preference to use Emacs, but the overriding issue is that I want to be able to do a normal text search and somehow see/copy-paste the byte offset of the matched text.
To be clear, by byte offset I do not mean Emacs's point value, which counts characters from the start of the buffer. For example, in a UTF-16LE file, point treats \x0d\x00\x0a\x00 as 1 character, whereas I want it counted as 4 bytes.
Any other editor (or viewer) which presents this basic information while displaying the text in a "normally" readable and searchable fashion is worthwhile.
Even a hex view with a synchronized normal-text view would be okay, but a typical hex-dump viewer/editor is not what I'm after. They (typically) display only ASCII characters, and I haven't found a FOSS hex-dump viewer/editor which can perform a simple text-mode search for non-ASCII UTF-8 strings or for any UTF-16 strings.
I'm primarily concerned with legibility and search-ability of the text, so a "normal" Hex dump program is only a fallback (which I'm already using).
Best Answer
First of all, in case you don't know about it, Emacs has hexl-find-file, which opens a file in hex editing mode. I know that it's not what you asked for, but if you're already using a hex editor and you're comfortable with Emacs, then it's good to know about for future needs.
Second, for this kind of "raw" editing of a file (which I tend to do often), find-file-literally is really great. It does what you'd expect: it pretends to be a pre-Unicode version of itself and opens the file with escapes showing up for non-ASCII characters (and control characters, etc.). This is likely to do what you want, though it has the obvious disadvantage that you can't actually read the text if you have a lot of non-ASCII content.
Going further down into primitive support, there's the enable-multibyte-characters variable and the set-buffer-multibyte function that is used to toggle it. The nice thing about this is that it changes the buffer presentation dynamically -- for example, bind a key to a command that calls set-buffer-multibyte with the negation of enable-multibyte-characters, and you have a key that toggles the raw mode dynamically. It also has the nice property of leaving the cursor in the same place. But this raw mode shows you the internal representation (which looks like UTF-8) and not whatever encoding the file happens to be using.
It should be possible to do what you're talking about with some hack (for example, using find-file-literally on an open file will ask you about revisiting it, but that resets the location and reloads the file too) -- but it sounds like the above is already fine. (That is, my guess is that you're trying to edit some text field in an otherwise binary file...)
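
As a rough illustration of the toggle described above, plus one possible way to get a file byte offset for point, here is a minimal sketch for an init file. It is not the original answer's code: the command names (with the my/ prefix) and the key bindings are hypothetical, and the second command assumes Emacs 25.1 or later, where bufferpos-to-filepos should be available.

```elisp
;; Hypothetical names throughout: the `my/' prefix and the key choices
;; are illustrative assumptions, not part of the original answer.

(defun my/toggle-raw-bytes ()
  "Toggle the current buffer between multibyte and unibyte (raw) display."
  (interactive)
  ;; `enable-multibyte-characters' is non-nil in a multibyte buffer, so
  ;; passing its negation to `set-buffer-multibyte' flips the mode.
  (set-buffer-multibyte (not enable-multibyte-characters)))

(defun my/copy-file-byte-offset ()
  "Echo and copy the 0-based file byte offset that corresponds to point."
  (interactive)
  ;; Assumes Emacs 25.1 or later for `bufferpos-to-filepos'.  By default
  ;; it uses the buffer's `buffer-file-coding-system'; the `exact'
  ;; quality may re-encode much of the buffer, which can be slow.
  (let ((offset (bufferpos-to-filepos (point) 'exact)))
    (kill-new (number-to-string offset))
    (message "File byte offset: %d" offset)))

(global-set-key (kbd "C-c r") #'my/toggle-raw-bytes)
(global-set-key (kbd "C-c o") #'my/copy-file-byte-offset)
```

Note that a forward incremental search leaves point at the end of the match, so move point to the start of the match before invoking the second command if that is the offset you need. The result describes how the buffer would be encoded with its current coding system, so it only matches the file on disk when the buffer has no unsaved changes.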