In Emacs (or another editor), how to display the byte offset of the cursor

editors emacs

The question indicates my preference to use emacs, but the overriding issue is that I want to be able to do a normal text search and somehow see/copy-paste the byte-offset of the matched text.

To be clear, by byte offset I do not mean Emacs's point value, which counts the number of characters from the start of the buffer. For example, in UTF-16LE, point treats \x0d\x00\x0a\x00 as 1 character, whereas I want it counted as 4 bytes.

Any other editor (or viewer) which presents this basic information while displaying the text in a "normally" readable and searchable fashion would be worthwhile.

Even a hex view with a synchronized normal-text view would be okay. A typical hex-dump viewer/editor is not what I'm after, though: they (typically) only display ASCII chars, and I haven't found a FOSS hex-dump viewer/editor which can perform a simple text-mode search for non-ASCII UTF-8 or for any UTF-16 strings.

I'm primarily concerned with legibility and search-ability of the text, so a "normal" Hex dump program is only a fallback (which I'm already using).

Best Answer

First of all, in case you don't know about it, Emacs has hexl-find-file which opens up a file in hex editing mode. I know that it's not what you asked for, but if you're already using one, and you're comfortable with Emacs, then it's good to know about it for future needs.

Second, for this kind of "raw" editing of a file (which I tend to do often), find-file-literally is really great. It does what you'd expect: Emacs pretends to be a pre-Unicode version of itself and opens the file with escapes shown for non-ASCII characters (and control characters, etc.). This is likely to do what you want, though it has the obvious disadvantage that the text becomes hard to read if it has a lot of non-ASCII content.

So going down further into primitive support, there's the enable-multibyte-characters variable and the set-buffer-multibyte function that is used to toggle it. The nice thing about this is that it changes the buffer presentation dynamically -- for example, try this:

(defun my-multi-toggle ()
  "Toggle the current buffer between multibyte and unibyte (raw byte) display."
  (interactive)
  (set-buffer-multibyte (not enable-multibyte-characters)))
(global-set-key (kbd "C-~") 'my-multi-toggle)

and you now have a key that toggles the raw mode dynamically. It also has the nice property of leaving the cursor in the same place. But this raw mode shows you the internal representation (which looks like UTF-8) and not whatever the file happens to be using as its encoding. It should be possible to do what you're talking about with some hack (for example, using find-file-literally on an open file will ask you about revisiting it, but that resets the location and reloads the file too) -- but it sounds like the above is already fine. (That is, my guess is that you're trying to edit some text field in an otherwise binary file...)
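If what you need is the byte offset in the file's own encoding rather than in the internal representation, one possible hack is to encode the text before point with the buffer's coding system and count the resulting bytes. The following is only a sketch: my-show-byte-offset is a made-up name, the utf-8-unix fallback is an assumption, and stateful encodings (ISO-2022 and friends) may append escape sequences that skew the count. It does re-encode everything before point on each call, so it will be slow on very large files.

```elisp
;; Sketch: report the byte offset of point in the file's encoding.
;; `my-show-byte-offset' is a hypothetical name, not a built-in command.
(defun my-show-byte-offset ()
  "Echo and copy the 0-based byte offset of point in the file's encoding.
The text before point is encoded with `buffer-file-coding-system', so a
BOM and CRLF line endings are counted when that coding system implies them."
  (interactive)
  (let* (;; Assumption: fall back to UTF-8 if the buffer has no coding system.
         (coding (or buffer-file-coding-system 'utf-8-unix))
         ;; Widen temporarily so that narrowing does not hide earlier text.
         (text (save-restriction
                 (widen)
                 (buffer-substring-no-properties (point-min) (point))))
         ;; `encode-coding-string' returns a unibyte string, so its
         ;; length is the number of bytes that precede point on disk.
         (offset (length (encode-coding-string text coding))))
    ;; Put the number on the kill ring so it can be pasted elsewhere.
    (kill-new (number-to-string offset))
    (message "Byte offset: %d (0x%x)" offset offset)))
```

After an ordinary isearch, leave point at the match and run M-x my-show-byte-offset; the number lands in the echo area and on the kill ring. Newer Emacs versions (25 and later, as far as I know) also provide bufferpos-to-filepos, which is meant for this kind of mapping and is probably worth trying first if your Emacs has it.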
