PosteRazor uses an apparently outdated GUI that is incapable of properly displaying my filenames:
For the sake of convenience, I want to be able to open any file in PosteRazor by copying and pasting its path from Nautilus. This works in other applications, but sadly, PosteRazor is unable to understand the path:
How can I convert the path that Nautilus generates into a text encoding that is compatible with PosteRazor?
The Ubuntu package for PosteRazor lists a dependency on the Fast Light Toolkit (FLTK). Its programmer's documentation on Unicode looks like it might contain the necessary information to answer my question, but I'm not sure how to interpret it.
Details
-
Some sample content:
-
A path as it natively appears in Nautilus:
/home/ak/café/north-america.jpg
-
The same path as it natively appears in PosteRazor:
-
The clipboard contents after copying the path from Nautilus:
$ xclip -out -selection clipboard -target TARGETS TIMESTAMP TARGETS MULTIPLE x-special/gnome-copied-files text/uri-list UTF8_STRING COMPOUND_TEXT TEXT STRING text/plain;charset=utf-8 text/plain $ xclip -out -selection clipboard -target STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 e9 2f 6e 6f |/home/ak/caf./no| 00000010 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |rth-america.jpg| 0000001f $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |orth-america.jpg| 00000020 $ xclip -out -selection clipboard -target text/plain | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 5c 75 30 30 |/home/ak/caf\u00| 00000010 65 39 2f 6e 6f 72 74 68 2d 61 6d 65 72 69 63 61 |e9/north-america| 00000020 2e 6a 70 67 |.jpg| 00000024 $ xclip -out -selection clipboard -target 'text/plain;charset=utf-8' | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |orth-america.jpg| 00000020
-
The clipboard contents after copying the path from PosteRazor:
$ xclip -out -selection clipboard -target TARGETS STRING $ xclip -out -selection clipboard -target STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |orth-america.jpg| 00000020
-
PosteRazor after copying the path from Nautilus and pasting it into PosteRazor:
-
PosteRazor after copying the path from PosteRazor and pasting it into PosteRazor:
-
The path copied from PosteRazor and pasted into Chromium:
/home/ak/café/norrth-america.jpg
-
The path copied from PosteRazor and pasted into Chromium and then copied from Chromium and pasted back into PosteRazor:
-
The clipboard contents after copying that from Chromium:
$ xclip -out -selection clipboard -target TARGETS TIMESTAMP TARGETS MULTIPLE SAVE_TARGETS COMPOUND_TEXT STRING TEXT UTF8_STRING text/plain $ xclip -out -selection clipboard -target STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 |orrth-america.jp| 00000020 67 |g| 00000021 $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 |orrth-america.jp| 00000020 67 |g| 00000021 $ xclip -out -selection clipboard -target text/plain | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 |orrth-america.jp| 00000020 67 |g| 00000021
-
The path copied from PosteRazor and pasted into GNOME Terminal:
-
The path copied from PosteRazor and pasted into GNOME Terminal and then copied from GNOME Terminal and pasted back into PosteRazor:
-
The clipboard contents after copying that from GNOME Terminal:
$ xclip -out -selection clipboard -target TARGETS TIMESTAMP TARGETS MULTIPLE SAVE_TARGETS UTF8_STRING COMPOUND_TEXT TEXT STRING text/plain;charset=utf-8 text/plain $ xclip -out -selection clipboard -target STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 e9 2f 6e 6f |/home/ak/caf./no| 00000010 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |rth-america.jpg| 0000001f $ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |orth-america.jpg| 00000020 $ xclip -out -selection clipboard -target 'text/plain' | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 5c 75 30 30 |/home/ak/caf\u00| 00000010 65 39 2f 6e 6f 72 74 68 2d 61 6d 65 72 69 63 61 |e9/north-america| 00000020 2e 6a 70 67 |.jpg| 00000024 $ xclip -out -selection clipboard -target 'text/plain;charset=utf-8' | hexdump -C 00000000 2f 68 6f 6d 65 2f 61 6b 2f 63 61 66 c3 a9 2f 6e |/home/ak/caf../n| 00000010 6f 72 74 68 2d 61 6d 65 72 69 63 61 2e 6a 70 67 |orth-america.jpg| 00000020
-
Best Answer
Update: Following command can be used:
For explanation read the full answer.
To completely understand the answer, you need to have an understanding of Unicode code points and unicode encoding.
Below are short definitions and explanations of the required terms, but I recommend you read about them from the sources mentioned at the end of the answer.
Unicode Code Space: A range of integers from 0 to 10FFFF16.
Unicode Code Points: Any value in the Unicode codespace. A code point corresponds to a character, though not all code points are assigned to encoded characters.
UTF-8: UTF-8 (UCS Transformation Format - 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. UCS stands for Universal Character Set.
The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin-derived alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.
This indicates that the character
é
which is causing problems takes two bytes to encode in UTF-8. We will verify it using some commands.ISO/IEC 8859-15: 8-bit single-byte coded graphic character sets.
To test, I made a directory
/home/green/Pictures/café/
.After copying the location from
nautilus
, the outputs of the commands were as follows:Command #1:
Note that the encoding of
café
is63 61 66 e9
, which is all right as the Unicode Code Point U+00E9 represents{LATIN SMALL LETTER E WITH ACUTE}
oré
.Command #2:
In the above output,
café
is encoded as63 61 66 c3 a9
. It is all right too because the UTF-8 encoding of code point U+00E9 (corresponding toé
) is\xC3\xA9
(\x
is used to represent that the following characters are hexadecimal numbers).\xC3
represents 1 byte and so does\xA9
. Thus, UTF-8 needs 2 bytes to representé
.After copying the same text from
PosteRazor
the outputs of the commands were:Command #1:
Clearly, the Unicode Code Points are messed up. Now, we have two code points (
c3
anda9
) where there should be only one (e9
).Unsurprisingly, the two code points i.e.
U+00C3
andU+00A9
stand for{LATIN CAPITAL LETTER A WITH TILDE}
AND{COPYRIGHT SIGN}
, which is what we saw inPosteRazor
.Command #2:
The output for this command seems to have remained unchanged, but there is a subtle difference.
In the previous output
\xc3\xa9
formed a single character whereas now\xc3
forms one character on its own and\xa9
forms another character (which areÃ
and©
, respectively).Now we know what is happening, but how is it happening? To simulate the same thing, we will use Python. I'm using Python 3.3.0 here.
You can see that if we first encode the string using UTF-8 and then decode using ISO-8859-15, then we get the same string which we get while using
PosteRazor
.Now, notice the following code. Here too, we have copied and pasted the location from nautilus:
Had we encoded the string using ISO-8859-15 initially, we'd have gotten the perfect result.
Note that
\xe9
is the encoding foré
in ISO-8859-15, which apparently needs one byte. This is the same as the Unicode code point U+00E9 which, when encoded in UTF-8, needs 2 bytes and is represented by\xc3\xa9
.Now that we know what and how everything is going on, how do we correct it? Well, you can either convert the paths to the ISO-8859-15 character set or you can just use the GUI for selecting files.
Sources and further information: