How do I remove the breadcrumbs and show the full text path in the Nautilus address bar by default?
Ubuntu – How to replace the breadcrumbs with the full text path
nautilus
Related Solutions
Update: The following command can be used:
xclip -out -selection clipboard -target STRING | iconv --from-code ISO-8859-15 --to-code UTF-8 | xclip -in -selection clipboard
For an explanation, read the full answer.
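As a rough illustration of what the iconv step in that pipeline does, here is a minimal Python sketch. It assumes the clipboard's STRING target delivers ISO-8859-15 bytes, as in the test described later in this answer; the function name latin9_to_utf8 is made up for this example, and the sketch does not talk to the clipboard itself.

```python
def latin9_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-15 bytes as UTF-8 (the job iconv does in the pipeline)."""
    return raw.decode("iso-8859-15").encode("utf-8")


# Bytes as they appear on the STRING target for the test directory used below.
clipboard_bytes = b"/home/green/Pictures/caf\xe9/"
converted = latin9_to_utf8(clipboard_bytes)
print(converted)
```

The single byte e9 comes out as the two-byte sequence c3 a9, which is what a UTF-8 consumer expects.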
To completely understand the answer, you need to have an understanding of Unicode code points and Unicode encodings.
Below are short definitions and explanations of the required terms, but I recommend you read about them from the sources mentioned at the end of the answer.
Unicode Code Space: A range of integers from 0 to 10FFFF (hexadecimal).
Unicode Code Points: Any value in the Unicode codespace. A code point corresponds to a character, though not all code points are assigned to encoded characters.
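The two definitions above can be checked from Python's standard library. This is only a sketch; it relies on Python 3.3 or later, where sys.maxunicode is always the top of the code space.

```python
import sys
import unicodedata

ch = "\u00e9"                 # the character é
code_point = ord(ch)          # its integer value in the Unicode code space

print("U+%04X" % code_point, unicodedata.name(ch))
print(hex(sys.maxunicode))    # upper end of the code space
```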
UTF-8: UTF-8 (UCS Transformation Format - 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. UCS stands for Universal Character Set.
The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin-derived alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.
This indicates that the character é, which is causing problems, takes two bytes to encode in UTF-8. We will verify it using some commands.
ISO/IEC 8859-15: 8-bit single-byte coded graphic character sets.
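A quick way to see the difference in byte counts is to encode a few characters both ways. The sample characters here (A, é and the euro sign) are chosen for illustration only.

```python
samples = ["A", "\u00e9", "\u20ac"]  # A, é, €

for ch in samples:
    utf8 = ch.encode("utf-8")
    latin9 = ch.encode("iso-8859-15")
    print("U+%04X  utf-8: %d byte(s)  iso-8859-15: %d byte(s)"
          % (ord(ch), len(utf8), len(latin9)))
```

Every character that ISO-8859-15 can represent takes exactly one byte there, while UTF-8 uses one byte for ASCII and more for everything else.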
To test, I made a directory /home/green/Pictures/café/.
After copying the location from nautilus, the outputs of the commands were as follows:
Command #1:
$ xclip -out -selection clipboard -target STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  e9 2f                    |ures/caf./|
0000001a
Note that café is encoded as 63 61 66 e9, which is all right, as the Unicode code point U+00E9 represents {LATIN SMALL LETTER E WITH ACUTE}, or é.
Command #2:
$ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b
In the above output, café is encoded as 63 61 66 c3 a9. It is all right too, because the UTF-8 encoding of code point U+00E9 (corresponding to é) is \xC3\xA9 (\x indicates that the following characters are hexadecimal digits). \xC3 represents 1 byte and so does \xA9. Thus, UTF-8 needs 2 bytes to represent é.
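The two byte sequences shown by hexdump can be reproduced without the clipboard. The sketch below assumes the STRING target carried ISO-8859-15 bytes, as this answer does; hex_bytes is a helper name invented for the example.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the way hexdump -C prints them: two hex digits each."""
    return " ".join("%02x" % b for b in data)


path = "/home/green/Pictures/caf\u00e9/"

as_string = path.encode("iso-8859-15")  # assumed content of the STRING target
as_utf8 = path.encode("utf-8")          # content of the UTF8_STRING target

print(len(as_string), hex_bytes(as_string[-5:]))
print(len(as_utf8), hex_bytes(as_utf8[-6:]))
```

The lengths, 26 and 27 bytes, match the final offsets 0000001a and 0000001b in the hexdump output.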
After copying the same text from PosteRazor, the outputs of the commands were:
Command #1:
$ xclip -out -selection clipboard -target STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b
Clearly, the Unicode code points are messed up. Now we have two code points (c3 and a9) where there should be only one (e9).
Unsurprisingly, the two code points, U+00C3 and U+00A9, stand for {LATIN CAPITAL LETTER A WITH TILDE} and {COPYRIGHT SIGN}, which is what we saw in PosteRazor.
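The names of the two stray characters can be confirmed with the unicodedata module. This sketch simply reads the UTF-8 bytes of é one byte at a time, which is the misreading described above.

```python
import unicodedata

utf8_bytes = "\u00e9".encode("utf-8")        # the two bytes c3 a9
misread = utf8_bytes.decode("iso-8859-15")   # each byte becomes its own character

for ch in misread:
    print("U+%04X" % ord(ch), unicodedata.name(ch))
```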
Command #2:
$ xclip -out -selection clipboard -target UTF8_STRING | hexdump -C
00000000  2f 68 6f 6d 65 2f 67 72  65 65 6e 2f 50 69 63 74  |/home/green/Pict|
00000010  75 72 65 73 2f 63 61 66  c3 a9 2f                 |ures/caf../|
0000001b
The output for this command seems to have remained unchanged, but there is a subtle difference.
In the previous output \xc3\xa9 formed a single character, whereas now \xc3 forms one character on its own and \xa9 forms another (Ã and ©, respectively).
Now we know what is happening, but how is it happening? To simulate the same thing, we will use Python. I'm using Python 3.3.0 here.
>>> import unicodedata
>>> a = u'/home/green/Pictures/café'
>>> a
'/home/green/Pictures/café'
>>> a = a.encode('utf-8')
>>> a
b'/home/green/Pictures/caf\xc3\xa9'
>>> a = a.decode('iso-8859-15')
>>> a
'/home/green/Pictures/cafÃ©'
>>> a = a.encode('utf-8')
>>> a
b'/home/green/Pictures/caf\xc3\x83\xc2\xa9'
You can see that if we first encode the string using UTF-8 and then decode it using ISO-8859-15, we get the same string that we get while using PosteRazor.
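Since the garbling is just a UTF-8 encode followed by an ISO-8859-15 decode, it can be undone by running the two steps in reverse. The following is a sketch only; repair_mojibake is a hypothetical helper, and it falls back to the input unchanged when the text does not look like mis-decoded UTF-8.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes read as ISO-8859-15'; return text unchanged if that fails."""
    try:
        return text.encode("iso-8859-15").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


garbled = "/home/green/Pictures/caf\u00c3\u00a9"   # café as displayed after mis-decoding
print(repair_mojibake(garbled))
```

A string that was never garbled, such as the correct café, is returned as-is, because its ISO-8859-15 bytes are not valid UTF-8.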
Now, notice the following code. Here too, we have copied and pasted the location from nautilus:
>>> z = u'/home/green/Pictures/café'
>>> z
'/home/green/Pictures/café'
>>> z = z.encode('iso-8859-15')
>>> z
b'/home/green/Pictures/caf\xe9'
>>> z = z.decode('iso-8859-15')
>>> z
'/home/green/Pictures/café'
Had we encoded the string using ISO-8859-15 initially, we'd have gotten the perfect result.
Note that \xe9 is the encoding for é in ISO-8859-15, which needs only one byte. This is the same value as the Unicode code point U+00E9 which, when encoded in UTF-8, needs 2 bytes and is represented by \xc3\xa9.
Now that we know what is going on and how, how do we correct it? Well, you can either convert the paths to the ISO-8859-15 character set or just use the GUI for selecting files.
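Converting a path to ISO-8859-15 only works when every character in it exists in that character set. A minimal sketch of the conversion with that check follows; to_latin9 is a name made up here, and the second sample path is purely illustrative.

```python
def to_latin9(path: str):
    """Return the path as ISO-8859-15 bytes, or None if it cannot be represented."""
    try:
        return path.encode("iso-8859-15")
    except UnicodeEncodeError:
        return None


print(to_latin9("/home/green/Pictures/caf\u00e9/"))      # representable
print(to_latin9("/home/green/Pictures/\u65e5\u672c/"))    # CJK characters are not
```

For paths that return None, selecting the file through the GUI remains the fallback.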
Sources and further information:
- Unicode 6.2.0 PDF - Part 3.4: Character and encoding
- Unicode Glossary
- Wikipedia - UTF-8
- Wikipedia - List of Unicode Characters
- UTF-8 Complete Character List
- Wikipedia - ISO/IEC 8859-15
- ISO 8859-15 Complete Character List
- StackOverflow - Answer to "php to rtf, é becomes Ã©"
- StackOverflow - Decoding double encoded utf8 in Python
Best Answer
Assuming you are using 11.10:
Install the dconf-tools package and then open dconf-editor.
Navigate to org ➜ gnome ➜ nautilus ➜ preferences and check the always-use-location-entry checkbox.
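The same key can probably be set without the GUI. The sketch below assumes the gsettings command-line tool is installed and that the schema org.gnome.nautilus.preferences still exposes always-use-location-entry on your release; the path and key come from the steps above, while the helper names are invented for this example. Nothing is changed unless apply_setting() is called explicitly.

```python
import shutil
import subprocess

SCHEMA = "org.gnome.nautilus.preferences"   # path navigated to in dconf-editor
KEY = "always-use-location-entry"           # checkbox named in the answer


def build_command(enabled: bool = True) -> list:
    """Build the gsettings invocation that toggles the location entry."""
    return ["gsettings", "set", SCHEMA, KEY, "true" if enabled else "false"]


def apply_setting(enabled: bool = True) -> bool:
    """Run the command if gsettings is available; report whether it succeeded."""
    if shutil.which("gsettings") is None:
        return False
    return subprocess.run(build_command(enabled)).returncode == 0


# Only print the command here; call apply_setting() to actually change the setting.
print(" ".join(build_command()))
```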