Gedit Files – Handling Identical Copies of Gedit Files

directory filenames filesystems gedit ls

Whenever I save a file in gedit, two copies of it get saved in the directory with nearly identical content. For example, if the file is named file_name and saved in my home directory, running ls on the home directory lists both file_name and file_name~. Running the file command on them reports "ASCII text, with very long lines" for both, and when I view them with less they appear to contain largely identical content. The ~ file is a copy of the file as it was at the penultimate save. Can someone please help me understand why this file (the one with a trailing ~ in its name) is created?

Best Answer

These are backup files that gedit creates by default. You can disable this feature by going to

Preferences → Editor

and unchecking the option

Create a backup copy of files before saving
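Since the question notes that the two files look largely identical, one way to check is to compare each ~ file against its original byte for byte. The Python sketch below does that with the standard filecmp module. find_gedit_backups is a hypothetical helper, and it assumes the name~ naming convention described in the question.

```python
import filecmp
from pathlib import Path


def find_gedit_backups(directory):
    """Return (backup, original, identical) for every 'name~' file in directory.

    Assumes backups are named by appending '~' to the original file name.
    `identical` is None when the original file no longer exists.
    """
    results = []
    for backup in sorted(Path(directory).glob("*~")):
        # Skip subdirectories and a file literally named "~"
        if not backup.is_file() or backup.name == "~":
            continue
        original = backup.with_name(backup.name[:-1])  # drop the trailing '~'
        if original.is_file():
            # shallow=False compares file contents, not just os.stat() signatures
            identical = filecmp.cmp(backup, original, shallow=False)
        else:
            identical = None
        results.append((backup.name, original.name, identical))
    return results
```

For example, find_gedit_backups(Path.home()) lists the backups in your home directory together with whether each still matches its original, so you can decide which ones to delete.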

Related Solutions

Strange case: Text file that exists and doesn’t exist

You can't have two files with the same name in the same directory. Filenames are by definition unique keys.

What you have is almost certainly a special character. I know you checked for them, but how exactly? You could run something like ls *gff | hexdump -C to find where the special characters are. Any byte with the high bit set (that is, hexadecimal values 80 to FF) indicates that something has gone wrong. Anything below hex 20 (decimal 32) is also a special character (a control character). Another hint is the presence of dots (.) in the right-hand text column of the hexdump -C output, which is where hexdump shows bytes it cannot print.
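The same check can be sketched in Python. hexdump_c is a simplified, hypothetical re-implementation of the hexdump -C layout (it does not collapse repeated lines into * and omits the final length line), and suspicious_bytes applies the rule above: flag any byte below 0x20 or with the high bit set. scan_directory passes a bytes path to os.listdir so the names come back as raw, undecoded bytes.

```python
import os


def hexdump_c(data):
    """Format bytes roughly like `hexdump -C` (simplified sketch).

    Not a full clone: repeated lines are not collapsed into '*', and the
    final line holding the total length is omitted.
    """
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_part = ""
        for i, b in enumerate(chunk):
            hex_part += f"{b:02x} "
            if i == 7:
                hex_part += " "  # extra gap between the two 8-byte halves
        # Only printable 7-bit ASCII (0x20-0x7E) is shown as text; all else is '.'
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<49} |{text}|")
    return "\n".join(lines)


def suspicious_bytes(name):
    """Return (offset, byte) pairs for bytes below 0x20 or with the high bit set."""
    return [(i, b) for i, b in enumerate(name) if b < 0x20 or b >= 0x80]


def scan_directory(directory=b"."):
    """Yield (raw_name, hits) for each directory entry containing suspicious bytes."""
    # A bytes argument makes os.listdir return undecoded bytes names.
    for name in os.listdir(directory):
        hits = suspicious_bytes(name)
        if hits:
            yield name, hits
```

Calling hexdump_c on each raw name that scan_directory reports gives a view similar to ls | hexdump -C, restricted to the suspicious entries.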

Many characters in UTF-8 look like US ASCII characters. Even within US ASCII, 1 and l often look similar. Then there is the Cyrillic capital letter Es (U+0421), which looks like a C, the Greek capital lunate sigma symbol (U+03F9, which also looks exactly like a C), Cyrillic and Greek lowercase ‘o’, and so on. And those are just the visible ones: there are quite a few invisible Unicode characters that could be in there too.
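Spotting look-alikes by eye is error-prone, so a short Python sketch can list every non-ASCII character in a name with its code point and official Unicode name, using the standard unicodedata module. describe_non_ascii is a hypothetical helper; note that it cannot catch ASCII-only confusions such as 1 versus l, nor ASCII control characters.

```python
import unicodedata


def describe_non_ascii(name):
    """List (index, character, code point, Unicode name) for non-ASCII characters.

    Anything outside U+0000-U+007F is reported, including look-alikes such as
    CYRILLIC CAPITAL LETTER ES (U+0421) and invisible characters such as
    ZERO WIDTH SPACE (U+200B). Characters without a name (for example the
    lone surrogates os.listdir uses for undecodable bytes) get "<unnamed>".
    """
    report = []
    for i, ch in enumerate(name):
        if ord(ch) > 0x7F:
            report.append((i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return report
```

Applied to each name returned by os.listdir("."), this would, for instance, report a Cyrillic Es standing in for the leading C of Clon1918K_PCC1.gff as U+0421 CYRILLIC CAPITAL LETTER ES.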

Explanation: why does the high bit signal that something has gone wrong? The filename ‘Clon1918K_PCC1.gff’ appears to be 100% 7-bit US ASCII. Putting it through hexdump -C produces this:

```
00000000  43 6c 6f 6e 31 39 31 38  4b 5f 50 43 43 31 2e 67  |Clon1918K_PCC1.g|
00000010  66 66                                             |ff|
```

All of these byte values are below 0x80 (8th bit clear) because they are all 7-bit US ASCII code points. Unicode code points U+0000 to U+007F represent the traditional 7-bit US ASCII characters. Code points U+0080 and above represent other characters and are encoded in UTF-8 as sequences of two to four bytes (the original UTF-8 design allowed up to six; on Linux, try man utf8 for a lot of information on how this is done). By definition, UTF-8 encodes US ASCII code points as themselves (i.e. the ASCII character with hex value 41, Unicode U+0041, is encoded as the single byte 41). Code points ≥ 128 are encoded as multi-byte sequences in which every byte has the eighth bit set, so the presence of a non-ASCII character can easily be detected without decoding the stream. For example, say I replace the third character in the filename, ‘o’ (ASCII 6f, U+006F), with the Unicode character ‘U+03BF GREEK SMALL LETTER OMICRON’, which looks like this: ‘ο’. hexdump -C then produces this:

```
00000000  43 6c ce bf 6e 31 39 31  38 4b 5f 50 43 43 31 2e  |Cl..n1918K_PCC1.|
00000010  67 66 66                                          |gff|
```

The third character is now encoded as the UTF-8 sequence ce bf, each byte of which has its 8th bit set. That is your sign of trouble in this case. Also note how hexdump, which only decodes 7-bit ASCII, does not decode the single UTF-8 character and instead shows its two bytes as two unprintable characters (..).
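To make the byte layout concrete, here is a hand-written UTF-8 encoder in Python. It is an illustrative sketch only (in real code, str.encode("utf-8") does this), utf8_encode is a hypothetical name, and it follows RFC 3629, which limits UTF-8 to at most four bytes per code point. The comments show the bit patterns: every byte of a multi-byte sequence starts with a 1 bit, which is why the high bit gives non-ASCII characters away.

```python
def utf8_encode(cp):
    """Encode a single Unicode code point as UTF-8 bytes, by hand (RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx: ASCII code points are encoded as themselves
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
```

For example, utf8_encode(0x41) returns b"A", and utf8_encode(0x03BF) returns b"\xce\xbf", the same two bytes that appear in the dump above.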

ACL, ls, “permission denied” and a lot of question marks

The /home/alice/ directory needs execute (search) permission for the user accessing it.

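EDIT: The question marks indicate that ls cannot retrieve the files' attributes (permissions, owner, size and so on): it can read the directory listing, but without execute permission on the directory it cannot stat the entries inside it.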
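The rule in this answer can be checked with a short Python sketch that walks from / down to a given directory and reports each directory the current user cannot search. unsearchable_dirs is a hypothetical helper; it relies on os.access(), which delegates the decision to the kernel using the real (not effective) user and group IDs, so it does not parse ACL entries itself.

```python
import os
from pathlib import Path


def unsearchable_dirs(directory):
    """Return each directory from / down to `directory` lacking search (x) permission.

    To stat entries inside /home/alice, a user needs execute permission on
    /, /home and /home/alice. Nonexistent directories are reported too,
    because os.access() returns False for them.
    """
    directory = Path(directory).absolute()
    chain = [*reversed(directory.parents), directory]  # "/", "/home", ..., directory
    return [str(d) for d in chain if not os.access(d, os.X_OK)]
```

For example, unsearchable_dirs("/home/alice") returning ["/home/alice"] would match the situation in the question: ls can read the names but prints question marks for everything else.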
