You can't have two files with the same name in the same directory. Filenames are by definition unique keys.
What you have is almost certainly a special character. I know you checked for them, but how exactly? You could run something like
ls *gff | hexdump -C
to find where the special characters are. Any byte with the high bit set (that is, a hexadecimal value between 80 and FF) is an indication that something has gone wrong. Anything below 20 (decimal 32) is a control character, which also counts as special. Another hint is the presence of dots (.) in the right-hand text column of hexdump -C in places where the filename has no literal dot.
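If reading hexdump output by eye is tedious, the same check can be scripted. The following is only a minimal sketch in Python: the function names suspicious_bytes and scan_directory are made up for illustration, and treating 0x7F (DEL) as suspicious is my own assumption on top of the ranges described above.

```python
import os


def suspicious_bytes(name: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes outside printable ASCII."""
    # Below 0x20: control characters. 0x80 and above: high bit set.
    # 0x7F (DEL) is included as an assumption; it is not printable either.
    return [(i, b) for i, b in enumerate(name) if b < 0x20 or b >= 0x7F]


def scan_directory(path: bytes = b".") -> dict[bytes, list[tuple[int, int]]]:
    """Map each filename in `path` containing suspicious bytes to those bytes."""
    report = {}
    # Passing bytes to os.listdir() returns the raw, undecoded filename bytes.
    for name in os.listdir(path):
        hits = suspicious_bytes(name)
        if hits:
            report[name] = hits
    return report


if __name__ == "__main__":
    for name, hits in scan_directory().items():
        details = ", ".join(f"offset {i}: 0x{b:02x}" for i, b in hits)
        print(f"{name!r}: {details}")
```

Run from the directory in question, this prints only the filenames that contain something other than printable ASCII, along with the byte offsets to look at.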
There are numerous Unicode characters that look like US ASCII characters. Even within US ASCII, 1 and l can often look similar. Then you have the Cyrillic capital letter Es (U+0421, which looks exactly like a C), the Greek capital lunate sigma symbol (U+03F9, also exactly like a C), the Cyrillic and Greek lower case ‘o’, and so on. And those are just the visible ones. There are quite a few invisible Unicode characters that could be in there as well.
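To tell lookalikes and invisible characters apart, it can help to print the official Unicode name of every non-ASCII character in a filename. This is a rough sketch using Python's standard unicodedata module; describe_non_ascii is a hypothetical helper name and the sample filename is invented for the demonstration.

```python
import unicodedata


def describe_non_ascii(name: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, Unicode name) for each non-ASCII character."""
    found = []
    for index, char in enumerate(name):
        if ord(char) > 0x7F:
            # The second argument is a fallback for characters without a name.
            label = unicodedata.name(char, "<unnamed>")
            found.append((index, f"U+{ord(char):04X}", label))
    return found


if __name__ == "__main__":
    # Invented sample: Cyrillic Es instead of 'C', plus a zero width space.
    sample = "\u0421lon1918K\u200b_PCC1.gff"
    for index, codepoint, label in describe_non_ascii(sample):
        print(index, codepoint, label)
```

A filename that really is plain ASCII yields an empty list, so any output at all is worth a closer look.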
Explanation: why does the high bit signify something gone wrong? The filename ‘Clon1918K_PCC1.gff’ appears to be 100% 7-bit US ASCII. Putting it through hexdump -C produces this:
00000000 43 6c 6f 6e 31 39 31 38 4b 5f 50 43 43 31 2e 67 |Clon1918K_PCC1.g|
00000010 66 66 |ff|
All of these byte values are below 0x80 (8th bit clear) because they are all 7-bit US ASCII codepoints. Unicode codepoints U+0000 to U+007F represent the traditional 7-bit US ASCII characters. Codepoints U+0080 and above represent other characters and are encoded as two to four bytes in UTF-8 (the original design allowed sequences of up to six bytes; on Linux, try man utf8 for a lot of information on how this is done). By definition, UTF-8 encodes US ASCII codepoints as themselves (i.e. ASCII character hex 41, Unicode U+0041, is encoded as the single byte 41). Codepoints ≥ 128 are encoded as multi-byte sequences, every byte of which has the eighth bit set. The presence of a non-ASCII character can therefore be detected easily, without having to decode the stream.
For example, say I replace the third character in the filename, ‘o’ (ASCII 6f, U+006F), with the Unicode character ‘U+03BF GREEK SMALL LETTER OMICRON’, which looks like this: ‘ο’. hexdump -C then produces this:
00000000 43 6c ce bf 6e 31 39 31 38 4b 5f 50 43 43 31 2e |Cl..n1918K_PCC1.|
00000010 67 66 66 |gff|
The third character is now encoded as the UTF-8 sequence ce bf, each byte of which has its 8th bit set. That is your sign of trouble in this case. Also, note how hexdump, which only decodes 7-bit ASCII, fails to decode the single UTF-8 character and shows two unprintable characters (..) instead.
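The encoding behaviour described above is easy to verify for yourself. Here is a small sketch in Python; utf8_hex and all_high_bits_set are illustrative names of my own, and the euro sign and emoji are extra sample characters I picked to show three-byte and four-byte sequences.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of `text` as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def all_high_bits_set(char: str) -> bool:
    """True if every byte of the UTF-8 encoding has its 8th bit set."""
    return all(byte & 0x80 for byte in char.encode("utf-8"))


if __name__ == "__main__":
    # Latin o, Greek omicron, euro sign, and an emoji (1 to 4 bytes each).
    for char in ("o", "\u03bf", "\u20ac", "\U0001f600"):
        print(f"U+{ord(char):04X}: {utf8_hex(char)} "
              f"(all high bits set: {all_high_bits_set(char)})")
```

Only the plain ASCII ‘o’ comes out as a single byte with the high bit clear; every byte of the other encodings is 0x80 or above.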
Best Answer
These are backup files that gedit creates by default. You can disable this feature by going to Preferences → Editor and unchecking the line "Create a backup copy of files before saving".