Dunno. Which is better: a saw or a hammer? :-)
Unicode isn't UTF
One part of the article is more relevant to the subject at hand, though:
- UTF-8 focuses on minimizing the byte size for characters from the ASCII set (variable-length representation: each character takes 1 to 4 bytes, and all ASCII characters fit in 1 byte). As Joel puts it:
“Look at all those zeros!” they said, since they were Americans and they were looking at English text which rarely used code points above U+00FF. Also they were liberal hippies in California who wanted to conserve (sneer). If they were Texans they wouldn’t have minded guzzling twice the number of bytes. But those Californian wimps couldn’t bear the idea of doubling the amount of storage it took for strings
- UTF-32 focuses on exhaustiveness and fixed-length representation, using 4 bytes for all characters. It's the most straightforward translation, mapping the Unicode code point directly to 4 bytes. Obviously, it's not very size-efficient.
- UTF-16 is a compromise, using 2 bytes most of the time, but expanding to two 2-byte units (a surrogate pair, 4 bytes in total) for characters outside the Basic Multilingual Plane (BMP).
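To make the size trade-off concrete, here is a small Python sketch that measures how many bytes each encoding needs for a few sample characters. The sample characters and the helper name `encoded_sizes` are my own choices for illustration, not something from the article.

```python
# Compare how many bytes each UTF encoding needs for a few sample characters.
# The "-le" codec variants are used so Python does not prepend a byte order mark.
SAMPLES = ["A", "Š", "大", "😀"]  # ASCII, Latin Extended-A, CJK, outside the BMP


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Only the last sample lies outside the BMP, so it is the only one that needs 4 bytes in UTF-16.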
Also see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The problem is that the Latin-2 (ISO-8859-2) and Windows-1250 (used by Windows) encodings differ in some characters:
ž, š, ť, Ž, Š, Ť
All the differences are summarized on Wikipedia (or in the Czech version).
If you set encoding=cp1250, then it'll be OK.
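If you want to see the mismatch for yourself, this short Python sketch prints the byte each of the two encodings uses for the letters listed above. It relies on Python's built-in `iso-8859-2` and `cp1250` codecs; the helper name `byte_in` is made up for the example.

```python
# Show where ISO-8859-2 (Latin-2) and Windows-1250 place the same letters.
LETTERS = "žšťŽŠŤ"


def byte_in(ch, encoding):
    """Return the single byte value that `encoding` uses for ch."""
    return ch.encode(encoding)[0]


for ch in LETTERS:
    latin2 = byte_in(ch, "iso-8859-2")
    cp1250 = byte_in(ch, "cp1250")
    print(f"{ch}: ISO-8859-2 0x{latin2:02X}, Windows-1250 0x{cp1250:02X}")
```

A file written in one of these encodings and read as the other will therefore show the wrong glyphs (or nothing useful) for exactly these letters.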
I don't want to prolong comments so I'm adding that here.
The problem is that a standard code page uses only 1 byte per character (256 values, i.e. hex 100), so there are separate ISO standards for different languages.
If you have set encoding=iso-8859-2 and try to add the Unicode character Š (hex 160), then gVim wraps around to the character at hex 60. You have to use the ISO-8859-2 codes, where Š is hex A9. Other codes are listed here: http://cs.wikipedia.org/wiki/ISO_8859-2
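The wrap-around is just arithmetic. The sketch below assumes gVim simply keeps the low byte of the code point, which matches the behaviour described above but is my guess at the mechanism, not something checked against Vim's source. The variable names are made up for the example.

```python
# A single byte holds 0x00..0xFF, so the code point 0x160 does not fit.
CODE_POINT = 0x160  # Unicode code point of Š

# Assumption: only the low byte survives, which lands on 0x60 (the backtick).
low_byte = CODE_POINT & 0xFF
print(f"0x{CODE_POINT:X} -> 0x{low_byte:02X} -> {chr(low_byte)!r}")

# Python's ISO-8859-2 codec reports the byte the code page itself uses for Š.
latin2_byte = "Š".encode("iso-8859-2")[0]
print(f"Š in ISO-8859-2: 0x{latin2_byte:02X}")
```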
UTF-8, on the other hand, uses 1 to 4 bytes per character and can represent all Unicode letters and signs at the same time. So if you use set encoding=utf-8 and then add U+0160 or U+5927, you'll get Š or 大, respectively.
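For reference, here is how those two code points look once they are encoded as UTF-8, in a minimal Python sketch. The helper name `utf8_hex` is hypothetical.

```python
# Turn the two code points from the text into characters and UTF-8 bytes.
def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as a hex string."""
    return chr(code_point).encode("utf-8").hex()


for code_point in (0x0160, 0x5927):
    print(f"U+{code_point:04X} {chr(code_point)} -> {utf8_hex(code_point)}")
```

Š takes two bytes and 大 takes three, which is why a single-byte code page cannot hold both.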
Either Fixedsys contains ů and Ů, or there is a difference in font versions between Windows language editions (I use the Czech version), but I doubt that. You can use the Windows utility Charmap.exe: there you can select the desired font and check which characters it supports, and even see their Unicode code points.
I briefly tried some of the default fonts in gVim, and some of them seem to support Chinese (e.g. MS Mincho), but I don't know which characters are important.
gVim seems to support only monospace fonts, so be aware of that if you go looking for another font. :)
Best Answer
Using gVim on Windows, I did the following two things:
The second command brings up a font picker. By choosing the font "@MS Mincho", I got some of the Japanese characters to display, but oddly they were rotated 90 degrees to the left.
Anyway, you'll have to set the encoding before loading or pasting text into gVim (otherwise it might just convert them to all question marks). Then you'll have to find a font that is (a) fixed width, and (b) includes the characters you want to see. I don't seem to have such a font on my system at the moment, but you may.
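The question-mark effect is easy to reproduce outside Vim. The Python sketch below is only an analogy and makes no claim about how gVim converts text internally: it pushes text through a single-byte encoding and substitutes `?` for anything that encoding cannot hold. The helper name `to_legacy` and the sample string are made up.

```python
# Simulate what happens when text passes through an encoding that lacks
# some of its characters: unrepresentable ones are replaced with '?'.
TEXT = "日本語 Š"


def to_legacy(text, encoding="cp1250"):
    """Round-trip text through `encoding`, replacing unencodable characters."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(to_legacy(TEXT))           # Japanese characters are lost, Š survives
print(to_legacy(TEXT, "utf-8"))  # nothing is lost
```

Once the characters have been replaced, the originals cannot be recovered, which is why the encoding has to be set before the text is loaded or pasted.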