Debian Keyboard – Why the US International Keyboard Layout on Debian Is Different

debian keyboard

Edit: I realized that the "problem" is not specific to Ubuntu. It comes from Debian itself and Ubuntu simply inherited it, so I had this question migrated from Ask Ubuntu.


I have been using Linux on and off for 10 years, and more recently I have spent more time with OSX.

Still, I remember that in the beginning I would choose the US international keyboard layout and it would produce exactly the same output as the Windows layout (and, more recently, the OSX US international layout).

However, a few years ago when I installed Ubuntu, I noticed that the c-cedilla (ç or Ç) was no longer produced. It is typed by pressing ' and then c. Instead, what I get is the letter ć.

When did this start, and why does it differ from the behavior on the other OSes? What puzzles me even more is that there is also a "US International alternative" keyboard layout, which prints exactly the same keys! So what is it an alternative to?

This has been reported as a bug to Canonical (I can't find the link now), but the keyboard layout has never changed back to what I'd expect. I know the workarounds to get the behavior I need; I would just like to know why and when it became different.

Best Answer

Summary

  1. If you are using Ubuntu, the behavior probably changed around 2005, when the default character set switched from ISO 8859-1 to UTF-8.
  2. The US Alternative International layout adds extra dead keys on AltGr combinations.

What a dead key followed by a letter produces depends on your locale and its character set, because each locale selects its own Compose table.

For example:

  • the table for en_US.UTF-8 is defined in /usr/share/X11/locale/en_US.UTF-8/Compose
  • the table for ISO 8859-1 locales is defined in /usr/share/X11/locale/iso8859-1/Compose
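
To check which of these files a given locale is mapped to, one option is to look the locale up in libX11's compose.dir index. The sketch below is only illustrative: it assumes the index lives at /usr/share/X11/locale/compose.dir and that each entry has the form "Compose path, optional colon, locale name". The function name compose_file_for is made up for this example, and locale aliases (such as en_US.utf8) are not resolved.

```shell
# compose_file_for LOCALE [INDEX]
# Print the Compose file (relative to the locale directory) listed for LOCALE.
# Assumption: INDEX lines look like "en_US.UTF-8/Compose: en_US.UTF-8"
# (the colon after the path is optional).
compose_file_for() {
    awk -v loc="$1" '
        /^#/      { next }                              # skip comment lines
        $2 == loc { sub(/:$/, "", $1); print $1; exit } # first matching entry wins
    ' "${2:-/usr/share/X11/locale/compose.dir}"
}
```

On a system whose index contains such an entry, compose_file_for en_US.UTF-8 should print the relative path of the Compose file; if the locale is not listed, nothing is printed.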

Searching both files with grep shows the difference:

$ grep '<dead_acute> <c>' /usr/share/X11/locale/en_US.UTF-8/Compose 
<dead_acute> <c>                    : "ć"   U0107 # LATIN SMALL LETTER C WITH ACUTE

$ grep '<dead_acute> <c>' /usr/share/X11/locale/iso8859-1/Compose
<dead_acute> <c>            : "\347"    ccedilla

Namely (\347 is the octal escape for byte 0xE7, which is ç in ISO 8859-1):

  • Latin1 encoding: ', c = ç
  • UTF-8 encoding: ', c = ć
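
The same comparison can be wrapped in a small helper so that other sequences are easy to check. This is a minimal sketch, not part of any X11 tooling: the function name compose_result is hypothetical, and it assumes each Compose entry sits on one line in the form "sequence : result # comment". A result that itself contains a "#" character would be cut short.

```shell
# compose_result SEQUENCE FILE
# Print what FILE maps SEQUENCE to: the text between the colon and any
# trailing "#" comment. Prints nothing if the sequence is not defined.
compose_result() {
    awk -v seq="$1" '
        index($0, seq) == 1 {                    # line must start with the sequence
            rest = substr($0, length(seq) + 1)
            if (rest ~ /^[ \t]*:/) {             # ...followed directly by the colon
                sub(/^[ \t]*:[ \t]*/, "", rest)  # drop the separator
                sub(/[ \t]*#.*$/, "", rest)      # drop the trailing comment
                print rest
                exit
            }
        }
    ' "$2"
}
```

For example, compose_result '<dead_acute> <c>' /usr/share/X11/locale/en_US.UTF-8/Compose and the same call on the iso8859-1 file can be run side by side to compare the two tables.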

The git logs for both files (en_US.UTF-8 and iso8859-1) show it has been this way since at least 2004.


The difference between US International and US Alternative International is defined in /usr/share/X11/xkb/symbols/us.

Namely, the US Alternative International layout adds these extra AltGr dead keys:

  • dead_macron: on AltGr-minus
  • dead_breve: on AltGr-parenleft
  • dead_abovedot: on AltGr-period
  • dead_abovering: on AltGr-0
  • dead_doubleacute: on AltGr-equal (as quotedbl is already used)
  • dead_caron: on AltGr-less (AltGr-shift-comma)
  • dead_cedilla: on AltGr-comma
  • dead_ogonek: on AltGr-semicolon
  • dead_belowdot: on AltGr-underscore (AltGr-shift-minus)
  • dead_hook: on AltGr-question
  • dead_horn: on AltGr-plus (AltGr-shift-equal)
  • dead_diaeresis: on AltGr-colon (AltGr-shift-semicolon)

For example:

  • US International: AltGr+- = ¥
  • US Alternative International: AltGr+-, a = ā
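
To see these definitions on your own system, you can pull a single variant out of the symbols file. The helper below is a rough sketch: the name xkb_variant_block is invented for this example, and it assumes the variants are named "intl" and "alt-intl" in the file and that each block closes with "};" at the start of a line.

```shell
# xkb_variant_block VARIANT [FILE]
# Print the xkb_symbols "VARIANT" { ... }; block from an XKB symbols FILE.
# Assumption: the block ends at the first line that starts with "};".
xkb_variant_block() {
    awk -v pat="xkb_symbols \"$1\"" '
        index($0, pat)  { inside = 1 }   # block header found
        inside          { print }
        inside && /^};/ { exit }         # closing line of the block
    ' "${2:-/usr/share/X11/xkb/symbols/us}"
}
```

Piping the output through grep, for instance xkb_variant_block alt-intl | grep dead_macron, should show which key carries a given dead key in that variant.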

UTF-8 became the default encoding: