Greek characters in mutt's sidebar (mailbox names sourced via offlineimap)

character encoding, imap, mutt

Mutt's sidebar does not display Greek characters properly (the mailboxes are named using Greek characters). There is no such problem in either the index or the pager, where Greek characters, words, and names appear just fine.

Update after Gilles' comment

The setup in question, used from two different systems (a workstation and a laptop, both sporting Funtoo and GNU bash, version 4.2.45(1)-release), includes mutt for reading/writing e-mails from/to an IMAP server. The output of locale is

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=POSIX
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

and no locale variable is set in any of mutt's configuration files.

Messages are actually synced via offlineimap and sent using postfix. offlineimap records the list of mailbox names in a file (named mailboxes), which is sourced from one of mutt's configuration files (with source ~/.mutt/mailboxes). The content of the mailboxes file shows that the Greek names are already "misinterpreted" there.

In any case, the Greek names appear fine in a webmail client (RoundCube), which also accesses the same IMAP server and thus the suspect mailboxes.

Questions

  • Why does this happen?
  • Is it an offlineimap misconfiguration issue?
  • How can it be resolved?

Remaining issue? [March, 2015] (see also answer below)

However, the local repository folder names (directory names) are still in an unreadable state. For instance, the Greek folder name shown above (Υποτροφία) is actually the directory &A6UDwAO,A8QDwQO,A8YDrwOx-. Does this mean that folder name conversions take place after, and not during, the syncing of folder names and messages? Or is it required to remove these directories (from the local repository) and force another sync via offlineimap (taking care not to let the respective mailbox folders be removed from the remote repository)?

Best Answer

In short, the "problem" arises from the fact that IMAP4 encodes folder names using a modified UTF-7 encoding, and offlineimap does not convert folder names to something readable (e.g. UTF-8) before creating the local repositories. This, in turn, yields unreadable folder names like the ones shown in the screenshot of this question. Hence, it is neither Mutt's nor offlineimap's mishandling or misconfiguration per se.
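As a rough illustration of the encoding itself, the directory name quoted in the question can be decoded by hand. This is a standalone sketch that assumes Python 3 and uses only the standard library; the function name imap_utf7_decode is made up for this example and is not part of offlineimap or of the codec script given further down.

```python
import base64
import re


def imap_utf7_decode(name):
    """Decode an IMAP modified UTF-7 mailbox name into a regular str."""

    def unshift(match):
        chunk = match.group(1)
        if not chunk:
            return "&"  # "&-" stands for a literal ampersand
        b64 = chunk.replace(",", "/")   # undo the "," for "/" substitution
        b64 += "=" * (-len(b64) % 4)    # restore the stripped "=" padding
        return base64.b64decode(b64).decode("utf-16-be")

    # Everything between "&" and the next "-" is a shifted (base64) run.
    return re.sub(r"&([^-]*)-", unshift, name)


# The directory name from the question; the result is the Greek folder name.
readable = imap_utf7_decode("&A6UDwAO,A8QDwQO,A8YDrwOx-")
```

The base64 payload is UTF-16BE, which is why every Greek letter costs two bytes before base64 expands it further. Plain ASCII names such as INBOX pass through unchanged.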

The issue is discussed, in detail, and solved in the following blog-posts and git repository:

The solution

Essentially, a Python script (provided below) that helps derive readable folder names is loaded through offlineimap's configuration file (that is, offlineimaprc, as explained in the OfflineIMAP Manual; the pythonfile option in the [general] section serves this purpose). In addition, a line instructing the proper folder name translation (using the codec registered by the Python script), i.e.,

# Name translation from UTF7 to UTF8
nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')

is added to offlineimap's configuration file, under the section with the options for the remote repository.

Update (April 2015)

Another rule is required for the reverse operation (see Folder filtering and Name translation); it goes under the section with the options for the local repository. This instruction may be something like

# Name translation, reverse!
nametrans = lambda foldername: foldername.decode('utf-8').encode('imap4-utf-7')

In the mailboxes file created by offlineimap, Greek names now appear correctly. This resolves the issue inside mutt: folder names appear as intended (in this case, as Greek names).
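For illustration only, the reverse direction (readable name to modified UTF-7) can be sketched in the same spirit. This is a standalone Python 3 sketch that uses only the standard library; the function name imap_utf7_encode is made up here and is not part of offlineimap or of the script below.

```python
import base64


def imap_utf7_encode(name):
    """Encode a str as an IMAP modified UTF-7 mailbox name (ASCII str)."""
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush():
        # Emit the pending run as "&<modified base64>-", then clear it.
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            del pending[:]

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is written as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)    # printable ASCII represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


# Greek folder name from the question, written with escapes for portability.
encoded = imap_utf7_encode("\u03a5\u03c0\u03bf\u03c4\u03c1\u03bf\u03c6\u03af\u03b1")
```

Consecutive non-ASCII characters are collected into a single shifted run, so the Greek name above produces one "&...-" sequence rather than one per letter.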



Python script to deal with international mailbox names (IMAP, UTF-7):

# vim:fileencoding=utf-8

r"""
Imap folder names are encoded using a special version of utf-7 as defined in RFC
2060 section 5.1.3.

From: http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html

5.1.3.  Mailbox International Naming Convention

   By convention, international mailbox names are specified using a
   modified version of the UTF-7 encoding described in [UTF-7].  The
   purpose of these modifications is to correct the following problems
   with UTF-7:

    1) UTF-7 uses the "+" character for shifting; this conflicts with
     the common use of "+" in mailbox names, in particular USENET
     newsgroup names.

    2) UTF-7's encoding is BASE64 which uses the "/" character; this
     conflicts with the use of "/" as a popular hierarchy delimiter.

    3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
     the use of "\" as a popular hierarchy delimiter.

    4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
     the use of "~" in some servers as a home directory indicator.

    5) UTF-7 permits multiple alternate forms to represent the same
     string; in particular, printable US-ASCII chararacters can be
     represented in encoded form.

   In modified UTF-7, printable US-ASCII characters except for "&"
   represent themselves; that is, characters with octet values 0x20-0x25
   and 0x27-0x7e.  The character "&" (0x26) is represented by the two-
   octet sequence "&-".

   All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
   Unicode 16-bit octets) are represented in modified BASE64, with a
   further modification from [UTF-7] that "," is used instead of "/".
   Modified BASE64 MUST NOT be used to represent any printing US-ASCII
   character which can represent itself.

   "&" is used to shift to modified BASE64 and "-" to shift back to US-
   ASCII.  All names start in US-ASCII, and MUST end in US-ASCII (that
   is, a name that ends with a Unicode 16-bit octet MUST end with a "-
   ").

    For example, here is a mailbox name which mixes English, Japanese,
    and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-

"""

import binascii
import codecs

# encoding
def modified_base64(s):
  s = s.encode('utf-16be')
  return binascii.b2a_base64(s).rstrip(b'\n=').replace(b'/', b',').decode('ascii')

def doB64(_in, r):
  if _in:
    r.append('&%s-' % modified_base64(''.join(_in)))
    del _in[:]

def encoder(s):
  r = []
  _in = []
  for c in s:
    ordC = ord(c)
    if 0x20 <= ordC <= 0x25 or 0x27 <= ordC <= 0x7e:
      doB64(_in, r)
      r.append(c)
    elif c == '&':
      doB64(_in, r)
      r.append('&-')
    else:
      _in.append(c)
  doB64(_in, r)
  return (''.join(r).encode('ascii'), len(s))

# decoding
def modified_unbase64(s):
  b = binascii.a2b_base64(s.replace(b',', b'/') + b'===')
  return str(b, 'utf-16be')

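def decoder(s):
  # bytes.decode() may hand the codec a read-only buffer; normalize to bytes.
  s = bytes(s)
  r = []
  decode = []
  # Walk one-byte slices: iterating bytes directly yields ints on Python 3,
  # which would never compare equal to b'&' or b'-'.
  for c in (s[i:i + 1] for i in range(len(s))):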
    if c == b'&' and not decode:
      decode.append(b'&')
    elif c == b'-' and decode:
      if len(decode) == 1:
        r.append('&')
      else:
        r.append(modified_unbase64(b''.join(decode[1:])))
      decode = []
    elif decode:
      decode.append(c)
    else:
      r.append(c.decode('ascii'))
  if decode:
    r.append(modified_unbase64(b''.join(decode[1:])))
  bin_str = ''.join(r)
  return (bin_str, len(s))

class StreamReader(codecs.StreamReader):
  def decode(self, s, errors='strict'):
    return decoder(s)

class StreamWriter(codecs.StreamWriter):
  # A stream writer encodes, so the overridden method must be encode.
  def encode(self, s, errors='strict'):
    return encoder(s)

def imap4_utf_7(name):
  if name == 'imap4-utf-7':
    return (encoder, decoder, StreamReader, StreamWriter)

codecs.register(imap4_utf_7)