PostgreSQL – “Wrong affix file format for flag” while loading dictionaries from LibreOffice in PostgreSQL 9.3

full-text-search, postgresql, postgresql-9.3

To get good text search results out of PostgreSQL in English, German and Dutch, I downloaded the LibreOffice proofing tools (I tried versions 4.3 and 3.3) and iSpell. I placed the files in the "tsearch_data" dir and renamed them so PostgreSQL would find them. I now have:

en_us.dict, en_us.affix, nl_nl.dict, nl_nl.affix, de_de_frami.dict, de_de_frami.affix

All of them give the same sort of errors when creating a dictionary in PostgreSQL 9.3, like so:

CREATE TEXT SEARCH DICTIONARY test_ispell (template = ispell, Dictfile = en_us, affFile = en_us, stopwords = english);
ERROR:  wrong affix file format for flag
CONTEXT:  line 2428 of configuration file "/usr/share/postgresql/9.3/tsearch_data/en_us.affix": "COMPOUNDMIN 1"

ERROR:  wrong affix file format for flag
CONTEXT:  line 2533 of configuration file "/usr/share/postgresql/9.3/tsearch_data/en_us.affix": "SFX 123 N 1"

ERROR:  invalid byte sequence for encoding "UTF8": 0xc4 0x62
CONTEXT:  line 18 of configuration file "/usr/share/postgresql/9.3/tsearch_data/de_de_frami.dict"

ERROR:  wrong affix file format for flag
CONTEXT:  line 604 of configuration file "/usr/share/postgresql/9.3/tsearch_data/dutch.affix": "SFX Na N 1"

It was pointed out to me that the first error appears to be spurious, since COMPOUNDMIN is a valid directive according to the Hunspell man page: http://linux.die.net/man/4/hunspell

COMPOUNDMIN num
   Minimum length of words used for compounding.  Default value is 3 letters.

Update: the Hunspell COMPOUND functions can't work in PostgreSQL, see: http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

The SFX syntax is beyond me. I presume the invalid byte sequence is fixable too, if I can somehow tell PostgreSQL the file's encoding, but I have no clue how to do that.

Anyone got any idea how to load these files?

Update:

The Dutch dictionaries provided with Debian still produce errors. They are part of the Myspell package, and I can't get them to work. Has anyone had any success with that?

For now I will try to file a bug report with Debian, since this is functionality that should be working.

Best Answer

So I found the answer, for Debian at least. Hunspell is the successor of Myspell, and dictionaries for both ship with your distribution (in my case Debian). These are the dictionaries you can use. You can install them by executing

apt-get install hunspell-en-us

These dictionaries are installed in /usr/share/hunspell or /usr/share/myspell. The PostgreSQL dictionaries that are automatically generated from these files when installing through apt-get are located in /var/cache/postgresql/dicts. Furthermore, apt-get automatically creates links to the generated files in your /usr/share/postgresql/VERSION/tsearch_data dir. In short, you are ready to use them.
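The invalid byte sequence error from the question is consistent with a Latin-1 file being read as UTF-8: 0xc4 is Ä in ISO-8859-1, and 0xc4 0x62 is not a valid UTF-8 sequence. If you ever need to convert a dictionary file by hand, a minimal sketch with iconv could look like the following. The function name to_utf8 is made up for illustration, and ISO-8859-1 is only an assumption; check the SET line at the top of the .aff file for the actual encoding.

```shell
#!/bin/sh
# to_utf8 SRC ENCODING DST  (hypothetical helper, not part of any package)
# Re-encode SRC from ENCODING to UTF-8 and write the result to DST.
# SRC and DST must be different files, because DST is truncated first.
to_utf8() {
    src=$1
    enc=$2
    dst=$3
    # iconv exits non-zero if SRC contains bytes that are invalid in ENCODING.
    iconv -f "$enc" -t UTF-8 "$src" > "$dst"
}
```

You would call it as, for example, to_utf8 de_DE_frami.dic ISO-8859-1 de_de_frami.dict (the file names here are illustrative), and do the same for the affix file.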

If you want to manually install hunspell or myspell dictionaries, you can place them in the /usr/share/hunspell or /usr/share/myspell dirs and execute

pg_updatedicts

This will generate PostgreSQL compatible dictionaries in

/var/cache/postgresql/dicts

and you will have to link them into /usr/share/postgresql/VERSION/tsearch_data yourself:

ln -s /var/cache/postgresql/dicts/de_de.affix           \
  /usr/share/postgresql/VERSION/tsearch_data/

ln -s /var/cache/postgresql/dicts/de_de.dict            \
  /usr/share/postgresql/VERSION/tsearch_data/
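If you have several languages, linking each file by hand gets tedious. The manual linking step can be sketched as a small loop. The function name link_dicts is hypothetical, and the sketch assumes the generated files end in .affix and .dict, as in the listing above.

```shell
#!/bin/sh
# link_dicts SRC_DIR DEST_DIR  (hypothetical helper, not part of any package)
# Symlink every .affix and .dict file found in SRC_DIR into DEST_DIR.
# SRC_DIR should be an absolute path, otherwise the links will dangle.
link_dicts() {
    src=$1
    dest=$2
    for f in "$src"/*.affix "$src"/*.dict; do
        # Skip a pattern that matched no files.
        [ -e "$f" ] || continue
        # -s creates a symbolic link; -f replaces an existing link of the same name.
        ln -sf "$f" "$dest"/
    done
}
```

Run as root, the call would be link_dicts /var/cache/postgresql/dicts /usr/share/postgresql/VERSION/tsearch_data, with VERSION replaced by your PostgreSQL version.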