To get good text search results out of PostgreSQL in English, German and Dutch, I downloaded the proofing tools from LibreOffice (I tried versions 4.3 and 3.3) and from iSpell. I placed the files in the "tsearch_data" directory and renamed them so PostgreSQL would find them. I now have:
en_us.dict, en_us.affix, nl_nl.dict, nl_nl.affix, de_de_frami.dict, de_de_frami.affix
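For reference, PostgreSQL resolves DictFile = en_us to en_us.dict, AffFile = en_us to en_us.affix and StopWords = english to english.stop inside its tsearch_data directory. As far as I know, it also only accepts base names made of lowercase ASCII letters, digits and underscores. The renaming step can be sketched in Python like this. The function name postgres_name is hypothetical, and the sketch assumes the LibreOffice files carry the usual Hunspell .dic/.aff extensions.

```python
import re
from pathlib import Path

# PostgreSQL expects .dict / .affix where Hunspell ships .dic / .aff.
HUNSPELL_TO_POSTGRES = {".dic": ".dict", ".aff": ".affix"}

# Assumption: tsearch_data base names may only use lowercase ASCII letters,
# digits and underscores.
VALID_BASENAME = re.compile(r"[a-z0-9_]+")


def postgres_name(hunspell_file: str) -> str:
    """Map a Hunspell file name such as 'en_US.aff' to 'en_us.affix' (hypothetical helper)."""
    path = Path(hunspell_file)
    try:
        new_ext = HUNSPELL_TO_POSTGRES[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"not a .dic or .aff file: {hunspell_file}") from None
    base = path.stem.lower()
    if not VALID_BASENAME.fullmatch(base):
        raise ValueError(f"base name {base!r} is not usable in tsearch_data")
    return base + new_ext
```

For example, postgres_name("en_US.aff") returns "en_us.affix", which AffFile = en_us will then pick up from tsearch_data.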
All of them produce the same kind of error when I create a dictionary in PostgreSQL 9.3 like this:
CREATE TEXT SEARCH DICTIONARY test_ispell (template = ispell, Dictfile = en_us, affFile = en_us, stopwords = english);
ERROR: wrong affix file format for flag
CONTEXT: line 2428 of configuration file "/usr/share/postgresql/9.3/tsearch_data/en_us.affix": "COMPOUNDMIN 1"
ERROR: wrong affix file format for flag
CONTEXT: line 2533 of configuration file "/usr/share/postgresql/9.3/tsearch_data/en_us.affix": "SFX 123 N 1"
ERROR: invalid byte sequence for encoding "UTF8": 0xc4 0x62
CONTEXT: line 18 of configuration file "/usr/share/postgresql/9.3/tsearch_data/de_de_frami.dict"
ERROR: wrong affix file format for flag
CONTEXT: line 604 of configuration file "/usr/share/postgresql/9.3/tsearch_data/dutch.affix": "SFX Na N 1"
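According to the hunspell(4) man page, an affix class starts with a header line of the form "SFX flag cross_product number" (PFX for prefixes), where cross_product is Y or N and number is how many rule lines follow. The two failing SFX lines are such headers, but with the multi-character flags "123" and "Na", which Hunspell allows through FLAG num and FLAG long. As far as I know, PostgreSQL releases before 9.6 only accept the default single-character flag type, which would explain these errors; treat that as an assumption rather than a verified fact. The hypothetical helper below parses such headers and lists the ones whose flag is longer than one character.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class AffixHeader(NamedTuple):
    kind: str            # "SFX" (suffix) or "PFX" (prefix)
    flag: str            # flag name exactly as written in the file
    cross_product: bool  # "Y" -> True, "N" -> False
    count: int           # number of rule lines that follow the header


def parse_affix_header(line: str) -> AffixHeader:
    """Parse a Hunspell affix class header such as 'SFX Na N 1'."""
    parts = line.split()
    if len(parts) != 4 or parts[0] not in ("SFX", "PFX"):
        raise ValueError(f"not an affix header: {line!r}")
    kind, flag, cross, count = parts
    if cross not in ("Y", "N"):
        raise ValueError(f"cross_product must be Y or N: {line!r}")
    return AffixHeader(kind, flag, cross == "Y", int(count))


def multi_char_flag_headers(lines: Iterable[str]) -> Iterator[Tuple[int, AffixHeader]]:
    """Yield (1-based line number, header) for headers with multi-character flags.

    Heuristic: any line that does not parse as a header (rule lines,
    comments, other directives) is skipped.
    """
    for number, line in enumerate(lines, start=1):
        try:
            header = parse_affix_header(line)
        except ValueError:
            continue
        if len(header.flag) > 1:
            yield number, header
```

Reading the affix file with encoding="latin-1" never fails on unexpected bytes, and the line numbers it yields should line up with the ones in PostgreSQL's CONTEXT messages.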
Someone pointed out to me that the first error looks spurious, since COMPOUNDMIN is a valid directive according to the hunspell man page: http://linux.die.net/man/4/hunspell
COMPOUNDMIN num
Minimum length of words used for compounding. Default value is 3 letters.
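Update: Hunspell's COMPOUND functions can't fully work in PostgreSQL; the documentation states that PostgreSQL implements only the basic compound word operations of Hunspell. See: http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY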
The SFX syntax is beyond me. I presume the invalid byte sequence is fixable too, if I can somehow tell PostgreSQL the file's encoding, but I have no idea how to do that.
Does anyone have an idea how to load these files?
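According to the PostgreSQL documentation, text search configuration files must be stored in UTF-8 and are translated to the database encoding when read, so there is no setting that tells PostgreSQL a file's encoding; the file itself has to be converted. The bytes 0xc4 0x62 are "Äb" in ISO-8859-1, which suggests (but does not prove) that de_de_frami.dict is Latin-1 encoded; per hunspell(4), the SET line of the original .aff file declares the encoding the dictionary was written in. The sketch below assumes ISO-8859-1 and uses hypothetical function names: it reports the first line that is not valid UTF-8 and writes a UTF-8 copy of the file.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_line(path: Path) -> Optional[int]:
    """Return the 1-based number of the first line that is not valid UTF-8, or None."""
    with path.open("rb") as f:
        # Splitting on b"\n" is safe: no UTF-8 multibyte sequence contains 0x0A.
        for number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                return number
    return None


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> bool:
    """Write a UTF-8 version of src to dst; return True if re-encoding was needed.

    Assumption: any file that is not already valid UTF-8 is in source_encoding.
    Files that already decode as UTF-8 are copied byte for byte, so running this
    twice does not double-encode them.
    """
    data = src.read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        dst.write_bytes(data.decode(source_encoding).encode("utf-8"))
        return True
    dst.write_bytes(data)
    return False
```

Run the check on each .dict and .affix file; the line it reports should match the one in PostgreSQL's CONTEXT message (line 18 for de_de_frami.dict above). Then point DictFile/AffFile at the converted copy.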
Update:
The Dutch dictionaries provided with Debian (part of the Myspell package) still produce errors, and I can't get them to work. Has anyone had any success with them?
For now I will try to file a bug report with Debian, since this is functionality that should work.
Best Answer
So I found the answer, for Debian at least. Hunspell is the successor of Myspell, and dictionaries for both ship with your distribution (in my case Debian). These are the dictionaries you can use. You can install them by executing:
These dictionaries are installed in /usr/share/hunspell or /usr/share/myspell. The PostgreSQL dictionaries that are automatically generated from these files when installing through apt-get are located in /var/cache/postgresql/dicts. Furthermore, apt-get automatically creates links to the generated files in your /usr/share/postgresql/VERSION/tsearch_data directory. In short, you are ready to use them.
If you want to install hunspell or myspell dictionaries manually, you can place them in the /usr/share/hunspell or /usr/share/myspell directories and execute:
This will generate PostgreSQL-compatible dictionaries, which you will have to link into /usr/share/postgresql/VERSION/tsearch_data yourself.
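The final linking step can be sketched as follows. It assumes the generated files end in .dict and .affix and sit in a single source directory (for example /var/cache/postgresql/dicts), and that VERSION is replaced by your server's major version such as 9.3. The function name link_dictionaries is hypothetical, and writing into /usr/share/postgresql normally requires root.

```python
from pathlib import Path
from typing import List, Tuple


def link_dictionaries(source_dir: Path, tsearch_dir: Path,
                      suffixes: Tuple[str, ...] = (".dict", ".affix")) -> List[Path]:
    """Symlink each generated dictionary file in source_dir into tsearch_dir.

    Assumption: the generated files use the .dict/.affix extensions.
    Existing entries in tsearch_dir are never overwritten.
    Returns the list of links that were created.
    """
    created: List[Path] = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file() or src.suffix not in suffixes:
            continue  # skip subdirectories and unrelated files
        link = tsearch_dir / src.name
        if link.exists() or link.is_symlink():
            continue  # keep files (or links) that are already there
        link.symlink_to(src.resolve())
        created.append(link)
    return created
```

On a Debian system with PostgreSQL 9.3 you would call it, as root, with Path("/var/cache/postgresql/dicts") and Path("/usr/share/postgresql/9.3/tsearch_data"). Symlinks rather than copies mean that files regenerated in place are still found without relinking.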