I have a database in PostgreSQL modelling a Japanese->English dictionary. The tables I am interested in are listed as follows, where bold represents PK and italic FK:
- entries(entry_id int, keyword varchar, reading varchar, meaning varchar)
- entries_kanji(kanji char, entry_id int) where entry_id references entries.entry_id
The idea is that entries.keyword
is a string containing several kanji. For each of these kanji, there will be an appropriate row in entries_kanji linking back to the entry. This allows me to quickly find all the entries whose keywords contain specific kanji, and by computing the intersection of several queries on this table, identify a word by its kanji alone.
At present, I am using a script external to the database to generate the entries_kanji table by iterating over every row in entries, extracting the kanji from each keyword and inserting a row into entries_kanji. While this works, it is inconvenient to have to run this script every time a new entry is added to the dictionary.
My question is: is there a way to achieve the above using an index alone? I know PostgreSQL allows for indexes to be created based on expressions, but I don't know how to achieve what I described above using this. Or is there some other feature of PostgreSQL that will allow me to do away with the script?
Best Answer
There is a way of implementing what you want by using Regular Expressions and a couple of triggers.
To set the stage, I assume your tables are created this way:
I assume that your definition of Kanji can be translated to a Regular Expression. To begin with, I use the definition of Kanji found on Localizing Japan. That is, we can define one funcion to check whether a character is a Kanji by doing this:
The
~
operator meanstext_on_the_left
matches theregular expression on the right
. See Pattern Matching in PostgreSQL.The result of executing the following query
is
true
.Next step is, given a sentence (a string where there are several Kanji chars), we SELECT all of them. This can be done with the following query:
The function
regexp_matches
returns an array containing all the parts of the passed string that match the regular expression. This array is converted into aset of rows
by means of theunnest
function.Once we have this, we create one trigger function to be associated with INSERTS, UPDATES and DELETES into the
entries
table.This function would take care:
And we associate this trigger function with one trigger (I have chosen it to be an AFTER trigger, in case some other BEFORE triggers do some data manipulation; otherwise, it could also be a BEFORE trigger):
With this setup, you can now insert data into
entries
, and get the kanji inentries_kanji
:and see that you get the proper information in the
entries_kanji
table:You get
You can also check that the trigger takes care when you
UPDATE
orDELETE
entries.The advantages of triggers versus materialized views are that
The disadvantages are:
Depending on your use case, you can decide which solution is more appropriate.