Table design for a dictionary that can have words with many different spellings

database-design

I'm working on a small, personal dictionary database in Microsoft Access (the 2013 version). There are a lot of words in English that have two or even more spellings. Realistically speaking though, there are not that many words with three, let alone, four spellings. Nevertheless, they do exist. Examples include aerie/aery/eyrie/eyry (a word with four spellings) and ketchup/catsup/catchup (a word with three spellings). Not to mention that English is literally rife with words that have two spellings. Everybody knows that (the differences between the English and British spelling systems come immediately to mind). So, I need to design my tables in such a way that there are no significant flaws with the design. I'm going to explain step by step what the database should look like and introduce the problems I have found with my current design along the way. So, here we go.

All words, obviously, should be stored in the same table. And I'm not going to include irrelevant aspects of the design such as other columns that might be part of the table (in reality, the database is much more complex). Let's focus on the most important parts. Here's what the Words table with some pre-filled sample data will look like:

+---------+-----------+
| word_id | word      |
+---------+-----------+
|       1 | ketchup   |
|       2 | catsup    |
|       3 | catchup   |
|       4 | moneyed   |
|       5 | monied    |
|       6 | delicious |
+---------+-----------+

To keep track of a group of words that are the same, but just have different spellings, it is probably wise to choose one of them as the main word and the other ones as its child words. Here's the diagram to show you how I envision that (here, ketchup and moneyed are main words, all the others child words):

enter image description here

All this information will be placed in a new table which we shall call the Alternative Spellings table (The columns word_id and alt_spell_word_id are going to be part of the table's compound primary key):

+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
|       1 |                 2 |
|       1 |                 3 |
|       4 |                 5 |
+---------+-------------------+

Here's how all this looks in Access's Relationships panel (notice that I have enforced referential integrity between the word_id column of the Words table and the word_id column of the Alternative Spellings table and checked off the Cascade Delete Related Records option):

enter image description here

Although straight-forwardly simple, that's the only design I've been able to come up with so far. And I think that will basically do it. This is as simple as it gets. The problem with this design, however, is threefold:

1: This is not a serious problem, but I'd still like to hear your thoughts anyway. Every time I'm making a lookup of a word to see it in the Word Details form, I have to go through the entire Alternative Spellings table to see if it has other spellings associated with it or if it is a child word. So, I'd have to search both the word_id and alt_spell_word_id columns. And this process will be talking place for each and every word in the database every time I want to check the details of it. One possible solution is in the Words table to create an additional Boolean column that will keep track of whether a word has alternative spellings. This will indicate if we should scan the Alternative Spellings table at all when opening it up in the Word Details form. Here's what this would look like:

+---------+-----------+------------------+
| word_id | word      | has_alt_spelling |
+---------+-----------+------------------+
|     101 | ketchup   | yes              |
|     102 | catsup    | no               |
|     103 | catchup   | no               |
|     104 | moneyed   | yes              |
|     105 | monied    | no               |
|     106 | delicious | no               |
+---------+-----------+------------------+

I think that's a good design, but, as I said, I'd very much like to hear what you've got to say about this: a problem/not a problem? Your solution?

2: The other problem, which is of more serious nature, has to do with primary keys. word_id and alt_spell_word_id should be part of a compound primary key, of course. We don't want duplicate rows in the table. We all understand that. Not a problem. But here's what happens when we try to enforce referential integrity between the Words table and Alternative Spellings table (see the screenshot above). Everything is fine except that now we can associate a word with the id of a nonexistent word and the database is not going to complain because, for example, the last record in word_id has 4 in it, which is true, we do have a record with the id of 4 in the Words table, but there is no way to impose any kind of constraint on the alt_spell_word_id column. We can put any kind of nonsense in there:

+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
|       1 |                 2 |
|       1 |                 3 |
|       4 |                 5 |
|       4 |             34564 |
+---------+-------------------+

I think that breaks the referential integrity of the database schema and thus is a serious problem. What kind of solution would you like to offer?

3: Another problem with this design is that if we want to delete a certain word from the Words table, the deletion will cascade through the Alternative Spellings table and delete all related records there, which is perfectly fine, but here's the catch: since we agreed that different words in the database can actually be just one word with different spellings, they all should be deleted along with the main word. But that's not going to happen as things stand at the moment. For instance, if I were to delete ketchup in the Words table, all related records in the Alternative Spellings table would be deleted. Fine. But we'd really get two dangling records, catchup and catsup—they can't exist on their own because they are part of the group where ketchup is the main word, but now it has been deleted:

+---------+-----------+
| word_id | word      |
+---------+-----------+
|       2 | catsup    |
|       3 | catchup   |
|       4 | moneyed   |
|       5 | monied    |
|       6 | delicious |
+---------+-----------+

+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
|       4 |                 5 |
+---------+-------------------+

Here's the actual database (simplified version) if you want to play with it.

Thank you all in advance.

Best Answer

The way I would tackle this is by asking, how the word and description are related to each other. In this case multiple words have a single description (definition, pronunciation, part of a sentence, etc.).

I would set up a single description table with the neccesary information on it, which then uses the "Description_Num" as a primary key. Then for the Word table I would set up three columns, an ID as a Primary Key, the word its self, and the description_num to reference as a Foreign Key.

You can then use the description_num to combine these, and use your DBMS functions to find information like the number of different spellings of a word.

I think this would result in maintaining referential integrity in tables as long as you are using the description_num.