MySQL Database Design – Designing a Relational Database with Interrelated Words

database-designMySQL

I'm developing a simple language game about antonyms, synonyms and 'similar speaking' words.

If you play in the 'Synonyms screen', the game will show you two synonyms and two unrelated words and you have to guess wich are the synonyms.
If you play in the 'Antonyms screen', the game will show you two antonyms and two unrelated words and you have to guess wich are the antonyms.
If you play in the 'Speaking screen', the game will show you two 'similar speaking' words, will speak one of each and you have to guess wich is.

My problem is that you can have more than one synonyms, like:

angry: enraged, furious, irated, offended, etc..
love: affection, passion, devotion, etc..

My first idea is something like this one:

Table Synonyms:

+-----------+-------------------+  
|   word    |     related       |  
+-----------+-------------------+  
| angry     |   enraged         |  
| angry     |   furious         |  
| angry     |   irated          |  
| angry     |   offended        |  
| enraged   |   furious         |  
| enraged   |   irated          |  
| enraged   |   offended        |   
| furious   |   irated          |  
| furious   |   offended        |   
| irated    |   offended        |   
| love      |   affection       |  
| love      |   passion         |  
| love      |   devotion        |  
| affection |   pasion          |  
| ....      |   ....            |  
+-----------+-------------------+

And build the antonyms and speaking tables in the same way.

But it seems like a lot of work and it's difficult to add/remove one synonym.

Also, I don't know where I can put the information about how to speak every word. Maybe another table like:

+-----------+-------------------+  
|   word    |  pronuciation     |  
+-----------+-------------------+  
| angry     |   ˈaNGgrē         |  
| enraged   |   enˈrājd         |  
| ....      |   ....            |  
+-----------+-------------------+

Best Answer

The associative entity idea is a good way to handle this, but it has a drawback that you might want to consider.

Something that hasn't been addressed in some other answers is how you're going to maintain all of this data. It's going to be a lot of work, particularly if you want to be able to get from one word to any of its synonyms (or antonyms) without knowing which of your words you're starting from. Consider your example of {angry, enraged, furious, irritated, offended}. That's a set of five synonyms. If you used associative entities you'd have these pairs: angry=enraged, angry=furious,... etc. You'd also have to do them the other way around: enraged=angry, furious=angry, and so forth.

Five synonyms isn't a lot. However, if you make an associative table for 5 synonyms you're going to have 10 entries. For 6 synonyms you'll have 15 and for 7 synonyms you'll have 21. You can see how this might get out of hand.

Alternative: Association via a root The good news is that there's another way to do this. If you pick one of your synonyms and make it the base or root word (say in your case "angry") then you can have one record for each of these words in one table (five synonyms = five records, include "angry" in your list of words) and then a second table containing just the roots (for example "angry")

The key (pun intended) is to have the list of words point at their base or root word. This would be via a foreign key.

So in your word table, you might have something like this:

angry = angry
enraged = angry
furious = angry
irritated = angry
offended = angry

Now, you might want to use a meaningless numerical key instead of "angry" in the second column in each of the above. That might make things easier for a variety of practical reasons. In that case you'd have a second table with just the base words with their meaningless numerical keys, for example:

123, angry

So your word table looks more like this:

angry, 123
enraged, 123
furious, 123
irritated, 123
offended, 123

So how do you find synonyms for a word?

That's a simple process, but it's a two step* process:

Look up the root key for the given word. Say you want to find synonyms for "furious". You look it up and find the root is number 123.
Look for all the words that have the same root key as your word, these are your synonyms.

The advantage of doing it this way is that it saves you a ton of data maintenance. You only have to maintain one record for each word, plus one for each group of synonyms. That's a lot fewer than combinations of all synonyms, which is what you're looking at if you do this with an associative entity.

*Note: you don't actually have to do this in two steps. You can write a SQL which combines all of this into a single statement using a self join.

Related Solutions

Mysql – How to make a trigger that will compare input with a value of other table

DELIMITER $$  
CREATE TRIGGER Bookings_BeforeInsert BEFORE INSERT ON Bookings
FOR EACH ROW  
BEGIN  
    DECLARE found_flight,dummy,baddata INT;
    DECLARE dt DATETIME;
    SET baddata = 1;
    SELECT COUNT(1) INTO found_flight FROM Flights
    WHERE flight_number = NEW.fk_flight_number;
    IF found_flight > 0 THEN
        SELECT date_of_departure INTO dt FROM Flights
        WHERE flight_number = NEW.fk_flight_number;
        IF date_of_reservation < date_of_departure THEN
            SET baddata = 0;
        END IF;
    END IF;
    IF baddata = 1 THEN  
        SELECT CONCAT('Cannot Insert This Because Age ',NEW.age,' is Invalid')  
        INTO dummy FROM information_schema.tables;
    END IF;  
END; $$  
DELIMITER ;

The IF statement IF baddata = 1 THEN was made a bogus message on purpose. That's because MySQL error handling for Stored Procedures is not designed properly. This crazy workaround I got from Pages 254-256 of the book MySQL Stored Procedure Programming

enter image description here

under the subheading 'Validating Data with Triggers':

I have written about this style of Trigger Programming in my past posts

Apr 25, 2011 : Trigger in MySQL to prevent insertion
Dec 23, 2011 : check constraint does not work?

Splitting up tables with similar structures since they contain unrelated data? (huge numbers)

Your issues are probably less about performance, assuming reasonable equipment and a good indexing structure, and more about other impacts that you face.

Approach 1 means some additional programming complexity to deal with filtering rows in each table to restrict access to a single gameworld. It also means that if you need to restore a gameworld, you would have to (a) restore all gameworlds (since they are in the same database) or (b) restore the one database to a recovery database and then script the appropriate gameworld data from Recovery to your Gameworlds database.

Approach 2 means that you need to be able to manage schema changes across many databases. This can be done by scripting as needed. It is also easier to choose to have different schemas for some gameworlds, but that adds management overhead. However, if a a single gameworld crashes, you can restore the latest good backup(s) and get it going again.

Some years ago some Microsoft folks wrote up their take on Multi-Tenanted databases. See: http://msdn.microsoft.com/en-us/library/aa479086.aspx

Having more tables should not, in itself, cause performance problems within the size range you describe.

Choosing your approach should take into account leveraging your strengths so that you have a system you (and your project members) can readily manage.

Best Answer

Related Solutions

Mysql – How to make a trigger that will compare input with a value of other table

Splitting up tables with similar structures since they contain unrelated data? (huge numbers)

Related Question