PostgreSQL – How to Model Database Structure for Storing Words Information

database-designpostgresql

I have an application where the user can click on any word in a text, which will bring back related information about the base word and the specific inflection (word form) of the word clicked. I've tried representing this in a multitude of ways, but now that I'm close to launch, I'm worried as hell about the best way to represent it (this is my first ever project with programming, so I have no idea what I'm in for).

Considerations

The way I had originally set this up was that there was a base word table that contained the base word, part of speech, and then an array of word forms for each base word.

I then learned that this was a bad idea in terms of normalization, and I also realized that I needed to store specific inflection information for the words (mood, tense, person, etc.), so I rearranged it.

In the second iteration, I had three tables, a base_word table that had the base_word and the part of speech, a word table that had the word and its inflection information, and then a “join” table between them that contained the base_word_id and the word_id.

However, I've been re-thinking this setup, because semantically speaking, a base word is also a word, just a specific type of word. So in this case, I feel like a “self-join” table might be most appropriate, although I could be completely overthinking this. In the “self-join” table, there would be an is_baseword boolean attribute, which would theoretically allow all the inflection columns to store NULL marks if it was set to true. Is this a better idea?

Note: I found these two similar questions which didn't seem to completely answer my questions, so I'm posting my twist on these here:

Best Answer

I believe your design for a one to many join with base_word in one table and word forms in another is fine. The reasons I make this judgement are:

The base_word, as an entity, is sufficiently different to word_forms as to represent a different entity. For instance, if you were to include the words origin language, this would also be only applicable to the base word entity, not its word forms.
Storing the data in the same table will have implications on the size of the tables due to having sparse columns with attributes that are associated with base_word OR word_form only, meaning they are NULL or empty for the others. You point this out with the is_baseword and inflection columns. Having a bunch of NULL values on a row which will never be changed is usually a reason to use the EAV model or split into different entities.
Performance considerations for queries should be taken into account. With the additional size consideration a self join to a sparsely populated table will likely not be as performant. You will probably find yourself indexing a single table for base_words OR word_forms separately. The indexes will be larger and more difficult to maintain for inserts/deletes etc. Better to have indexes on smaller tables and to have as few as possible on a table.

Don't get too concerned with perfect normalization. While a perfectly normalized system may adhere to strict information science principles, it is often better in the real world to have a slightly denormalized database for other reasons.

Related Solutions

Should I create a new table for store information of an associated object with object need to create

If the number of questions for each topic on each course varies from one General Exam to another, then yes, you want a structure like this (which is pretty much what you have):

NumberQuestions:
GE_ID int FK
Course_ID int FK
Topic_ID int FK
MaxQuestions int

Database Design – Creating Simple Yet Multilingual Flashcards

This is a good use for a parent-child model. This can be easily accessed with a join query. Include the language(s) you want to work with in the where clause of the query.

This model below handles multiple translations between multiple languages. If you want flashcards for one language as in your example above, remove the lang column from the flashcards table.

CREATE TABLE flashcards (
    entry_id     INT NOT NULL auto_increment,
    entry        VARCHAR(50) NOT NULL,        
    lang         VARCHAR(2) NOT NULL,
    PRIMARY KEY (entry_id),
    UNIQUE KEY `flashcard_entry` (entry, lang)
);

CREATE TABLE flashcard_translations (
    entry_id     INT  NOT NULL,
    lang         VARCHAR(2) NOT NULL,
    translation  VARCHAR(50) NOT NULL,
    displayed    INT DEFAULT 0,
    known        INT DEFAULT 0L,
    lastKnown    DATE NULL,
    PRIMARY KEY (entry_id, lang),
    FOREIGN KEY (entry_id) REFERENCES flashcards (entry_id),
);

Depending on your requirements, you may not want the unique key.

Considerations

Best Answer

Related Solutions

Should I create a new table for store information of an associated object with object need to create

Database Design – Creating Simple Yet Multilingual Flashcards

Related Question