Postgresql – Hashtags and synonyms table design

database-designpostgresql

So what I'm trying to do is setup a tag and synonyms structure. Where you can assign a tag as a synonym to another tag.

E.g for tag: photographer

I'd want to add 'photography, photographers, photo' as synonyms for 'photographer'.

The way I've proposed is that all synoynms are created as tags & that I associate a tag with a synonym by doing a 1<->1 relationship on a second table.

Here is what I'm proposing:

Tag:

  • tag_id
  • title
  • More meta data fields here…

Synonyms:

In this table we can attach a tag as a synonym to another tag (e.g photo can be a synonym for photography)

  • tag_id (the tag which we're attaching a synonym to)
  • synonym_tag_id (the tag_id which will be a synonym)

So I can add many synonyms to one tag.

Is this solution viable? Could it be improved somehow? I will be writing this schema in postgres. Appreciate any tips 🙂

Use cases:

  1. Users of our app will be able to attach tags to there articles. If the user adds photopgraphy as the article title, our search will also show articles that have the photopgraphy synonyms attached as a tag.
  2. Note that we aren't using SQL for search, we're using elaticsearch. The sql will just be used for storing the data so it can later be imported into elasticsearch.

Best Answer

So - each tag can have one or more synonyms, and each synonym belongs to a particular tag. Seems straightforward. A few things that I'd suggest, are:

(1) Make sure that the synonyms as such are thoroughly CHECKed before they get INSERTed. I have seen situations where photography / Photography / photography{blank} was in a stored in a lookup table - although these 3 words have one and the same meaning for an end user (you could use UPPER() or LOWER() and RTRIM()/LTRIM() to prevent this). Write the CHECKs in your DDL code wherever possible.

(2) I'd use a UNIQUE constraint on the "description" column in the SYNONYMS table (eg if you don't want to have duplicate synonyms - because the FK alone will probably not prohibit this).

(3) Also: are you absolutely sure that a synonym that has a rather broad meaning (something like the name of a supertype) cannot be used for several tags? In your synonyms example photography, photographers, photo - suppose we have a TAG called "images", then "photo" could be a synonym for this, too (apart from being a synonym for "photographer"). You can still model this, but you'd probably need an "intersection" table for this purpose, as you suddenly end up with a many-to-many relationship between TAGS and SYNONYMS. (Please feel free to ask some more questions about this, if it is not clear to you).