PostgreSQL – Using GIN Index on Integer Array for Tag Sets

arraydatabase-designindexpostgresql

In PostgreSQL, what are the pros and cons of using a GIN index on a column of type integer[] to model sets of tags, instead of doing a relation table?

I am trying to design a simple data model with a table called post where each post can have a set of tags, similarly to Instagram or StackExchange.
It seems that I would be able to search for a post that contains a set of tags without having to join two tables, but I won't be able to easily get the list of tag strings for one specific post.

Best Answer

I think you already more or less answered your own question.

While a GIN index can speed up searching inside an array quite significantly, you have to be aware that relational database engines are designed to do joins very efficiently. Also, when you need the actual tags, you will have to join the tag table regardless of you use the array approach or not.

Furthermore, storing the tags as an array of integer makes enforcing referential integrity (that it is impossible to insert a nonexistent tag ID into the array) quite complicated. The only way I can imagine this is creating a trigger, which never makes changing the data faster.

The best way to choose among these alternatives is, though, to create a sufficiently big amount of test data and measure the performance yourself.