I would go for option 1 (two tables).
If you only plan to have 1000 content items (and in 3 years, not now), I don't think you will have many performance problems unless each content item is tagged with a dozen tags where most are unique to that item. And even then I don't think you'll see much of a performance problems. It also simplifies your structure if you have 1 tag table for every content type and then you decide to add/remove/change content types later.
If you're really not sure, do you have the time/resources to build two sample DBs, one based on each structure, populate them with what you think is a realistic load (you can write a script/program to generate & insert test data, right? with "boring" tags and content such as "content_1..content_2000 and tag_1... tag_5000) and actually test which performs better?
Your design looks a bit like the "supertype/subtype" pattern. Search for that and for "table inheritance". It needs quite a lot of work to be able to enforce integrity constraints though.
You are missing a generic Fruit
table (that's the "supertype") and a FruitType
table to store the alllowed fruit types:
FruitType
fruit_type PK
Fruit
fruit_type PK, FK -> FruitType (fruit_type)
fruit_id PK
Then the 3 (or 4 or more) tables would be (the "subtype" tables):
Apple
fruit_type
fruit_id PK
(fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
CHECK (fruit_type = 'Apple')
Banana
fruit_type PK
fruit_id PK
(fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
CHECK (fruit_type = 'Banana')
Orange
fruit_type PK
fruit_id PK
(fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
CHECK (fruit_type = 'Orange')
And any other table can reference the Fruit
table:
FruitPack
fruitpack_id PK
destination
FruitPackFruits
fruitpack_id FK -> FruitPack (fruitpack_id)
fruit_id
fruit_type
(fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
It doesn't look very nice and one column in every "fruit" table seems redundant as it has one and only one allowed value. And every time you need to add a new fruit (say Cherry), you have to add a row in the table FruitType
and a new table (Cherry
), similar to the other ones. So, it works better if your design is more or less stable. If you find that you may need to add a new "fruit" every few days or if you have a thousand (or more!) different fruits, it's not the best way.
On the other hand, it enforces integrity and you can't insert cherries into the Bananas or oranges into the Apples.
Best Answer
I think you already more or less answered your own question.
While a GIN index can speed up searching inside an array quite significantly, you have to be aware that relational database engines are designed to do joins very efficiently. Also, when you need the actual tags, you will have to join the
tag
table regardless of you use the array approach or not.Furthermore, storing the tags as an array of
integer
makes enforcing referential integrity (that it is impossible to insert a nonexistent tag ID into the array) quite complicated. The only way I can imagine this is creating a trigger, which never makes changing the data faster.The best way to choose among these alternatives is, though, to create a sufficiently big amount of test data and measure the performance yourself.