Topics and posts represented by one table or two

database-designperformancequery-performance

I'm currently building a forum component for a larger application and I'm considering different approaches to certain parts of the database schema. In particular, I am considering representing topics and posts in a single table. While I view topics and posts as practically the same, I feel a bit apprehensive as this may make things less flexible in the future.

When topics of a particular forum are queried, the title and first post will be shown as well as some of the user information (basically the name and avatar). In this application, there are various attributes that are used by both topics and posts except for views and replies; and perhaps title, and forum_id(forum_id because that would mean potentially hundreds of records need to be affected if a topic is changed to another forum as opposed to changing the forum_id attribute in the topic relation).

The tables look something like what I have below here:

TOPIC            POST           
topic_id         poster_id   
forum_id         topic_id 
poster_id        content 
title            upvote
views            dnvote
replies          closed
post_id          deleted
                 last_edited
                 last_editor
                 parent_id
                 content
                 post_id

Doing it this way, using table inheritance, generating the posts in the topic would require a 4-table join via TOPIC, POST, USER, and TOPIC_TYPE.

On the other hand, if I decide to take the single table approach, should I simply leave the views, replies, title, and forum_id attributes as null if the topic_type is a regular post? (topic_type references an appropriate icon for the type of topic displayed, and will be used for statistics and etc.)

Best Answer

Generally...

A a rule of thumb: don't pre-optimize for performance. I think a lot of developers assume that joins are inefficient and they don't trust DBMS to do what it's built to do.

Start with a properly normalized design. Make sure your indexes and queries are optimized for your particular balance of reads and writes.

If and when you start to find that performance can't keep up with the best hardware you can afford, then start to think about denormalizing.

If you denormalize early you are just setting yourself up for maintenance headaches down the road.

More Specifically...

Looking at your suggested table layouts, I would suggest that you're trying to make TOPIC do too much. Anything which could appear in POST (e.g. poster_id) almost certainly doesn't belong in TOPIC. I suggest you adjust your thinking a little bit. I get the impression you are thinking very much about what topics and posts are going to look like on the page. This may be leading you to thinking of topics as a small super-set of posts, whereas they are probably rather more like subject headings. The fact that you plan on displaying the first post under each subject heading along with the heading isn't a good reason to co-mingle posts and headings.

I think you may want to rethink some of your cumulative total columns too. Thinks like up and down votes may need to be tracked in their own tables. You may need to do this to keep people from up or down voting repeatedly and to allow people to rescind their votes. Similarly, you may want to know all editors, not just the last editor.