I'm currently building a forum component for a larger application and I'm considering different approaches to certain parts of the database schema. In particular, I am considering representing topics and posts in a single table. While I view topics and posts as practically the same, I feel a bit apprehensive as this may make things less flexible in the future.
When topics of a particular forum are queried, the title and first post will be shown as well as some of the user information (basically the name and avatar). In this application, there are various attributes that are used by both topics and posts except for views and replies; and perhaps title, and forum_id(forum_id because that would mean potentially hundreds of records need to be affected if a topic is changed to another forum as opposed to changing the forum_id attribute in the topic relation).
The tables look something like what I have below here:
TOPIC POST
topic_id poster_id
forum_id topic_id
poster_id content
title upvote
views dnvote
replies closed
post_id deleted
last_edited
last_editor
parent_id
content
post_id
Doing it this way, using table inheritance, generating the posts in the topic would require a 4-table join via TOPIC, POST, USER, and TOPIC_TYPE.
On the other hand, if I decide to take the single table approach, should I simply leave the views, replies, title, and forum_id attributes as null if the topic_type is a regular post? (topic_type references an appropriate icon for the type of topic displayed, and will be used for statistics and etc.)
Best Answer
A a rule of thumb: don't pre-optimize for performance. I think a lot of developers assume that joins are inefficient and they don't trust DBMS to do what it's built to do.
Start with a properly normalized design. Make sure your indexes and queries are optimized for your particular balance of reads and writes.
If and when you start to find that performance can't keep up with the best hardware you can afford, then start to think about denormalizing.
If you denormalize early you are just setting yourself up for maintenance headaches down the road.
More Specifically...Looking at your suggested table layouts, I would suggest that you're trying to make
TOPIC
do too much. Anything which could appear inPOST
(e.g.poster_id
) almost certainly doesn't belong inTOPIC
. I suggest you adjust your thinking a little bit. I get the impression you are thinking very much about what topics and posts are going to look like on the page. This may be leading you to thinking of topics as a small super-set of posts, whereas they are probably rather more like subject headings. The fact that you plan on displaying the first post under each subject heading along with the heading isn't a good reason to co-mingle posts and headings.I think you may want to rethink some of your cumulative total columns too. Thinks like up and down votes may need to be tracked in their own tables. You may need to do this to keep people from up or down voting repeatedly and to allow people to rescind their votes. Similarly, you may want to know all editors, not just the last editor.