PostgreSQL Performance – Storing Email Inbox Information

performanceperformance-tuningpostgresqlscalability

I am working on a webmail client. I want to store the remote mailboxes information in a Postgres database. As a general rule of thumb, most IMAP servers will have a similar default folder structure like so:

  • INBOX
    • Drafts
    • Outbox
    • Sent Messages
    • Trash
    • Junk Mail

etc, etc.

As I add users to my system, this information will be repeated across most users. So here's my question:

Considering these are small-ish strings, am I better off to just store all the mailbox information in one row (ie: mailbox name, attributes, delimiter, parentId, hasChildren) or should I store the mailbox names in a separate table and join on a foreign key?

Right now I'm thinking the former as the cost of storing the extra information is probably less than the cost of throwing more computing power at the database to maintain the same performance. But I am a novice when it comes to database design and I want my application to scale well.

Best Answer

Joining the mailbox table to a "FolderNames" table using a foreign key will be problematic if any particular user wants to change the name of any given folder, without changing that name for everyone else. You could get around this by having a nullable CustomFolderName column in the mailbox table.

Depending on the expected number of accounts, storing the name of the mailbox in the mailbox table is probably not a deal-breaker for performance, unless perhaps there are details you haven't shared with us. When you say these are "smallish strings", don't specify the column as having a maximum length of 1000 characters if your app will limit the folder names to 15 characters.

Overthinking optimization early on can be burdensome to the development effort. Focus on the database, just don't go entirely overboard at the very earliest stages of development. You're quite likely to change the structure as development progresses anyway. Ensuring you use the correct data types as appropriate is going to be far more important. For instance, please don't store dates in string columns. Use a date column. If there is going to be a time element, use a date/time column. Don't use 128 bit integers where 32 bit will suffice.