Generally...
As a rule of thumb: don't pre-optimize for performance. I think a lot of developers assume that joins are inefficient and don't trust the DBMS to do what it's built to do.
Start with a properly normalized design. Make sure your indexes and queries are optimized for your particular balance of reads and writes.
If and when you start to find that performance can't keep up with the best hardware you can afford, then start to think about denormalizing.
If you denormalize early you are just setting yourself up for maintenance headaches down the road.
More Specifically...
Looking at your suggested table layouts, I would suggest that you're trying to make TOPIC do too much. Anything which could appear in POST (e.g. poster_id) almost certainly doesn't belong in TOPIC. I suggest you adjust your thinking a little. I get the impression you are thinking very much about what topics and posts are going to look like on the page. This may be leading you to think of topics as a small super-set of posts, whereas they are probably rather more like subject headings. The fact that you plan on displaying the first post under each subject heading along with the heading isn't a good reason to commingle posts and headings.
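One way to picture that split - a sketch only, with table and column names that are my assumptions rather than anything from your design - is to keep the heading in the topics table and everything post-specific, including poster_id, in the posts table:

```sql
-- Illustrative sketch; names are assumptions, not from the question.
CREATE TABLE dbo.topics
(
    topic_id   INT IDENTITY PRIMARY KEY,
    title      NVARCHAR(200) NOT NULL,
    created_at DATETIME NOT NULL DEFAULT GETDATE()
);

CREATE TABLE dbo.posts
(
    post_id    INT IDENTITY PRIMARY KEY,
    topic_id   INT NOT NULL REFERENCES dbo.topics (topic_id),
    poster_id  INT NOT NULL,            -- belongs here, not on topics
    body       NVARCHAR(MAX) NOT NULL,
    posted_at  DATETIME NOT NULL DEFAULT GETDATE()
);
```

Showing the first post under each heading is then just a query against posts, not a reason to duplicate post columns on the topic.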
I think you may want to rethink some of your cumulative total columns too. Things like up and down votes may need to be tracked in their own tables, both to keep people from up- or down-voting repeatedly and to allow people to rescind their votes. Similarly, you may want to know all editors, not just the last editor.
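As a sketch of what a per-vote table could look like (names here are illustrative assumptions): one row per user per post makes repeat voting impossible to record and makes rescinding a vote a simple delete.

```sql
-- Illustrative sketch: one row per vote, so a repeat vote violates the
-- primary key and rescinding a vote is just deleting its row.
CREATE TABLE dbo.post_votes
(
    post_id  INT NOT NULL,
    voter_id INT NOT NULL,
    vote     SMALLINT NOT NULL CHECK (vote IN (-1, 1)),  -- down / up
    voted_at DATETIME NOT NULL DEFAULT GETDATE(),
    PRIMARY KEY (post_id, voter_id)                      -- one vote per user per post
);
```

A running total, if you still want one, can then be derived (or cached) from this table rather than being the only record of the votes.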
If you just want a readable copy of the data in table A, then there are ways to make this efficient without worrying about whether the copy is in the same database or a different database. Or a different server, for that matter, as long as you can reliably point your application differently depending on whether it is reading or writing, and as long as there isn't a bunch of referential integrity tied to A.
Let's keep things simple and have everything in the same database. Create two new schemas:
CREATE SCHEMA fake AUTHORIZATION dbo;
CREATE SCHEMA shadow AUTHORIZATION dbo;
Now create a copy of A in the dbo and shadow schemas:
SELECT * INTO shadow.B FROM dbo.A;
SELECT * INTO dbo.B FROM dbo.A;
(Then add all of the appropriate indexes, etc. to both tables.)
Now, create a stored procedure that does the following:
TRUNCATE TABLE shadow.B;
INSERT shadow.B SELECT * FROM dbo.A;
-- optionally add an UPDATE STATISTICS command here
BEGIN TRANSACTION;
ALTER SCHEMA fake TRANSFER dbo.B;
ALTER SCHEMA dbo TRANSFER shadow.B;
COMMIT TRANSACTION;
ALTER SCHEMA shadow TRANSFER fake.B;
-- optionally truncate the now stale copy, though I keep
-- this around for debugging / comparisons:
TRUNCATE TABLE shadow.B;
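Wrapped up as a procedure, the steps above might look like this (the procedure name is mine; the body is just the statements already shown):

```sql
-- Sketch of the refresh procedure; assumes the dbo/shadow/fake schemas
-- and the B tables created above.
CREATE PROCEDURE dbo.RefreshB
AS
BEGIN
    SET NOCOUNT ON;

    -- rebuild the shadow copy from the live table
    TRUNCATE TABLE shadow.B;
    INSERT shadow.B SELECT * FROM dbo.A;

    -- swap the fresh copy in; the transfer itself is a fast metadata change
    BEGIN TRANSACTION;
        ALTER SCHEMA fake   TRANSFER dbo.B;
        ALTER SCHEMA dbo    TRANSFER shadow.B;
    COMMIT TRANSACTION;

    -- move the stale copy back to shadow for the next refresh
    ALTER SCHEMA shadow TRANSFER fake.B;
    TRUNCATE TABLE shadow.B;
END;
```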
Then schedule it to run in a job (frequency will depend on your tolerance for how long this process takes and how fresh the copy of the data needs to be). You can always call it manually if you want it to be refreshed in between runs.
As I mentioned above, unless you use the Auto-Close feature, the overhead of accessing a table in a different database on the same instance is negligible compared to accessing the same table in the same database. So neither the process nor its efficiency really changes much if you put B in a different database (just an extra prefix on the INSERT/SELECT, and the re-pointing of the application as mentioned above). The process also doesn't change much if you move the copy to a different server (though linked servers, distributed transactions, etc. make this a little more cumbersome). The process might be a little slower across servers, but users shouldn't notice unless they also now have to join data between the two servers - if they can query the other server independently to pull data from B, they won't be able to tell the difference, unless the other server is severely underpowered.
If you have referential integrity, you may find that you need to drop/re-create and/or disable/re-enable constraints in order to perform the schema transfer (I've almost always used this trick for relatively static lookup data and, as a copy primarily used for reporting, did not have DRI). Also, both copies of A will have their own statistics, so half the time you'll have one set of statistics and half the time the other. I haven't done extensive testing of what happens with query plans etc. after the schema transfer.
Of course if you store B in a different database then perhaps the DRI issues etc. are not all that important anyway, but I thought I would mention them.
I have, however, blogged a bit more about how I arrived at this technique:
http://www.sqlperformance.com/2012/08/t-sql-queries/t-sql-tuesday-schema-switch-a-roo
Best Answer
You could always do something like this:
1. Create a users table to store all of your user info.
2. You state that you might need to apply this type of tracking for multiple applications, so you can create an applications table to store the app name and any details.
3. Next create a categories table to store the info on each of the categories that you want to track by the users. This table will include a column that contains the application id, with a foreign key relationship to the applications table. This allows you to have some flexibility in having the same category in multiple apps.
4. Finally, create the users_categories table, which will contain the stats that you want to track, including user, category and the action_date (the date the event took place).

Then to query the data, you would use something similar to this:
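A sketch of what that query could look like - the table and column names follow the outline above and are my assumptions, so they may differ from the fiddle's:

```sql
-- Hypothetical query: count events per user per category.
SELECT u.user_name,
       c.category_name,
       COUNT(uc.action_date) AS total_actions
FROM users u
JOIN users_categories uc ON uc.user_id = u.user_id
JOIN categories c        ON c.category_id = uc.category_id
GROUP BY u.user_name, c.category_name;
```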
And the results would contain one row per user per category (see the SQL Fiddle with Demo).
A few other suggestions: you could technically remove the a_id from the categories table and create a join table between categories and applications. This would lead to one additional join in your queries, but the results would be the same.
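A sketch of such a join table (again, the names are assumptions for illustration):

```sql
-- Illustrative many-to-many join table between categories and applications.
CREATE TABLE categories_applications
(
    category_id    INT NOT NULL REFERENCES categories (category_id),
    application_id INT NOT NULL REFERENCES applications (application_id),
    PRIMARY KEY (category_id, application_id)
);
```

The query above would then join through categories_applications instead of reading the application id directly off categories.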