SQL Server – How to structure Master/Detail tables for performance without denormalizing

Tags: partitioning, sql-server

Some Background Info

We have a pair of tables that holds all the transactions for our system, TransactionHeaders and TransactionDetails. There are roughly 80k transactions a day, which translates to 80k rows in TransactionHeaders and 900k rows in TransactionDetails being written daily. Currently, TransactionHeaders has about 10 million rows and TransactionDetails has about 110 million rows. We use the data for general reporting.

CREATE TABLE dbo.TransactionHeaders (
    TransactionHeaderId        int            identity,
    TransactionHeaderTypeId    int            not null,
    StoreId                    int            not null,
    TransactionDate            date           not null,
    TransactionTime            time           not null,
    ...)

CREATE TABLE dbo.TransactionDetails (
    TransactionDetailId        int            identity,
    TransactionHeaderId        int            not null,
    ItemUPC                    nchar(14)      not null,
    Price                      numeric(16,2)  not null,
    ReplicationDate            datetime       not null,
    ...)

The Issue

Querying has become cumbersome. It takes a very long time to access the sales of a given store or a given item for any period.
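A typical report query looks something like the following (the literal store and date values are placeholders, not from the real workload); every detail row has to be reached through the join, because the store and date filters live on the header table:

SELECT d.ItemUPC,
       SUM(d.Price) AS TotalSales
FROM dbo.TransactionHeaders AS h
JOIN dbo.TransactionDetails AS d
    ON d.TransactionHeaderId = h.TransactionHeaderId
WHERE h.StoreId = 42
  AND h.TransactionDate >= '20150101'
  AND h.TransactionDate <  '20150201'
GROUP BY d.ItemUPC;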

What I've Tried

I have tried copying TransactionDate down to the TransactionDetails table in order to partition it on the date, with one partition per day. This worked great for finding the sales of an item. The problem is that many of the reports also filter on StoreId in addition to a specific date range.
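The layout I tried looked something like this (the partition function/scheme names and boundary values here are illustrative, not the real ones):

CREATE PARTITION FUNCTION pfTransactionDate (date)
AS RANGE RIGHT FOR VALUES ('20150101', '20150102', '20150103' /* ...one boundary per day */);

CREATE PARTITION SCHEME psTransactionDate
AS PARTITION pfTransactionDate ALL TO ([PRIMARY]);

-- The partitioning column has to be part of the clustered index key:
CREATE CLUSTERED INDEX CIX_TransactionDetails
ON dbo.TransactionDetails (TransactionDate, TransactionHeaderId)
ON psTransactionDate (TransactionDate);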

Copying more columns from TransactionHeaders down to TransactionDetails breaks the pattern, and I'm hesitant to go further and denormalize the two tables into one because of storage concerns.

I've had the idea of partitioning TransactionHeaders on TransactionDate and TransactionDetails on TransactionHeaderId. In theory this makes the queried data significantly smaller and reinforces the pattern by making the details reasonably accessible only via the header information.
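In sketch form, mirroring the date-based scheme above but keyed on the identity column (names and boundary values again illustrative):

CREATE PARTITION FUNCTION pfHeaderId (int)
AS RANGE RIGHT FOR VALUES (10000000, 20000000, 30000000 /* ... */);

CREATE PARTITION SCHEME psHeaderId
AS PARTITION pfHeaderId ALL TO ([PRIMARY]);

-- TransactionHeaders stays partitioned on TransactionDate as above;
-- TransactionDetails is partitioned on the foreign key instead:
CREATE CLUSTERED INDEX CIX_TransactionDetails
ON dbo.TransactionDetails (TransactionHeaderId)
ON psHeaderId (TransactionHeaderId);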

The Question (TL;DR)

Is there a preferred, correct, standard, etc. method for dealing with tables in the Master/Detail pattern in order to increase performance? Partitioning one or both tables? I'd like to avoid denormalizing if at all possible.

Best Answer

You've got a few different questions in here:

Q: It takes a very long time to access the sales of a given store or a given item for any period.

A: To troubleshoot that, we would need to see the execution plans of the queries involved, plus know a little about the query runtime and the hardware involved. 10mm rows in a header table and 110mm rows in a detail table isn't much at all for SQL Server, so this should be a solvable problem.
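If it helps, here's the standard way to capture that information - nothing here is specific to your schema:

-- Per-query I/O and timing, run alongside the slow query:
SET STATISTICS IO, TIME ON;

-- In SSMS, enable "Include Actual Execution Plan" (Ctrl+M) before running,
-- or pull the worst plans from the cache after the fact:
SELECT TOP (10) qs.total_elapsed_time, qs.execution_count, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_elapsed_time DESC;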

Q: (Partitioning) worked great for finding the sales of an item. The problem is that many of the reports require the StoreID in addition to being over a specific date range.

A: Correct - partitioning rarely makes SELECT queries faster. It's more about improving the performance of bulk loads, specifically via partition switching. I wouldn't think of partitioning as a solution to this problem; indeed, it will make most queries worse.
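For example, switching out the oldest partition during a load or purge is a metadata-only operation. The staging table name below is assumed, and it must match the partitioned table's structure exactly:

-- Nearly instant regardless of row count, because only metadata changes:
ALTER TABLE dbo.TransactionDetails
SWITCH PARTITION 1 TO dbo.TransactionDetails_Staging;

TRUNCATE TABLE dbo.TransactionDetails_Staging;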

Q: Is there a preferred, correct, standard, etc. method for dealing with tables in the Master/Detail pattern in order to increase performance?

A: Absolutely - archive older data. Figure out what you're going to let users query online at high speed, and then beyond that, move the data into a separate set of archive tables. You can use a partitioned view over the old and new tables in order to give them a single seamless view into the data for easier reporting too.
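A minimal sketch of such a view, assuming the old rows have been moved to a database called Archive (all names illustrative):

CREATE VIEW dbo.TransactionDetailsAll
AS
SELECT TransactionDetailId, TransactionHeaderId, ItemUPC, Price, ReplicationDate
FROM dbo.TransactionDetails            -- hot, recent rows
UNION ALL
SELECT TransactionDetailId, TransactionHeaderId, ItemUPC, Price, ReplicationDate
FROM Archive.dbo.TransactionDetails;   -- cold, historical rows

For SQL Server to touch only one side of the view at query time, put CHECK constraints on the column that splits the data (the date or the ID) in each table, and have queries filter on that column.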

There are a lot of advantages to this approach. For example, when you want to add additional fields to the current table, you can do that quickly without having to deal with a large amount of archive data. If you want to add lots of indexes to the old archive data, you can - because it's not getting tons of inserts/updates/deletes anymore. If you split the old and new data into different databases, you can even use different backup/recovery strategies for them - all while the view stays in place, so users don't know the data is split.
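For instance (names illustrative): heavy indexing on the read-only archive, and a lighter recovery model for a database that rarely changes.

CREATE NONCLUSTERED INDEX IX_Archive_ItemUPC
ON Archive.dbo.TransactionDetails (ItemUPC)
INCLUDE (Price);

ALTER DATABASE Archive SET RECOVERY SIMPLE;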