SQL Server – A performance issue when querying a big table using a PK/clustered index

materialized-view, performance, query-performance, sql-server, t-sql

To simplify the situation, I will only consider one large table…
On a nightly basis a store will send all the new and changed data for one day for a large table to the head office. (this portion is good)
The store also sends a summary of the last 30 days of that table to the head office.

At the head office,
The new and changed data is updated in the large table (no problem here).
The summary of the last 30 days is received and uploaded into a table. It is then compared against a query that summarizes the head-office copy of this very large table (which contains all stores) for that same store. <– this is where the problem is. This comparison ensures that the store's data matches the head-office data for that store (we get a warning if it doesn't match, in which case someone needs to take action).
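For illustration, the comparison step might be sketched like this (the upload table name and its columns are assumptions, not the actual schema; the large-table query is based on the one shown later in the question):

```sql
-- Hypothetical sketch: compare the uploaded store summary against the
-- head-office summary for the same store and date range.
-- StoreSummaryUpload is an assumed name for the table the file is loaded into.
SELECT Date, Store, Qty, Amt
FROM StoreSummaryUpload
WHERE Store = '1234'

EXCEPT

SELECT Date, Store, SUM(Qty) AS Qty, SUM(Amount) AS Amt
FROM MyHugeTable
WHERE Date BETWEEN '2017-07-22' AND '2017-08-22'
  AND Store = '1234'
GROUP BY Date, Store;
-- Any rows returned indicate a day where the store and head office disagree.
```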

The problem is that the summary query takes too much time… I'm looking to change the way we compare the store table with the head-office table to something more efficient.

I tried an indexed view and the results were great, but indexed views have too many limitations, making them practically impossible to implement at scale (across all software owners, cash registers, stores, and head offices) due to the different structures and versions of our software.

I've been trying to think of different ways to ensure that the data of a table (at least for the last 30 days) for a store matches the head office, but I feel like I'm going in circles… So I'm looking for ideas to help me look at this differently.

Limitations: We use SQL Server Express at the stores, and usually Standard Edition at the head offices. There's no direct connection between the databases (the data is transferred through files).

Any help is appreciated.
Thank you.


Added more info:
Structure of the table (I know it's not ideal; it's what I inherited):
Date, Store, terminal, transNum, lineNum, Qty, Amount + 194 MORE COLUMNS.
The PK and clustered index is: Date, Store, terminal, transNum, lineNum

The query to summarize is simple:

SELECT Date, Store, SUM(Qty) AS Qty, SUM(Amount) AS Amt
FROM MyHugeTable
WHERE Date BETWEEN '2017-07-22' AND '2017-08-22'
  AND Store = '1234'
GROUP BY Date, Store;

Best Answer

If speeding up this query is critically important, I'd consider creating a covering index:

CREATE INDEX IX_MyHugeTable_Date_Store_Qty_Amt
    ON MyHugeTable (Date, Store)
       INCLUDE (Qty, Amount)
;

Adding a new index will affect the time it takes to insert, update, and delete rows in the table, so you should test to determine the impact of the index on the nightly update process. It won't help if the summary query gets faster but the nightly inserts and updates slow down by more than the summary query gains. The same holds for some of the other suggestions from the comments; test to make sure that the changes you make do not harm normal operations.
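One simple way to measure that impact (a sketch; run it against a test copy, not production) is to time both workloads before and after creating the index:

```sql
-- Report elapsed time and I/O for each statement in this session.
SET STATISTICS TIME, IO ON;

-- 1) The summary query.
SELECT Date, Store, SUM(Qty) AS Qty, SUM(Amount) AS Amt
FROM MyHugeTable
WHERE Date BETWEEN '2017-07-22' AND '2017-08-22'
  AND Store = '1234'
GROUP BY Date, Store;

-- 2) A representative nightly insert/update batch goes here;
--    compare the totals with and without the new index.

SET STATISTICS TIME, IO OFF;
```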

FYI: the idea behind a covering index is simple - if an index contains all the columns a query references, the engine can retrieve that information from the index without touching the table itself. This index should take up far less space per row than the table (with its roughly 200 columns), so the query should perform much better.

As noted by David Browne, your index would include not just the four columns listed, but the other 3 keys that make up the primary key. That's because all non-clustered indexes on a table with a clustered index use the cluster key to identify the row location in the main table. See this link for full details. Still, the index will be much narrower than your ~200 column table.
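In other words, the index as stored by the engine is effectively equivalent to the following (a conceptual sketch only; SQL Server appends the clustered key columns automatically to a non-unique nonclustered index, you would not declare them yourself):

```sql
-- What the engine effectively stores for the covering index:
-- the declared keys (Date, Store), plus the remaining clustered key
-- columns (terminal, transNum, lineNum) needed to locate each row,
-- plus the included columns (Qty, Amount) at the leaf level.
CREATE INDEX IX_MyHugeTable_Date_Store_Qty_Amt_Conceptual
    ON MyHugeTable (Date, Store, terminal, transNum, lineNum)
       INCLUDE (Qty, Amount);
```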