SQL Server – How to optimize dashboard performance

database-design, performance, dashboard, sql-server

I have a table like

FieldA, FieldB, FieldC, FieldD, TheDate, Count

I have a web app which provides a dashboard of the top few Count by each of the fields. Not wanting to prematurely optimize, the original, brute force queries for these charts are like:

SELECT TOP 10
    FieldA,
    SUM([Count]) AS Counts
FROM TheTable
WHERE @StartDate <= TheDate AND TheDate <= @EndDate
GROUP BY FieldA
ORDER BY Counts DESC

And the same for the other fields. But the server ends up scanning the date range independently for each chart, and when there is a lot of data the system bogs down.

It seems wrong to pull all the data into the app (once) and then do the summaries by column locally. And maybe the RDBMS caches a lot of the pages, so the second through fourth charts are more efficient than the first.

I'm using Azure SQL and neither SQL Server Management Studio nor DataGrip suggest any missing indices which could help.

Any thoughts on techniques to do multiple, similar aggregations on the same data? In general, or for Azure SQL specifically. Thanks.

Best Answer

suggest any missing indices which could help

You need to show what indexes you currently have defined, and the size & shape of your data (approximately how many rows historically? how many new rows are added each day/hour? how much do the fields vary? how often are they NULL?), for us to give absolutely relevant advice.

Any thoughts on

As most of your queries follow the stated pattern, you are always filtering on a range of dates, so at the very least you need an index on [date]. Furthermore, it would probably be a good idea for this to be your clustered index[†], to reduce the page accesses needed for such range queries.
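As a sketch, using the column and table names from the question (and assuming no existing clustered index is in the way), that could look like:

```sql
-- Cluster on the date column so that range predicates on TheDate
-- read contiguous pages instead of scattered ones.
CREATE CLUSTERED INDEX CIX_TheTable_TheDate
    ON TheTable (TheDate);
```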

If you can afford the extra space (and the extra RAM needed for your common working set to stay cached and avoid IO thrashing), then you may get a noticeable boost from having indexes on ([date], FieldA), ([date], FieldB), and so on, so that the grouping does not have to perform a sort operation (once the data is found by date, it is already in order in the index used). If particular fields are queried much more often than the others, then perhaps only do this for those fields instead of spending the resources doing it for all of them.
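A minimal sketch of those per-field indexes, again assuming the question's column names; adding [Count] as an INCLUDE column makes each index covering for its chart query, so no key lookups are needed:

```sql
-- One index per charted field; INCLUDE ([Count]) lets each chart
-- query be answered entirely from its index.
CREATE NONCLUSTERED INDEX IX_TheTable_TheDate_FieldA
    ON TheTable (TheDate, FieldA) INCLUDE ([Count]);

CREATE NONCLUSTERED INDEX IX_TheTable_TheDate_FieldB
    ON TheTable (TheDate, FieldB) INCLUDE ([Count]);

-- ...and similarly for FieldC / FieldD, if those charts are hot paths.
```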

[†] even if you have a unique integer as your primary key[‡] (or something else like a UUID)
[‡] and you should have a surrogate key in this example, as otherwise you could have rows that are otherwise identical, which doesn't fit the relational model and could cause issues
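Both footnotes combined might look like the following sketch (names are illustrative): a surrogate identity key declared as a NONCLUSTERED primary key, leaving the clustered index free for the date column:

```sql
-- Surrogate key so otherwise-identical rows stay distinguishable;
-- NONCLUSTERED so the clustered index can go on TheDate instead.
ALTER TABLE TheTable
    ADD Id bigint IDENTITY(1,1) NOT NULL;

ALTER TABLE TheTable
    ADD CONSTRAINT PK_TheTable PRIMARY KEY NONCLUSTERED (Id);
```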

As a side note: date is a keyword, as it is a type name, so I would avoid using it as a column name, even in examples.