SQL Server – How to Get Different Metrics for the Same Table with One Query


I have a lot of queries against the same table. All of them look like this:

SELECT COUNT(*) FROM <sameTable> WHERE <whereClause> GROUP BY <groupBy>

Let's say there are 40 such queries, so the table is scanned 40 times. I'm trying to reduce the number of scans. I have tried different approaches and ended up with this:

SELECT
        SUM(CASE WHEN p.statusId = 9 THEN 1 ELSE 0 END) AS metric1
        ,SUM(CASE WHEN p.statusId IN (10, 1088) THEN 1 ELSE 0 END) AS metric2
        ,SUM(CASE WHEN p.statusId = 11 THEN 1 ELSE 0 END) AS metric3
        ,SUM(CASE WHEN p.statusId = 20 THEN 1 ELSE 0 END) AS metric4
        ,(SELECT TOP 1 COUNT(DISTINCT p.CaseId)
          FROM vw_DashboardWorkbench p
          LEFT JOIN vw_DashboardCaseStatusHistory history ON p.CaseId = history.CaseId
          LEFT JOIN vw_EzProviderUser provider ON p.SecondaryPhysicianAdvisorId = provider.ProviderId
          WHERE p.statusId = 12
          GROUP BY history.AssignedByUser
          ORDER BY COUNT(DISTINCT p.CaseId) DESC) AS metric5
FROM
    vw_DashboardWorkbench p
LEFT JOIN vw_DashboardCaseStatusHistory history ON p.CaseId = history.CaseId
LEFT JOIN vw_EzProviderUser provider ON p.SecondaryPhysicianAdvisorId = provider.ProviderId

There are two issues:

SUM works on all records, but some metrics need a distinct count of the IDs matching the expression.
metric5 is a subquery. I have to use it because I couldn't get an aggregate function to return the maximum number of cases in status 12 per user.
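The first issue can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server; the tables `workbench` and `history` and their rows are hypothetical, loosely modeled on vw_DashboardWorkbench and vw_DashboardCaseStatusHistory. The LEFT JOIN repeats a case once per history row, so SUM(CASE ... THEN 1 ...) counts joined rows. One common pattern for a distinct count per condition is COUNT(DISTINCT CASE WHEN ... THEN CaseId END): the CASE yields NULL for non-matching rows, and COUNT ignores NULLs. The same expression is valid T-SQL.

```python
import sqlite3

# Hypothetical stand-ins for vw_DashboardWorkbench and vw_DashboardCaseStatusHistory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workbench (CaseId INTEGER, statusId INTEGER);
    CREATE TABLE history (CaseId INTEGER, AssignedByUser TEXT);
    INSERT INTO workbench VALUES (1, 9), (2, 9), (3, 11);
    -- Case 1 has three history rows, so the LEFT JOIN repeats it three times.
    INSERT INTO history VALUES (1, 'ann'), (1, 'bob'), (1, 'ann'), (2, 'bob');
""")

row_count, distinct_cases = conn.execute("""
    SELECT
        -- Counts joined rows: case 1 contributes 3, case 2 contributes 1.
        SUM(CASE WHEN p.statusId = 9 THEN 1 ELSE 0 END),
        -- CASE without ELSE yields NULL for other statuses; COUNT ignores NULLs.
        COUNT(DISTINCT CASE WHEN p.statusId = 9 THEN p.CaseId END)
    FROM workbench AS p
    LEFT JOIN history ON p.CaseId = history.CaseId
""").fetchone()

print(row_count, distinct_cases)  # 4 2
```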

Best Answer

Another way of writing the query (without the vw_EzProviderUser provider table, because it is unused in the example) would be:

;WITH CTE AS
(
SELECT COUNT(DWB.CaseId) as CountingCaseId, DWB.CaseId,DWB.statusId
FROM
dbo.DashboardWorkbench DWB
GROUP BY DWB.CaseId,DWB.statusId
),
CTE2 AS
(
SELECT  COUNT(DISTINCT DWB.CaseId) as CountingCaseIdDistinct, 
        DWB.statusId
FROM dbo.DashboardWorkbench DWB
LEFT JOIN dbo.DashboardCaseStatusHistory DCSH on DWB.CaseId = DCSH.CaseId
GROUP BY DCSH.AssignedByUser,DWB.statusId
)
SELECT  SUM(CASE WHEN CTE.statusId = 9 THEN CountingCaseId ELSE 0 END) as metric1,
        SUM(CASE WHEN CTE.statusId IN (10, 1088) THEN CountingCaseId ELSE 0 END) as metric2,
        SUM(CASE WHEN CTE.statusId = 11 THEN CountingCaseId ELSE 0 END) as metric3,
        SUM(CASE WHEN CTE.statusId = 20 THEN CountingCaseId ELSE 0 END) as metric4,
        (SELECT MAX(CountingCaseIdDistinct) FROM CTE2
        WHERE CTE2.statusId = 12) as metric5,
        (SELECT MAX(CountingCaseIdDistinct) FROM CTE2
        WHERE CTE2.statusId = 10) as metric6
FROM CTE;
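The metric5 approach above takes MAX over per-user COUNT(DISTINCT ...) results instead of sorting them and keeping the top row. A minimal sketch comparing the two, using Python's sqlite3 module as a stand-in for SQL Server (SQLite writes LIMIT 1 where T-SQL writes TOP 1); the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workbench (CaseId INTEGER, statusId INTEGER);
    CREATE TABLE history (CaseId INTEGER, AssignedByUser TEXT);
    INSERT INTO workbench VALUES (1, 12), (2, 12), (3, 12), (4, 9);
    INSERT INTO history VALUES
        (1, 'ann'), (2, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'bob');
""")

# Like CTE2: distinct cases per (user, status).
per_user = """
    SELECT history.AssignedByUser, p.statusId,
           COUNT(DISTINCT p.CaseId) AS CountingCaseIdDistinct
    FROM workbench AS p
    LEFT JOIN history ON p.CaseId = history.CaseId
    GROUP BY history.AssignedByUser, p.statusId
"""

# Answer's approach: aggregate the per-user aggregates with MAX.
(max_metric5,) = conn.execute(
    f"SELECT MAX(CountingCaseIdDistinct) FROM ({per_user}) AS cte2 WHERE statusId = 12"
).fetchone()

# Question's approach: sort the per-user counts and keep the top one.
(top_metric5,) = conn.execute("""
    SELECT COUNT(DISTINCT p.CaseId)
    FROM workbench AS p
    LEFT JOIN history ON p.CaseId = history.CaseId
    WHERE p.statusId = 12
    GROUP BY history.AssignedByUser
    ORDER BY COUNT(DISTINCT p.CaseId) DESC
    LIMIT 1
""").fetchone()

print(max_metric5, top_metric5)  # 2 2
```

Both return the largest per-user count; the MAX form avoids the sort but still needs the grouped result underneath it.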

Performance

As for performance, this could help, depending on your data, because of the early grouping on DWB.CaseId, DWB.statusId.

YMMV

There are probably better ways to make it perform well, such as columnstore indexes, better rewrites, indexing, and so on.
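The early-grouping idea can be sketched as follows: collapse the rows to one count per (CaseId, statusId) first, then run the conditional SUMs over that smaller result, summing the counts instead of 1. This uses Python's sqlite3 module as a stand-in for SQL Server with hypothetical data; it only shows that both forms return the same totals, while whether the pre-aggregation is faster depends on the engine, the data and the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workbench (CaseId INTEGER, statusId INTEGER)")
# Hypothetical data with repeated (CaseId, statusId) pairs, as a joined view might produce.
rows = [(1, 9), (1, 9), (2, 10), (3, 1088), (3, 1088), (4, 11), (5, 20)]
conn.executemany("INSERT INTO workbench VALUES (?, ?)", rows)

# Metric expressions; {n} is what gets summed for each matching row.
metrics = """
    SUM(CASE WHEN statusId = 9 THEN {n} ELSE 0 END),
    SUM(CASE WHEN statusId IN (10, 1088) THEN {n} ELSE 0 END),
    SUM(CASE WHEN statusId = 11 THEN {n} ELSE 0 END),
    SUM(CASE WHEN statusId = 20 THEN {n} ELSE 0 END)
"""

# Question's style: evaluate every CASE expression on every raw row.
direct = conn.execute(
    f"SELECT {metrics.format(n='1')} FROM workbench"
).fetchone()

# Answer's style: group first, then evaluate the CASE expressions once per group.
grouped = conn.execute(f"""
    WITH CTE AS (
        SELECT COUNT(CaseId) AS CountingCaseId, CaseId, statusId
        FROM workbench
        GROUP BY CaseId, statusId
    )
    SELECT {metrics.format(n='CountingCaseId')} FROM CTE
""").fetchone()

print(direct, grouped)  # (2, 3, 1, 1) (2, 3, 1, 1)
```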

Temp tables

I would advise storing the result of CTE2 (or your subquery) in a temporary table if you reference it multiple times (metric5 and metric6 in the example).

That way, you only read from the stored result set instead of evaluating the same query each time.
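A minimal sketch of the temp-table advice, using Python's sqlite3 module as a stand-in for SQL Server (in T-SQL this would typically be a #temp table filled with SELECT ... INTO). The CTE2-style per-user result is computed once into a temporary table, and both metric subqueries then read that stored result instead of re-running the join. Table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workbench (CaseId INTEGER, statusId INTEGER);
    CREATE TABLE history (CaseId INTEGER, AssignedByUser TEXT);
    INSERT INTO workbench VALUES (1, 12), (2, 12), (3, 10), (4, 10), (5, 10);
    INSERT INTO history VALUES (1, 'ann'), (2, 'bob'), (3, 'ann'), (4, 'ann'), (5, 'bob');

    -- Evaluate the CTE2-style query once and store the result.
    CREATE TEMP TABLE per_user AS
    SELECT history.AssignedByUser, p.statusId,
           COUNT(DISTINCT p.CaseId) AS CountingCaseIdDistinct
    FROM workbench AS p
    LEFT JOIN history ON p.CaseId = history.CaseId
    GROUP BY history.AssignedByUser, p.statusId;
""")

# Both metrics read the small stored result set, not the base tables.
metric5, metric6 = conn.execute("""
    SELECT
        (SELECT MAX(CountingCaseIdDistinct) FROM per_user WHERE statusId = 12),
        (SELECT MAX(CountingCaseIdDistinct) FROM per_user WHERE statusId = 10)
""").fetchone()

print(metric5, metric6)  # 1 2
```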

Testing

This was tested on SQL Server 2017.

After adding some indexes to cover all options:

CREATE INDEX IX_statusId_CaseId
ON dbo.DashboardWorkbench(statusId,CaseId);
CREATE INDEX IX_CaseId_statusId
ON dbo.DashboardWorkbench(CaseId,statusId);

CREATE INDEX IX_AssignedByUser_CaseId
ON dbo.DashboardCaseStatusHistory(AssignedByUser,CaseId);
CREATE INDEX IX_CaseId_AssignedByUser
ON dbo.DashboardCaseStatusHistory(CaseId,AssignedByUser);

The only real 'benefit' compared to your plan is that the Compute Scalar operator evaluating the multiple SUM(CASE WHEN statusId = 9 THEN 1 ELSE 0 END), statusId = 10, ... expressions no longer runs over all the data.

In my rewrite, those expressions are only evaluated on the grouped counts of DWB.CaseId and DWB.statusId.

In the plan for your query in my test (YMMV), the Compute Scalar is the second operator from the right.

In my plan, by contrast, the grouping happens earlier, together with COUNT(DWB.CaseId).

This part of the plan represents the T-SQL statements in the first CTE (CTE):

SELECT COUNT(DWB.CaseId) as CountingCaseId, DWB.CaseId,DWB.statusId
FROM
dbo.DashboardWorkbench DWB
GROUP BY DWB.CaseId,DWB.statusId

The Compute Scalar work is done afterwards, on the result of the above query.

The rest of the plan is virtually the same.

Extra info

There will definitely be better solutions, but somebody might be able to use the DB<>Fiddle below to test for something better.

DB<>Fiddle with test data, execution plans etc.

Temp tables can help if CTE2 or your subquery is evaluated multiple times, and indexes matter. As mentioned in the comments, there are other options to explore, but we would need more information.

Related Solutions

Does Detach/Attach or Offline/Online Clear Buffer Cache for Database?

I initially thought you were on to something here. My working assumption was that perhaps the buffer pool wasn't flushed immediately, since doing so requires "some work", and there's no reason to bother until the memory is needed. But...

Your test is flawed.

What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.

And we can see that the buffer pool was not totally blown away by the detach/attach. Seems like my buddy was wrong. Does anyone disagree or have a better argument?

Yes. You're interpreting "physical reads 0" as meaning there were no physical reads.

Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads 0, read-ahead reads 768, lob logical reads 94, lob physical reads 4, lob read-ahead reads 24.

As described on Craig Freedman's blog, the sequential read-ahead mechanism tries to ensure that pages are in memory before the query processor requests them, which is why you see zero, or a lower than expected, physical read count reported.

When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan.

None of the pages required to satisfy your query were in memory until read-ahead put them there.

Why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.

Sql-server – SQL join query to show rows with non-existent rows in one table

Thank you for SQLfiddle and sample data! I wish more questions started this way.

If you want all members regardless of whether they have an entry for that date, you want a LEFT OUTER JOIN. You were very close with this version. However, one subtlety of outer joins is that if you filter on the outer-joined table (Time_Entry here) in the WHERE clause, you turn the outer join into an inner join: the filter excludes every row that is NULL on that side, because comparing NULL with a value does not evaluate to true.

I modified the first query to get a row for every member:

SELECT Members.Member_ID
      ,Time_Entry.Date_Start
      ,Time_Entry.Hours_Actual
      ,Time_Entry.Hours_Bill
FROM dbo.Members
  LEFT OUTER JOIN dbo.Time_Entry
--^^^^ changed from FULL to LEFT
  ON Members.Member_ID = Time_Entry.Member_ID
  AND Time_Entry.Date_Start = '20131110';
--^^^ changed from WHERE to AND

I'll leave it as an exercise for the reader to take it from there and add the other columns, formatting, COALESCE etc.
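To see why moving the date filter from WHERE to ON matters, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The rows are hypothetical, and the dbo schema prefix is dropped because SQLite has no schemas of that kind. Member 2 only has an entry for another date and member 3 has no entries at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (Member_ID INTEGER);
    CREATE TABLE Time_Entry (Member_ID INTEGER, Date_Start TEXT, Hours_Actual REAL);
    INSERT INTO Members VALUES (1), (2), (3);
    INSERT INTO Time_Entry VALUES (1, '20131110', 8.0), (2, '20131111', 6.0);
""")

# Filter in WHERE: rows with another date or with NULLs from the outer join
# fail the comparison, so the LEFT JOIN behaves like an inner join.
where_rows = conn.execute("""
    SELECT m.Member_ID, t.Hours_Actual
    FROM Members AS m
    LEFT OUTER JOIN Time_Entry AS t ON m.Member_ID = t.Member_ID
    WHERE t.Date_Start = '20131110'
    ORDER BY m.Member_ID
""").fetchall()

# Filter in ON: the date only limits which entries match; every member survives.
on_rows = conn.execute("""
    SELECT m.Member_ID, t.Hours_Actual
    FROM Members AS m
    LEFT OUTER JOIN Time_Entry AS t
        ON m.Member_ID = t.Member_ID
        AND t.Date_Start = '20131110'
    ORDER BY m.Member_ID
""").fetchall()

print(where_rows)  # [(1, 8.0)]
print(on_rows)     # [(1, 8.0), (2, None), (3, None)]
```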

Some other notes:

please always use the schema prefix when creating and referencing objects
please always use a length when converting to varchar etc.
stay away from ambiguous, regional date formats like mm-dd-yyyy

consider using aliases to make your queries easier to read. E.g. the above could be re-written as:

SELECT m.Member_ID
  ,t.Date_Start
  ,t.Hours_Actual
  ,t.Hours_Bill
FROM dbo.Members AS m
LEFT OUTER JOIN dbo.Time_Entry AS t
ON m.Member_ID = t.Member_ID
AND t.Date_Start = '20131110';

... a lot tidier, IMHO, as long as you use sensible aliases.
