Sql-server – Hitting a transactional table, or copying the data to a temporary table

sql server

I have the following query (Pseudo SQL warning):

select count(*), groupcolumn from table 
where date is < 6 days ago  and > 7 days ago 
AND date is < 13 days ago 
AND date > 14 days ago... (until 28 days ago) 
group by groupcolumn

I was told to optimize this as:

select count(*), groupcolumn from (
select columns from table
 where date is < 6 days ago  and > 7 days ago 
union all 
select columns from table 
where date is < 13 days ago  and > 14 days ago 
union all 
(repeat until 28 days ago)) group by groupcolumn

This hits the table several times but only brings the rows i need (60K of them for the 4 days specified).

The DBA then told me this query is no good because "it's hitting a transactional table multiple times and it's bad performance". The table in question has records being inserted continuously, a couple to a dozen of them every second.

His suggestion was to change the query to:

select "only the minimum columns i need, if possible only IDs" 
into #temptable
from table where date is < 6 days ago  and > 28 days ago 

And then do my original query against "this much smaller table that also doesn't interfere with the transactional table".

The DBA tells me this query is better because "it's not hitting the transactional table so much". It counter intuitive to me, though, since I'm bringing 15K records per day. So instead of copying 60K records to memory, I'll now be copying 420K records to disk, and running a second query against those.

I'm using SET TRANSACTION LEVEL READ UNCOMMITTED and WITH(NOLOCK) for every table. I don't mind dirty reads as I'm just making approximate statistics, so I don't see why "hitting the table so many times" would be an issue here.

Best Answer

It could reduce the time you access the table, and possibly the execution time of your query.

If that is true will depend on the indexes on the table, the size of the table, ....

An example of executing your query on an unindexed heap:

enter image description here

While the rows returned are 15K, the residual I/O on the predicate is high, 120K rows for each table scan.

enter image description here

As you already know, adding an index could change this to a seek predicate.

Reducing the time you would access the table can be done by using a temp table.

Insert the data

enter image description here

Add an index

enter image description here

Do index seeks on your temp table

enter image description here

Conclusion

While reducing the time accessing the table, the execution time in total was twice as long on my test server. To help you confirm if it is a solution for you, we would need the table definition, the actual query and additionaly the test data.

If it is not an option to share these things, then you could do the test yourself, and share the execution plans and the SET STATISTICS IO,TIME output.


Test data used

CREATE TABLE CountingDays(CountingDaysId int identity(1,1), groupcolumn varchar(255), countingdate datetime2)

SET NOCOUNT ON;
DECLARE @i int = 0
WHILE @i < 15000
BEGIN
INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(1,getdate())

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(2,dateadd(day,-4,getdate()))


INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(3,dateadd(day,-7,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(@i,dateadd(day,-8,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(1,getdate())

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(2,dateadd(day,-3,getdate()))


INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(3,dateadd(day,-6,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(@i,dateadd(day,-5,getdate()))

SET @i+=1

END




SET STATISTICS IO, TIME ON;
select count(*), groupcolumn from (
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-3,getdate()) and countingdate >= dateadd(day,-4,getdate())
union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-4,getdate()) and countingdate >= dateadd(day,-5,getdate())
 union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-5,getdate()) and countingdate >= dateadd(day,-6,getdate())
union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-6,getdate()) and countingdate >= dateadd(day,-7,getdate())
 union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-7,getdate()) and countingdate >= dateadd(day,-8,getdate())
union all 
select groupcolumn from CountingDays
where countingdate < dateadd(day,-8,getdate()) and countingdate >= dateadd(day,-9,getdate())) a
group by groupcolumn

CREATE TABLE #temp(groupcolumn varchar(255), countingdate datetime2)
INSERT INTO #temp(groupcolumn,countingdate)
select groupcolumn,countingdate FROM CountingDays
CREATE INDEX IX_Countingdate on #temp(Countingdate) include(groupcolumn)

SET STATISTICS IO, TIME ON;
select count(*), groupcolumn from (
select groupcolumn from #temp
 where countingdate < dateadd(day,-3,getdate()) and countingdate >= dateadd(day,-4,getdate())
union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-4,getdate()) and countingdate >= dateadd(day,-5,getdate())
 union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-5,getdate()) and countingdate >= dateadd(day,-6,getdate())
union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-6,getdate()) and countingdate >= dateadd(day,-7,getdate())
 union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-7,getdate()) and countingdate >= dateadd(day,-8,getdate())
union all 
select groupcolumn from #temp
where countingdate < dateadd(day,-8,getdate()) and countingdate >= dateadd(day,-9,getdate())) a
group by groupcolumn


DROP TABLE #temp