Sql-server – Hitting a transactional table, or copying the data to a temporary table

sql server

I have the following query (Pseudo SQL warning):

select count(*), groupcolumn from table 
where date is < 6 days ago  and > 7 days ago 
AND date is < 13 days ago 
AND date > 14 days ago... (until 28 days ago) 
group by groupcolumn

I was told to optimize this as:

select count(*), groupcolumn from (
select columns from table
 where date is < 6 days ago  and > 7 days ago 
union all 
select columns from table 
where date is < 13 days ago  and > 14 days ago 
union all 
(repeat until 28 days ago)) group by groupcolumn

This hits the table several times but only brings the rows i need (60K of them for the 4 days specified).

The DBA then told me this query is no good because "it's hitting a transactional table multiple times and it's bad performance". The table in question has records being inserted continuously, a couple to a dozen of them every second.

His suggestion was to change the query to:

select "only the minimum columns i need, if possible only IDs" 
into #temptable
from table where date is < 6 days ago  and > 28 days ago

And then do my original query against "this much smaller table that also doesn't interfere with the transactional table".

The DBA tells me this query is better because "it's not hitting the transactional table so much". It counter intuitive to me, though, since I'm bringing 15K records per day. So instead of copying 60K records to memory, I'll now be copying 420K records to disk, and running a second query against those.

I'm using SET TRANSACTION LEVEL READ UNCOMMITTED and WITH(NOLOCK) for every table. I don't mind dirty reads as I'm just making approximate statistics, so I don't see why "hitting the table so many times" would be an issue here.

Best Answer

It could reduce the time you access the table, and possibly the execution time of your query.

If that is true will depend on the indexes on the table, the size of the table, ....

An example of executing your query on an unindexed heap:

While the rows returned are 15K, the residual I/O on the predicate is high, 120K rows for each table scan.

As you already know, adding an index could change this to a seek predicate.

Reducing the time you would access the table can be done by using a temp table.

Insert the data

Add an index

Do index seeks on your temp table

Conclusion

While reducing the time accessing the table, the execution time in total was twice as long on my test server. To help you confirm if it is a solution for you, we would need the table definition, the actual query and additionaly the test data.

If it is not an option to share these things, then you could do the test yourself, and share the execution plans and the SET STATISTICS IO,TIME output.

Test data used

CREATE TABLE CountingDays(CountingDaysId int identity(1,1), groupcolumn varchar(255), countingdate datetime2)

SET NOCOUNT ON;
DECLARE @i int = 0
WHILE @i < 15000
BEGIN
INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(1,getdate())

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(2,dateadd(day,-4,getdate()))


INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(3,dateadd(day,-7,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(@i,dateadd(day,-8,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(1,getdate())

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(2,dateadd(day,-3,getdate()))


INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(3,dateadd(day,-6,getdate()))

INSERT INTO CountingDays(groupcolumn,countingdate)
VALUES(@i,dateadd(day,-5,getdate()))

SET @i+=1

END




SET STATISTICS IO, TIME ON;
select count(*), groupcolumn from (
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-3,getdate()) and countingdate >= dateadd(day,-4,getdate())
union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-4,getdate()) and countingdate >= dateadd(day,-5,getdate())
 union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-5,getdate()) and countingdate >= dateadd(day,-6,getdate())
union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-6,getdate()) and countingdate >= dateadd(day,-7,getdate())
 union all 
select groupcolumn from CountingDays
 where countingdate < dateadd(day,-7,getdate()) and countingdate >= dateadd(day,-8,getdate())
union all 
select groupcolumn from CountingDays
where countingdate < dateadd(day,-8,getdate()) and countingdate >= dateadd(day,-9,getdate())) a
group by groupcolumn

CREATE TABLE #temp(groupcolumn varchar(255), countingdate datetime2)
INSERT INTO #temp(groupcolumn,countingdate)
select groupcolumn,countingdate FROM CountingDays
CREATE INDEX IX_Countingdate on #temp(Countingdate) include(groupcolumn)

SET STATISTICS IO, TIME ON;
select count(*), groupcolumn from (
select groupcolumn from #temp
 where countingdate < dateadd(day,-3,getdate()) and countingdate >= dateadd(day,-4,getdate())
union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-4,getdate()) and countingdate >= dateadd(day,-5,getdate())
 union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-5,getdate()) and countingdate >= dateadd(day,-6,getdate())
union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-6,getdate()) and countingdate >= dateadd(day,-7,getdate())
 union all 
select groupcolumn from #temp
 where countingdate < dateadd(day,-7,getdate()) and countingdate >= dateadd(day,-8,getdate())
union all 
select groupcolumn from #temp
where countingdate < dateadd(day,-8,getdate()) and countingdate >= dateadd(day,-9,getdate())) a
group by groupcolumn


DROP TABLE #temp

Related Solutions

Sql-server – Select MaxDate where hh:mm:ss makes the difference

In the OPENQUERY, you could get the max timestamp for all the dogs at once, and then join to the original PETS table to get the other columns, like so:

SELECT *
FROM [MyDatabase].[dbo].[DogTags] t1
JOIN OPENQUERY(ROPSA,
    'SELECT [TimeStamp], [Name], [Owner], [Address]
     FROM [CustomerDB].[PETS] t3
     JOIN (SELECT [Name], MAX(TimeStamp) AS TimeStamp
           FROM [CustomerDB].[PETS]
           GROUP BY [Name]) t4
     ON t3.[Name]=t4.[Name] AND t3.TimeStamp=t4.TimeStamp') t2
ON t1.TAGS collate DATABASE_DEFAULT = t2.Name collate DATABASE_DEFAULT

EDIT: Needed to add AS TimeStamp in the subquery.

EDIT: Needed to specify the database everytime in OPENQUERY.

Sql-server – TSQL Delete All dates and stores data

If I understand it right, these queries should work:

DELETE X
From BigTableX X 
Left join TempTableX T on X.PostingDate = T.PostingDate 
    and X.Store = T.PostingStore and X.SKU_id = T.SKU_id
Where T.Sku_id is null
AND EXISTS (Select 1 from TempTableX WHERE postingDate = X.PostingDate) 
AND EXISTS (Select 1 from TempTableX WHERE PostingStore = X.Store) 

DELETE X
From BigTableX X 
CROSS APPLY (Select TOP(1) p = 1 from TempTableX WHERE postingDate = X.PostingDate) P
CROSS APPLY (Select TOP(1) s = 1 from TempTableX WHERE postingStore = X.Store) S
Left join TempTableX T on X.PostingDate = T.PostingDate 
    and X.Store = T.PostingStore and X.SKU_id = T.SKU_id
Where T.Sku_id is null

Either of these 2 queries will delete the same rows.

On my test sample they are more efficient but my test tables don't match your data and you will have to test it. They all do 4 table scan but the new query does not sort data (no DISTINCT).

Can you please provide both execution plans (old and new query) for further analysis.

It may help to add an index on postingDate and on postingStore or on postingDate/postingStore/sku_id. If CLUSTERED INDEX on TempTable (Date/Store/SKU_id) is not narrow enough (3 varchar...), it may be better to remove it.

Best Answer

Related Solutions

Sql-server – Select MaxDate where hh:mm:ss makes the difference

Sql-server – TSQL Delete All dates and stores data

Related Question