I have the following query (Pseudo SQL warning):
select count(*), groupcolumn from table
where (date is < 6 days ago and > 7 days ago)
or (date is < 13 days ago and > 14 days ago)
or ... (one such range per week, until 28 days ago)
group by groupcolumn
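In concrete T-SQL, that pseudo query might look like the sketch below. The table and column names (dbo.Orders, CreatedDate, GroupColumn) are hypothetical stand-ins, not from the original post:

```sql
-- Hedged sketch: one one-day range per week, going back four weeks.
SELECT COUNT(*) AS cnt, GroupColumn
FROM dbo.Orders WITH (NOLOCK)
WHERE (CreatedDate < DATEADD(DAY, -6,  GETDATE()) AND CreatedDate > DATEADD(DAY, -7,  GETDATE()))
   OR (CreatedDate < DATEADD(DAY, -13, GETDATE()) AND CreatedDate > DATEADD(DAY, -14, GETDATE()))
   OR (CreatedDate < DATEADD(DAY, -20, GETDATE()) AND CreatedDate > DATEADD(DAY, -21, GETDATE()))
   OR (CreatedDate < DATEADD(DAY, -27, GETDATE()) AND CreatedDate > DATEADD(DAY, -28, GETDATE()))
GROUP BY GroupColumn;
```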
I was told to optimize this as:
select count(*), groupcolumn from (
select columns from table
where date is < 6 days ago and > 7 days ago
union all
select columns from table
where date is < 13 days ago and > 14 days ago
union all
(repeat until 28 days ago)) as t group by groupcolumn
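A concrete version of that UNION ALL rewrite could look like this (again with hypothetical names; note that SQL Server requires an alias on the derived table):

```sql
SELECT COUNT(*) AS cnt, GroupColumn
FROM (
    SELECT GroupColumn FROM dbo.Orders WITH (NOLOCK)
    WHERE CreatedDate < DATEADD(DAY, -6, GETDATE()) AND CreatedDate > DATEADD(DAY, -7, GETDATE())
    UNION ALL
    SELECT GroupColumn FROM dbo.Orders WITH (NOLOCK)
    WHERE CreatedDate < DATEADD(DAY, -13, GETDATE()) AND CreatedDate > DATEADD(DAY, -14, GETDATE())
    -- ...repeat for the -20/-21 and -27/-28 day ranges...
) AS weekly
GROUP BY GroupColumn;
```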
This hits the table several times but only brings back the rows I need (60K of them for the four days specified).
The DBA then told me this query is no good because "it's hitting a transactional table multiple times and it's bad performance". The table in question has records being inserted continuously, a couple to a dozen of them every second.
His suggestion was to change the query to:
select "only the minimum columns i need, if possible only IDs"
into #temptable
from table where date is < 6 days ago and > 28 days ago
And then do my original query against "this much smaller table that also doesn't interfere with the transactional table".
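The first step of the DBA's suggestion, sketched with hypothetical names (dbo.Orders, CreatedDate, Id):

```sql
-- Copy only the minimal columns for the whole 28-day window into a temp table.
SELECT Id, GroupColumn, CreatedDate
INTO #recent
FROM dbo.Orders WITH (NOLOCK)
WHERE CreatedDate < DATEADD(DAY, -6, GETDATE())
  AND CreatedDate > DATEADD(DAY, -28, GETDATE());
-- The original grouped count then runs against #recent instead of dbo.Orders.
```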
The DBA tells me this query is better because "it's not hitting the transactional table so much". That's counterintuitive to me, though, since I'm bringing back 15K records per day. So instead of copying 60K records to memory, I'll now be copying 420K records to disk and then running a second query against those.
I'm using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED and WITH (NOLOCK) for every table. I don't mind dirty reads, as I'm just producing approximate statistics, so I don't see why "hitting the table so many times" would be an issue here.
Best Answer
It could reduce the time you spend accessing the table, and possibly the execution time of your query.
Whether that is true will depend on the indexes on the table, the size of the table, and so on.
An example of executing your query on an unindexed heap:
While only 15K rows are returned, the residual I/O on the predicate is high: 120K rows are read for each table scan.
As you already know, adding an index could change this to a seek predicate.
Reducing the time you would access the table can be done by using a temp table.
Insert the data
Add an index
Do index seeks on your temp table
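Those three steps could be sketched like this (hypothetical table and column names; the clustered index choice is my assumption):

```sql
-- 1. Insert the data (minimal columns only).
SELECT GroupColumn, CreatedDate
INTO #stats
FROM dbo.Orders WITH (NOLOCK)
WHERE CreatedDate < DATEADD(DAY, -6, GETDATE())
  AND CreatedDate > DATEADD(DAY, -28, GETDATE());

-- 2. Add an index on the filtering column.
CREATE CLUSTERED INDEX IX_stats_date ON #stats (CreatedDate);

-- 3. The weekly-range predicates can now seek on #stats.
SELECT COUNT(*) AS cnt, GroupColumn
FROM #stats
WHERE (CreatedDate < DATEADD(DAY, -6, GETDATE()) AND CreatedDate > DATEADD(DAY, -7, GETDATE()))
   OR (CreatedDate < DATEADD(DAY, -13, GETDATE()) AND CreatedDate > DATEADD(DAY, -14, GETDATE()))
   -- ...remaining weekly ranges...
GROUP BY GroupColumn;
```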
Conclusion
While it reduced the time spent accessing the table, the total execution time was twice as long on my test server. To help you confirm whether it is a solution for you, we would need the table definition, the actual query, and additionally the test data.
If it is not an option to share these things, then you could do the test yourself and share the execution plans and the
SET STATISTICS IO, TIME
output.
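For reference, that measurement could be captured in a session like this:

```sql
SET STATISTICS IO, TIME ON;

-- ...run the query being compared here...

SET STATISTICS IO, TIME OFF;
-- The Messages tab in SSMS then reports logical reads per table,
-- plus CPU and elapsed time, for each statement.
```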