Sometimes I wonder if SHORT scripts really are the best thing to focus on.
The size of a script has little to do with how efficiently the query will execute. A more compact statement will likely consume fewer resources in terms of compilation, but (re)compilation is usually a rare occurrence in a live system.
Fewer table accesses are usually desirable, though, and that does tend to lead to more compact code.
Very generally speaking, a smaller execution plan and a lower estimated cost will both tend to yield better results. Again, though, it's highly situational; cost estimates in particular can be way off in some cases. It's important to measure the actual execution time, because at the end of the day that's what matters.
With left joins I can achieve what I want in just a few lines. But then I tried a longer script using unions. Which is the better method?
First of all, we need to know how much data will be in these tables in a real system. Right now there's so little that it will be difficult to use the STATISTICS TIME performance metrics to pick a winner -- the results that come back will be dominated by factors other than the query execution itself. With more data, it's likely the plans will change, rendering any comparison made now moot.
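For reference, those metrics come from the session options below, which print elapsed/CPU time and per-table read counts to the Messages tab:

-- Enable per-statement timing and I/O metrics for the session:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- ... run the statements being compared here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;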
Having said that, by looking at the query plans as they are now from a logical point of view, the first one is the winner.
You can see that the Clustered Index Scan of quantities appears once in the first plan, while it appears four times in the second one. The second plan also contains an expensive Distinct Sort as a result of the UNIONs (this operator could be eliminated by using UNION ALL instead, which won't change the results here).
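To illustrate (the column names here are only guesses at your schema), the change is just the ALL keyword:

-- UNION deduplicates its inputs, which typically appears in the plan
-- as a Distinct Sort or Hash Aggregate:
SELECT product_id FROM dbo.quantities WHERE color_id = 1
UNION
SELECT product_id FROM dbo.quantities WHERE color_id = 2;

-- UNION ALL skips the deduplication step; it is safe whenever the
-- branches cannot return overlapping rows, as in this query:
SELECT product_id FROM dbo.quantities WHERE color_id = 1
UNION ALL
SELECT product_id FROM dbo.quantities WHERE color_id = 2;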
The first query could also probably be improved by getting index seeks on the colors and sizes tables instead of table scans. It might be worth trying a hash match plan as well (which is what you'll probably see when quantities and products are larger), but for tables this small, the startup cost may be too much overhead to be of benefit.
What I would suggest you do is run each of the statements you want to test 10,000+ times in a loop, figure out the average execution time, and then compare.
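A minimal harness for that might look like this (the query under test and table name are placeholders):

DECLARE @i int = 0,
        @dummy int,
        @start datetime2 = SYSUTCDATETIME();

WHILE @i < 10000
BEGIN
    -- Assign into a variable so result-set transfer to the client
    -- doesn't dominate the measurement:
    SELECT @dummy = COUNT(*) FROM dbo.quantities;
    SET @i += 1;
END;

SELECT DATEDIFF(millisecond, @start, SYSUTCDATETIME()) / 10000.0
       AS avg_ms_per_execution;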
Short of restarting the SQL Server service or the machine itself, that's usually enough. If you want a completely fresh start, you might also want to drop the auto-created statistics left over from your last run, which you can find with:
SELECT stats.name AS stats_name,
       objects.name AS object_name,
       schemas.name AS schema_name,
       objects.create_date
FROM sys.stats
JOIN sys.objects ON stats.object_id = objects.object_id
JOIN sys.schemas ON objects.schema_id = schemas.schema_id
WHERE stats.stats_id > 0
  AND stats.stats_id < 255
  AND objects.is_ms_shipped = 0
  AND stats.auto_created = 1
ORDER BY objects.create_date DESC;
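From there, each row can be removed with DROP STATISTICS; the same query can generate the statements for you:

-- Generate the corresponding DROP STATISTICS statements
-- (same filters as the query above):
SELECT 'DROP STATISTICS '
     + QUOTENAME(schemas.name) + '.'
     + QUOTENAME(objects.name) + '.'
     + QUOTENAME(stats.name) + ';'
FROM sys.stats
JOIN sys.objects ON stats.object_id = objects.object_id
JOIN sys.schemas ON objects.schema_id = schemas.schema_id
WHERE stats.stats_id > 0
  AND stats.stats_id < 255
  AND objects.is_ms_shipped = 0
  AND stats.auto_created = 1;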
Best Answer
A logical read is counted when a single page is retrieved from buffer cache during query execution. This is counted regardless of whether a physical read or read-ahead read was used to bring the page into cache, or whether the page was already there. Consequently, logical reads measure how many times pages were actually touched in memory during query execution.
A physical read is counted when the storage engine reads one or more pages from storage because the requested page is not already in cache. This excludes physical IO done by read-ahead scans.
A read-ahead read is counted when the storage engine reads one or more full extents (each extent being 8 contiguous pages) from storage during sequential scans. Read-ahead aggressively prefetches data into memory so that pages are likely to already be cached by the time the query needs them. The number of extents read by read-ahead scans varies by SQL Server version and edition, as well as fragmentation. I've seen read-ahead read as much as 4MB at once using a single scatter-gather IO during scans in Enterprise Edition on the latest SQL Server version.
Note that the SQL Server storage engine behaves differently with a cold cache (e.g. after a restart or DBCC DROPCLEANBUFFERS). When the cache is not yet warmed up (the Buffer Manager's Target Pages not yet met), the storage engine performs full 64K extent reads instead of single 8K page reads to bring pages into cache. This warms the cache much more quickly than would otherwise occur.

I prefer to focus on logical reads rather than physical reads for the purpose of query and index tuning, because that's a better measure of the work performed by the query. It is only happenstance whether data is cached or not. Testing with a cold cache is more a measure of storage performance than query performance, IMHO, and isn't representative of what will occur under a production workload with a warm cache.
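In case it's useful, here's how I set up those two kinds of test (the query is a placeholder, and the counter values in the comment are illustrative; run the cache flush on a test box only):

-- Warm-cache measurement: focus on the logical reads counter.
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.quantities;  -- placeholder query
-- The Messages tab reports, per table, something like:
-- Table 'quantities'. Scan count 1, logical reads 42, physical reads 0,
-- read-ahead reads 0, ...

-- Cold-cache test (test systems only): flush the buffer cache first.
CHECKPOINT;             -- write dirty pages to disk
DBCC DROPCLEANBUFFERS;  -- then evict the clean pages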