SQL Server – Which of these queries is best for performance?

performance, query-performance, sql-server

Which of these queries is best for performance? Sometimes I wonder if SHORT scripts really are the best thing to focus on. These scripts perform the same task. With left joins I can achieve what I want in just a few lines, but then I tried a longer script using unions. Which is the better method? Between this:

SELECT p.productID, p.product, c.color, c.colorID, s.size, s.sizeID, q.qty
FROM quantities q
    INNER JOIN products p ON p.productID = q.productID
    LEFT JOIN colors c ON c.colorID = q.colorID
    LEFT JOIN sizes s ON s.sizeID = q.sizeID
    --WHERE q.productID = @productID

and this:

SELECT p.productID, p.product, c.color, c.colorID, s.size, s.sizeID, q.qty
FROM quantities q
    INNER JOIN products p ON p.productID = q.productID
    INNER JOIN colors c ON c.colorID = q.colorID
    INNER JOIN sizes s ON s.sizeID = q.sizeID
    --WHERE q.productID = @productID
UNION
SELECT p.productID, p.product, NULL, NULL, s.size, s.sizeID, q.qty
FROM quantities q
    INNER JOIN products p ON p.productID = q.productID
    INNER JOIN sizes s ON s.sizeID = q.sizeID
WHERE /* q.productID = @productID AND */ q.sizeID IS NOT NULL AND q.colorID IS NULL
UNION
SELECT p.productID, p.product, c.color, c.colorID, NULL, NULL, q.qty
FROM quantities q
    INNER JOIN products p ON p.productID = q.productID
    INNER JOIN colors c ON c.colorID = q.colorID
WHERE /* q.productID = @productID AND */ q.colorID IS NOT NULL AND q.sizeID IS NULL
UNION
SELECT p.productID, p.product, NULL, NULL, NULL, NULL, q.qty
FROM quantities q
    INNER JOIN products p ON p.productID = q.productID
WHERE /* q.productID = @productID AND */ q.colorID IS NULL AND q.sizeID IS NULL

EDIT:

SQL Server parse and compile time: CPU time = 32 ms, elapsed time = 65 ms.

With left joins:

(10 row(s) affected)
Table 'sizes'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'colors'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'products'. Scan count 0, logical reads 20, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'quantities'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 12 ms.

With unions:

(10 row(s) affected)
Table 'products'. Scan count 0, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'quantities'. Scan count 4, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sizes'. Scan count 0, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'colors'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Judging by these statistics, it seems to me that the second query, the bigger one with unions, is best for performance.

Does anyone have a clue why this happens, or do I need to provide more information (table definitions and such)?

Execution plans:

With left joins
With unions

Thanks!

Best Answer

Sometimes I wonder if SHORT scripts really are the best thing to focus on.

The size of a script has little to do with how efficiently the query will execute. A more compact statement will likely consume fewer resources in terms of compilation, but (re)compilation is usually a rare occurrence in a live system.

Fewer table accesses are usually desirable, though, and that does lead to more compact code.

Very generally speaking, a smaller execution plan and a lower estimated cost will yield better results, but again, it's highly situational. Cost estimates in particular can be way off in some cases. It's important to measure the actual execution time, because at the end of the day, that's what matters.

With left joins I can achieve what I want in just a few lines, but then I tried a longer script using unions. Which is the better method?

First of all, we need to know how much data these tables will hold in a real system. Right now there is so little that it will be difficult to use the STATISTICS TIME metrics to pick a winner: the results that come back will be dominated by factors other than the query execution itself. With more data, it's likely the plans will change, rendering the comparison here moot.
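For reference, this is the standard way to capture those metrics per query (plain T-SQL, nothing specific to this post); the reads and timings show up in the Messages output:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Run the query under test here.

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;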

Having said that, looking at the query plans as they stand from a logical point of view, the first one is the winner.

You can see that the Clustered Index Scan of quantities appears once in the first plan, while it appears four times in the second one. The second plan also contains an expensive Distinct Sort as a result of using UNION. That operator could be eliminated by switching to UNION ALL, which won't change the results here because the WHERE clauses make the branches mutually exclusive.
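As a sketch, the rewritten query would look like this (abbreviated; the remaining branches follow the same pattern as in your original):

    SELECT p.productID, p.product, c.color, c.colorID, s.size, s.sizeID, q.qty
    FROM quantities q
        INNER JOIN products p ON p.productID = q.productID
        INNER JOIN colors c ON c.colorID = q.colorID
        INNER JOIN sizes s ON s.sizeID = q.sizeID
    UNION ALL -- branches cannot overlap, so no Distinct Sort is needed
    SELECT p.productID, p.product, NULL, NULL, s.size, s.sizeID, q.qty
    FROM quantities q
        INNER JOIN products p ON p.productID = q.productID
        INNER JOIN sizes s ON s.sizeID = q.sizeID
    WHERE q.sizeID IS NOT NULL AND q.colorID IS NULL
    -- ...the two remaining branches unchanged, each preceded by UNION ALL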

The first query could also probably be improved by getting index seeks on the colors and sizes tables instead of scans. It might be worth trying a hash match plan as well (which is what you'll probably see once quantities and products grow larger), but for tables this small, the startup cost may be too much overhead to be of benefit.
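A rough sketch of both ideas follows. The index names are hypothetical, and whether the INCLUDE columns help depends on the actual table definitions, which aren't shown in the post:

    -- Hypothetical supporting indexes (names and INCLUDE columns are assumptions).
    CREATE UNIQUE NONCLUSTERED INDEX IX_colors_colorID
        ON colors (colorID) INCLUDE (color);
    CREATE UNIQUE NONCLUSTERED INDEX IX_sizes_sizeID
        ON sizes (sizeID) INCLUDE (size);

    -- Forcing a hash match plan, for comparison purposes only.
    SELECT p.productID, p.product, c.color, c.colorID, s.size, s.sizeID, q.qty
    FROM quantities q
        INNER JOIN products p ON p.productID = q.productID
        LEFT JOIN colors c ON c.colorID = q.colorID
        LEFT JOIN sizes s ON s.sizeID = q.sizeID
    OPTION (HASH JOIN);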

What I would suggest is running each statement you want to test 10,000+ times in a loop, working out the average execution time, and then comparing.
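A minimal sketch of such a loop, using the first query as the test subject (the iteration count and variable names are illustrative; assigning the output to a variable keeps client I/O from skewing the numbers):

    -- Rough timing harness: run once per candidate query and compare.
    DECLARE @i int = 0, @qty int;
    DECLARE @start datetime2 = SYSUTCDATETIME();

    WHILE @i < 10000
    BEGIN
        SELECT @qty = q.qty  -- assign to a variable: no result set sent to the client
        FROM quantities q
            INNER JOIN products p ON p.productID = q.productID
            LEFT JOIN colors c ON c.colorID = q.colorID
            LEFT JOIN sizes s ON s.sizeID = q.sizeID;
        SET @i += 1;
    END;

    SELECT DATEDIFF(MILLISECOND, @start, SYSUTCDATETIME()) / 10000.0 AS avg_elapsed_ms;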