Sql-server – the most efficient way to get the minimum of multiple columns on SQL Server 2005

aggregateperformancesql serversql-server-2005

I'm in a situation where I want to get the minimum value from of 6 columns.

I've found three ways so far to accomplish this, but I have concerns with the performance of these methods and would like to know which would be better for performance.

The first method is to use a big case statement. Here's an example with 3 columns, based on the example in the link above. My case statement would be much longer since I will be looking at 6 columns.

Select Id,
       Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
            When Col2 <= Col3 Then Col2 
            Else Col3
            End As TheMin
From   MyTable

The second option is to use the UNION operator with multiple select statements. I would put this in an UDF that accepts an Id parameter.

select Id, dbo.GetMinimumFromMyTable(Id)
from MyTable

and

select min(col)
from
(
    select col1 [col] from MyTable where Id = @id
    union all
    select col2 from MyTable where Id = @id
    union all
    select col3 from MyTable where Id = @id
) as t

And the 3rd option I found was to use the UNPIVOT operator, which I didn't even know existed until just now

with cte (ID, Col1, Col2, Col3)
as
(
    select ID, Col1, Col2, Col3
    from TestTable
)
select cte.ID, Col1, Col2, Col3, TheMin from cte
join
(
    select
        ID, min(Amount) as TheMin
    from 
        cte 
        UNPIVOT (Amount for AmountCol in (Col1, Col2, Col3)) as unpvt
    group by ID
) as minValues
on cte.ID = minValues.ID

Because of the table size, and frequency in which this table is queried and updated, I am concerned about the performance impact these queries would have on the database.

This query will actually be used in a join to a table with a few million records, however the records returned will be reduced to around a hundred records at a time. It will get run many times throughout the day, and the 6 columns I am querying are frequently updated (they contain daily stats). I do not think there are any indexes on the 6 columns I am querying.

Which of these methods is better for performance when trying to get the minimum of multiple columns? Or is there another better method that I don't know of?

I am using SQL Server 2005

Sample Data & Results

If my data contained records like this:

Id    Col1    Col2    Col3    Col4    Col5    Col6
1        3       4       0       2       1       5
2        2       6      10       5       7       9
3        1       1       2       3       4       5
4        9       5       4       6       8       9

The end result should be

Id    Value
1        0
2        2
3        1
4        4

Best Answer

I tested the performance of all 3 methods, and here's what I found:

1 record: No noticeable difference
10 records: No noticeable difference
1,000 records: No noticeable difference
10,000 records: UNION subquery was a little slower. The CASE WHEN query is a little faster than the UNPIVOT one.
100,000 records: UNION subquery is significantly slower, but UNPIVOT query becomes a little faster than the CASE WHEN query
500,000 records: UNION subquery still significantly slower, but UNPIVOT becomes much faster than the CASE WHEN query

So the end results seems to be

With smaller record sets there doesn't seem to be enough of a difference to matter. Use whatever is easiest to read and maintain.
Once you start getting into larger record sets, the UNION ALL subquery begins to perform poorly compared to the other two methods.
The CASE statement performs the best up until a certain point (in my case, around 100k rows), and which point the UNPIVOT query becomes the best-performing query

The actual number at which one query becomes better than another will probably change as a result of your hardware, database schema, data, and current server load, so be sure to test with your own system if you're concerned about performance.

I also ran some tests using Mikael's answer; however, it was slower than all 3 of the other methods tried here for most recordset sizes. The only exception was it did better than a the UNION ALL query for very large recordset sizes. I like the fact it shows the column name in addition to the smallest value though.

I'm not a dba, so I may not have optimized my tests and missed something. I was testing with the actual live data, so that may have affected the results. I tried to account for that by running each query a few different times, but you never know. I would definitely be interested if someone wrote up a clean test of this and shared their results.

Best Answer

Related Solutions

Sql-server – Simple select is taking too long

Mysql – Optimizing large MySQL SELECT WHERE IN clauses

Related Question