Sql-server – Efficient way to handle multiple CASE statements in SELECT

sql serversql-server-2016

I run a report in which I have a situation where based on a column value which is basically a Key or Id, I need to fetch corresponding value from the mapping Id table. Something like below:

    SELECT
     (case when [column1='A'] then (select value from Table1) 
           when [column1='B'] then (select value from Table2) 
           when [column1='C'] then (select value from Table3)

           and so on uptil 35 more 'when' conditions ...

  ELSE column1 end) Value 
from Table1

More precisely:

SELECT 
(case when [A.column1='1'] 
then (select value from B where B.clientId=100 and A.column1=B.Id) 
when [A.column1='2'] 
then (select value from C where C.clientId=100 and A.column1=C.Id) 
when [A.column1='3'] 
then (select value from D where D.clientId=100 and A.column1=D.Id) 
...
and so on uptil 30 more 'when' conditions 
...
ELSE column1 end) 
FROM A

In Tables B,C,D..so on, we maintain data for all clients. Each client has a specific ClientId and these tables B,C,D etc have index in place on Id and ClientId columns.

Is there an efficient way to handle this in SQL Server?

Best Answer

I'm assuming that you have appropriate indexes on the tables in the subqueries. I mocked up some quick test data and put 10 million rows in table A. I wasn't game to create 30 tables so I just created 3 for the CASE expression. I think that 3 is enough to show the general principles.

DROP TABLE IF EXISTS dbo.B;
CREATE TABLE dbo.B (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO B VALUES (100, '1', 'TABLE B'); 

DROP TABLE IF EXISTS dbo.C;
CREATE TABLE dbo.C (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO C VALUES (100, '2', 'TABLE C'); 

DROP TABLE IF EXISTS dbo.D;
CREATE TABLE dbo.D (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO D VALUES (100, '3', 'TABLE D'); 


DROP TABLE IF EXISTS dbo.A;
CREATE TABLE dbo.A (
    column1 VARCHAR(20) NOT NULL
);

INSERT INTO dbo.A WITH (TABLOCK)
SELECT CAST(1 + t.RN % 3 AS VARCHAR(20))
FROM
(
    SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t;

I disabled result sets and ran the following query in SSMS:

SELECT A.column1
FROM A;

It took around 0.723 seconds. I'm doing pretty unscientific tests because I don't know anything your data. In any case, with serial queries we can't expect a better result than 0.7 seconds. That's our baseline.

The most efficient way to write this query is without joins at all. The key is that the CASE expression is only ever going to return 3 (or 30) unique values if it finds a match. You can save off the results into local variables and just use those in the query. The query below finishes in around 1.044 seconds:

DECLARE @B_VALUE VARCHAR(100) = (select value from B where B.clientId=100 and B.Id = '1');
DECLARE @C_VALUE VARCHAR(100) = (select value from C where C.clientId=100 and C.Id = '2');
DECLARE @D_VALUE VARCHAR(100) = (select value from D where D.clientId=100 and D.Id = '3');

SELECT 
(case when A.column1='1' then @B_VALUE
      when A.column1='2' then @C_VALUE
      when A.column1='3' then @D_VALUE
      -- omitted other columns
else column1 end) 
FROM A;

The plan is very simple:

enter image description here

For another option you can write the query with joins (where we can rewrite the CASE expression in a more compact form, using COALESCE(). This finished in around 2.314 seconds:

SELECT 
    COALESCE(B.column1, C.column1, D.column1, -- omitted other columns  
             A.column1)
    -- (case A.column1
    --    when '1' then B.value
    --    when '2' then C.value
    --    when '3' then D.value
    --    -- omitted other columns
    -- else A.column1 end) 
FROM A
LEFT JOIN B ON B.clientId=100 and B.Id = '1'
LEFT JOIN C ON C.clientId=100 and C.Id = '2'
LEFT JOIN D ON D.clientId=100 and D.Id = '3';

Here's the plan:

enter image description here

You can get a nearly identical runtime and query plan by writing the query like this:

SELECT 
(case A.column1
   when '1' then (select value from B where B.clientId=100 and '1'=B.Id) 
   when '2' then (select value from C where C.clientId=100 and '2'=C.Id) 
   when '3' then (select value from D where D.clientId=100 and '3'=D.Id) 
-- omitted other columns
else column1 end) 
FROM A;

The original query in the question has an issue: SQL Server is doing a useless sort before the nested loop join. That query finishes in around 5.838 seconds on my machine.

enter image description here

Trace flag 8690 eliminates the sort as well as the table spools. The query runs in about 7.479 seconds with trace flag 8690, so I suppose that the spools are helpful for this query.