Sql-server – Efficient way to handle multiple CASE statements in SELECT

sql serversql-server-2016

I run a report in which I have a situation where based on a column value which is basically a Key or Id, I need to fetch corresponding value from the mapping Id table. Something like below:

    SELECT
     (case when [column1='A'] then (select value from Table1) 
           when [column1='B'] then (select value from Table2) 
           when [column1='C'] then (select value from Table3)

           and so on uptil 35 more 'when' conditions ...

  ELSE column1 end) Value 
from Table1

More precisely:

SELECT 
(case when [A.column1='1'] 
then (select value from B where B.clientId=100 and A.column1=B.Id) 
when [A.column1='2'] 
then (select value from C where C.clientId=100 and A.column1=C.Id) 
when [A.column1='3'] 
then (select value from D where D.clientId=100 and A.column1=D.Id) 
...
and so on uptil 30 more 'when' conditions 
...
ELSE column1 end) 
FROM A

In Tables B,C,D..so on, we maintain data for all clients. Each client has a specific ClientId and these tables B,C,D etc have index in place on Id and ClientId columns.

Is there an efficient way to handle this in SQL Server?

Best Answer

I'm assuming that you have appropriate indexes on the tables in the subqueries. I mocked up some quick test data and put 10 million rows in table A. I wasn't game to create 30 tables so I just created 3 for the CASE expression. I think that 3 is enough to show the general principles.

DROP TABLE IF EXISTS dbo.B;
CREATE TABLE dbo.B (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO B VALUES (100, '1', 'TABLE B'); 

DROP TABLE IF EXISTS dbo.C;
CREATE TABLE dbo.C (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO C VALUES (100, '2', 'TABLE C'); 

DROP TABLE IF EXISTS dbo.D;
CREATE TABLE dbo.D (
    ClientID INT NOT NULL,
    Id VARCHAR(20) NOT NULL,
    [Value] VARCHAR(100),
    PRIMARY KEY (ClientID, Id)
);

INSERT INTO D VALUES (100, '3', 'TABLE D'); 


DROP TABLE IF EXISTS dbo.A;
CREATE TABLE dbo.A (
    column1 VARCHAR(20) NOT NULL
);

INSERT INTO dbo.A WITH (TABLOCK)
SELECT CAST(1 + t.RN % 3 AS VARCHAR(20))
FROM
(
    SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t;

I disabled result sets and ran the following query in SSMS:

SELECT A.column1
FROM A;

It took around 0.723 seconds. I'm doing pretty unscientific tests because I don't know anything your data. In any case, with serial queries we can't expect a better result than 0.7 seconds. That's our baseline.

The most efficient way to write this query is without joins at all. The key is that the CASE expression is only ever going to return 3 (or 30) unique values if it finds a match. You can save off the results into local variables and just use those in the query. The query below finishes in around 1.044 seconds:

DECLARE @B_VALUE VARCHAR(100) = (select value from B where B.clientId=100 and B.Id = '1');
DECLARE @C_VALUE VARCHAR(100) = (select value from C where C.clientId=100 and C.Id = '2');
DECLARE @D_VALUE VARCHAR(100) = (select value from D where D.clientId=100 and D.Id = '3');

SELECT 
(case when A.column1='1' then @B_VALUE
      when A.column1='2' then @C_VALUE
      when A.column1='3' then @D_VALUE
      -- omitted other columns
else column1 end) 
FROM A;

The plan is very simple:

For another option you can write the query with joins (where we can rewrite the CASE expression in a more compact form, using COALESCE(). This finished in around 2.314 seconds:

SELECT 
    COALESCE(B.column1, C.column1, D.column1, -- omitted other columns  
             A.column1)
    -- (case A.column1
    --    when '1' then B.value
    --    when '2' then C.value
    --    when '3' then D.value
    --    -- omitted other columns
    -- else A.column1 end) 
FROM A
LEFT JOIN B ON B.clientId=100 and B.Id = '1'
LEFT JOIN C ON C.clientId=100 and C.Id = '2'
LEFT JOIN D ON D.clientId=100 and D.Id = '3';

Here's the plan:

You can get a nearly identical runtime and query plan by writing the query like this:

SELECT 
(case A.column1
   when '1' then (select value from B where B.clientId=100 and '1'=B.Id) 
   when '2' then (select value from C where C.clientId=100 and '2'=C.Id) 
   when '3' then (select value from D where D.clientId=100 and '3'=D.Id) 
-- omitted other columns
else column1 end) 
FROM A;

The original query in the question has an issue: SQL Server is doing a useless sort before the nested loop join. That query finishes in around 5.838 seconds on my machine.

Trace flag 8690 eliminates the sort as well as the table spools. The query runs in about 7.479 seconds with trace flag 8690, so I suppose that the spools are helpful for this query.

Related Solutions

Sql-server – Single sql query to select a record from parent table only if all it’s records in child table has values

This is called an "anti-join" (or anti-semijoin). One way to write it in SQL is to use the NOT EXISTS construction:

SELECT t1.*
FROM Table1 AS t1
WHERE NOT EXISTS
      ( SELECT *
        FROM Table2 AS t2
        WHERE t2.FKcolumn = t1.PKcolumn
          AND t2.columnX IS NULL
      )

If - as your comment - you want a parent row (at Table1) not related to any child row (at Table2), to NOT be shown, add this (or join Table2 if you also want columns from the second table shown):

  AND EXISTS
      ( SELECT *
        FROM Table2 AS t2
        WHERE t2.FKcolumn = t1.PKcolumn
      )

sql-server – Replacement for Nested SELECT Statements for Every Row in SELECT Clause

It's the classic EAV pattern, which is a strong indicator of a design issue. I hope you understand that this design is against the core rules of relational databases. I have a blog post on this subject if you want to dig deeper.

However, if the design can't be changed, you can use a PIVOT query to extract your data:

DECLARE @BaseTable TABLE (
    IDKey int PRIMARY KEY,
    Name varchar(50)
)

INSERT INTO @BaseTable
VALUES
(1, 'Ian'),
(2, 'John'),
(3, 'Ann');


DECLARE @Table1 TABLE (
    ID int,
    Amount int,
    Type char(1),
    PRIMARY KEY (ID, Type)
)

INSERT INTO @Table1
VALUES 
(1, 100, 'X'),
(1, 200, 'Y'),
(2, 300, 'X'),
(2, 400, 'Y'),
(3, 500, 'X'),
(3, 600, 'Y');


SELECT *
FROM @BaseTable AS B
INNER JOIN (
    SELECT ID, X AS AmountX, Y AS AmountY
    FROM @Table1 AS T
    PIVOT (MIN(Amount) FOR Type IN ([X],[Y])) AS P
) AS T1
    ON B.IDKey = T1.ID;

Best Answer

Related Solutions

Sql-server – Single sql query to select a record from parent table only if all it’s records in child table has values

sql-server – Replacement for Nested SELECT Statements for Every Row in SELECT Clause

Related Question