T-sql – Null Values in a CASE statement

casechecksuminsertt-sql

I am playing around with some stuff in SSMS to learn a little more as I study for my 70-461 exam and I came across a little hangup. I am trying to create a table to play around with so I don't have to alter/delete any of the already created tables in the AdventureWorks or TSQL2012 databases. I've created a temp table to test my code before I actually create a table to play with and this is the code I am using to insert values into my table:

DECLARE @i INT = 1
 WHILE @i < 10
    BEGIN
    INSERT INTO #TestEmployeeCountry 
    VALUES ( SUBSTRING('ABCDEFGHIJKLMNOP', @i, 1), 
        CASE (SELECT ABS(CHECKSUM(NEWID()))%10 +1)
            WHEN 1 THEN 'USA'
            WHEN 2 THEN 'CANADA'
            WHEN 3 THEN 'MEXICO'
            WHEN 4 THEN 'UK'
            WHEN 5 THEN 'FRANCE'
            WHEN 6 THEN 'SPAIN'
            WHEN 7 THEN 'RUSSIA'
            WHEN 8 THEN 'CHINA'
            WHEN 9 THEN 'JAPAN'
            WHEN 10 THEN 'INDIA'
        END)
    SET @i = @i + 1
    END;

The problem I am having is I keep getting an error saying "Cannot insert the value NULL into column 'Country', table 'tempdb.dbo.#TestEmployeeCountry" The reason this is is because I have the Country column set to NOT NULL, and my code does work for some of the inserts, the problem is I randomly get NULL values out of my case statement.

I know that to fix this I can easily add another line that says "DEFAULT xxxxxx" however I want to understand what is going on because based on what I see I shouldn't have to do that, should I? I thought I wrote my case statement correctly, giving me a number between 1-10 only and upon testing just that specific select statement over 1000 tries, I always get a random number between 1-10, nothing larger or smaller. Can anyone help me to understand why this code tries to enter a NULL value into that column?

Best Answer

Why this happens has already been answered by @PaulWhite in the SO question: How does this CASE expression reach the ELSE clause?

To solve it, you should calculate the ABS(CHECKSUM(NEWID()))%10 +1 outside/before the INSERT statement so it is calculated once. Something like:

DECLARE @i INT = 1 ;
DECLARE @rand INT ;
 WHILE @i <= 10
   BEGIN
    SET @rand = ABS(CHECKSUM(NEWID()))%10 +1 ;
    INSERT INTO TestEmployeeCountry 
    VALUES ( SUBSTRING('ABCDEFGHIJKLMNOP', @i, 1), 
        CASE @rand
            WHEN 1 THEN 'USA'
            WHEN 2 THEN 'CANADA'
            WHEN 3 THEN 'MEXICO'
            WHEN 4 THEN 'UK'
            WHEN 5 THEN 'FRANCE'
            WHEN 6 THEN 'SPAIN'
            WHEN 7 THEN 'RUSSIA'
            WHEN 8 THEN 'CHINA'
            WHEN 9 THEN 'JAPAN'
            WHEN 10 THEN 'INDIA'
        END) ;
    SET @i = @i + 1 ;
   END ;

Also notice that with your code, the 10 countries will not be placed in the table with equal probability! The first country (USA) will have 10% chance, the second will have 9% ( (100%-10%)*10% ), the third 8.1%, ((100%-19%)*10%), etc. That leaves a not so small chance (around 1/e) that none of the 10 is chosen and the CASE expression goes to the default ELSE NULL and you get the error. (You can check the probabilities if you allow nulls in the column and run the SQLfiddle script.)

According to the above, another way to solve it would be to change the expressions to comply with how SQL-Server executes the CASE and all 10 cases have the same probability:

    CASE 0
        WHEN ABS(CHECKSUM(NEWID()))%10 THEN 'USA'
        WHEN ABS(CHECKSUM(NEWID()))%9 THEN 'CANADA'
        WHEN ABS(CHECKSUM(NEWID()))%8 THEN 'MEXICO'
        WHEN ABS(CHECKSUM(NEWID()))%7 THEN 'UK'
        WHEN ABS(CHECKSUM(NEWID()))%6 THEN 'FRANCE'
        WHEN ABS(CHECKSUM(NEWID()))%5 THEN 'SPAIN'
        WHEN ABS(CHECKSUM(NEWID()))%4 THEN 'RUSSIA'
        WHEN ABS(CHECKSUM(NEWID()))%3 THEN 'CHINA'
        WHEN ABS(CHECKSUM(NEWID()))%2 THEN 'JAPAN'
        ELSE 'INDIA'
    END