Sql-server – strange optimization of query results in phantom value conversions

querysql serversql-server-2005

-- DB collation is German_PhoneBook_BIN
CREATE TABLE [dbo].[partition_tbl](
    [tbl_name] [varchar](255) NOT NULL,
) ON [PRIMARY]
INSERT INTO partition_tbl VALUES ('some_table_{num}')
INSERT INTO partition_tbl VALUES ('other_table_{num}')

assuming in your database are the following tables

some_table_123
some_table_444
some_table_hist
some_table_555
other_table_100
other_table_4321

the following statement

SELECT a.tbl_name, REVERSE(LEFT(REVERSE(b.name),PATINDEX('%[_]%',REVERSE(b.name))-1)) AS number
FROM partition_tbl a
INNER JOIN dbo.sysobjects b 
ON (
    b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9]')
    OR
    b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9][0-9]')
) AND b.xtype='U'

should return

tbl_name          | number
------------------|-------
some_table_{num}  | 123
some_table_{num}  | 444
some_table_{num}  | 555
other_table_{num} | 100
other_table_{num} | 4321

you can see, that some_table_hist is omitted. But why can't I convert the column number (which is a NVARCHAR at this time) to type INT with both following statements?

SELECT a.tbl_name, CAST(REVERSE(LEFT(REVERSE(b.name),PATINDEX('%[_]%',REVERSE(b.name))-1)) AS INT) AS number
FROM partition_tbl a
INNER JOIN dbo.sysobjects b 
ON (
    b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9]')
    OR
    b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9][0-9]')
) AND b.xtype='U'

and

SELECT tbl_name,CAST(number AS INT) AS number FROM (
    SELECT a.tbl_name, REVERSE(LEFT(REVERSE(b.name),PATINDEX('%[_]%',REVERSE(b.name))-1)) AS number
    FROM partition_tbl a
    INNER JOIN dbo.sysobjects b 
    ON (
        b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9]')
        OR
        b.name COLLATE German_PhoneBook_BIN LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[0-9][0-9][0-9][0-9]')
    ) AND b.xtype='U'
) x

Both statement results in an error: Failure on converting the nvarchar value hist to int data type (real error in german says: Fehler beim Konvertieren des nvarchar-Wertes 'hist' in den int-Datentyp). Huh? I can see no hist value in resultset above!? Does MSSQL doing a wrong statement optimization?

(BTW: I'm using now another statement which fixes this issue. I'm only interested in the reason why this strange effect occurs)

Best Answer

As expected.

SQL is declarative and there is no implicit ordering of execution. You will see that the CAST is happening before the filter, so you get the error. That is, you have assumed that only int values will get through the CAST. Y

You should assume that all values in the column will be converted to int. The value here is "hist" which has been parsed.

There are a few ways to ensure that only CASTable values will be presented to the CAST

  1. Materialise the JOIN so it has to run before the CAST
  2. Use CASE to manage the CAST (see below)
  3. Use a table variable/temp table to cache results

Example. You can't rely on the CAST happening after the derived table.

SELECT
    tbl_name,
    CASE WHEN num NOT LIKE '[^0-9]' THEN CAST(num AS int) END
FROM
    (
    SELECT 
        a.tbl_name,
        CAST(REVERSE(LEFT(REVERSE(b.name),PATINDEX('%[_]%',REVERSE(b.name))-1)) AS num
    FROM 
        partition_tbl a
        INNER JOIN 
        sys.objects b ON
                 b.name COLLATE German_PhoneBook_BIN 
                        NOT LIKE REPLACE(REPLACE(a.tbl_name,'_','[_]'),'{num}','[^0-9]')
    WHERE
        b.xtype='U'
    ) T