Sql-server – Why does this CTE return the bulk of its results immediately, but then take several minutes to finish

performancequery-performancerecursivesql serversql-server-2008

I've got a schema roughly like this (this is a simplification of my actual schema):

CREATE TABLE foo (
    key1 NUMERIC(6) NOT NULL,
    key2 VARCHAR(32) NOT NULL,
    val VARCHAR(255) NULL,
    CONSTRAINT foo_pk PRIMARY KEY (key1, key2) --PK on key1, key2
    )
GO
CREATE TABLE bar (
    key1 NUMERIC(6) NOT NULL,
    key2 VARCHAR(32) NOT NULL,
    val VARCHAR(255) NULL,
    CONSTRAINT bar_pk PRIMARY KEY (key1, key2) --PK on key1, key2
    )
GO
CREATE TABLE aliases (
    id VARCHAR(32) NOT NULL PRIMARY KEY,
    text VARCHAR(255) NOT NULL,
    CONSTRAINT aliases_uk UNIQUE (text) --PK on id, unique constraint on text
    )
GO

INSERT INTO aliases (id, text)
SELECT REPLACE(NEWID(), '-', ''), 'Text1' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text2' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text3' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text4' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text5' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text6' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text7' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text8' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text9' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text10' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text11' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text12' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text13' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text14' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text15' UNION ALL
SELECT REPLACE(NEWID(), '-', ''), 'Text16';
GO

BEGIN
    DECLARE @i INT = 1;
    WHILE (@i <= 234)
    BEGIN
        INSERT INTO foo (key1, key2, val) SELECT @i, id, id FROM aliases;
        SET @i = @i + 1;
    END;
    INSERT INTO bar (key1, key2, val) SELECT key1 + 234, key2, val FROM foo;
END;
GO

I want to get a list of all key1 and key2 pairs (using the user-friendly names for key2, with the user-friendly aliases for the vals concatenated. My initial attempt looks like this:

WITH
foos_and_bars (key1, key2, val) AS (
    SELECT key1, key2, val FROM foo UNION ALL
    SELECT key1, key2, val FROM bar),
texts (key1, key2_name, val_text, rnum) AS (
    SELECT key1, a1.text, a2.text, ROW_NUMBER() OVER (PARTITION BY key1, key2 ORDER BY val)
    FROM foos_and_bars
    JOIN aliases a1 ON a1.id = foos_and_bars.key2
    LEFT JOIN aliases a2 ON a2.id = foos_and_bars.val),
partitioned (key1, key2_name, val_text, rnum, maxnum) AS (
    SELECT key1, key2_name, val_text, rnum, MAX(rnum) OVER (PARTITION BY key1, key2_name)
    FROM texts),
recurse (key1, key2_name, val_texts, rnum, maxnum) AS (
    SELECT key1, key2_name, val_text, rnum, maxnum
    FROM partitioned
    WHERE rnum = 1
    UNION ALL
    SELECT r.key1,
        r.key2_name, 
        CAST(r.val_texts + CHAR(13) + CHAR(10) + p.val_text AS VARCHAR(255)),
        p.rnum,
        r.maxnum
    FROM recurse r
    JOIN partitioned p
    ON p.key1 = r.key1
    AND p.key2_name = r.key2_name
    AND p.rnum = r.rnum + 1)
SELECT * FROM recurse WHERE rnum = maxnum;

It works, and I get 7480 rows in less than a second, but then SSMS runs for no less than 2 minutes and 40 seconds puzzling out the last 8 rows. These last rows are no more complex than the first (in fact, with the data currently in the system, rnum and maxnum are never greater than 1), and as these tables were created exclusively for mocking on this SE question, there should be no locks.

The issue seems to be limited to the recursive CTE, as selecting from the top down to partitioned yields 7488 rows in less than a second. What could the CTE be getting stuck on?

Execution plans (using a setup that matches the simplified schema):

The output is going into a view that is going to back a Sybase window, which will be used for searching records. As such, it needs to be capable of retrieving all the rows, though the program pulls 1000 rows at a time by default.

Best Answer

The anchor part of the recursive SQL query produces all 7488 rows. On my machine, that part of the query finishes in under 100 ms. SSMS does not immediately show all 7488 rows in the grid results. It only shows 7480 for me as well. I suspect this happens because the results are sent in packets and the remaining 8 rows aren't enough to fill a packet.

The second half of the query is extremely expensive compared to the first. Looking at the execution plan you can see six clustered index scans on the inner side of a nested loop. Those scans will execute at least once for every row from the anchor part. SQL Server does a total of 44928 scans over a few minutes to ultimately produce 0 rows from that part. Once that part of the execution is complete SQL Server sends the remaining packet of rows to the client and you see all 7488 rows in the results grid of the completed query. It's not true that it takes three minutes to find the last 8 eight rows. It finds them almost instantly. It just takes three minutes to send them.

You can see a large amount of work performed in the bottom half of the query by looking at the actual plan. Note the thickness of the arrows and the number of executions for the scan operators:

actual plan

That's why the query is slow. I don't have any advice for you to make it fast because I don't understand what you're trying to accomplish with this query.