SQL Server – Optimize a series of queries that reuse the same filter

Tags: cte, performance, sql-server, temporary-tables

I have about 30 queries across 50 tables in total, each of which pulls or aggregates information that is later joined together. The original code used temp tables throughout, but it is difficult to maintain. A common theme across the queries is the filter in the WHERE clause, which selects the records we want.

I tried converting the temp tables into a series of chained CTEs, moving the shared WHERE clause into a filter CTE that is inner joined to all child CTEs, so that a change to the filter cascades to every other query. When new queries are added, they simply need to join to this CTE and the filter is applied. The idea is that we start with a base filter table containing only the IDs of the desired records, then left join to append the measures we want.

The problem is that this performs far worse than the temp tables. We've sacrificed performance for consistency, modularity, and ease of maintenance. Is there any way to tweak the performance in our favor, though? The biggest hits seem to happen on the queries that use windowing functions to concatenate strings across multiple rows into one row per record.

I've never attempted something of this nature before, and CTEs seemed like the logical approach. What was once a 4-5 minute query now takes 30 minutes.

How it looks with temp tables:

SELECT
    ...
INTO
    #temp1
FROM
    ...
WHERE
    <repeated filter>
    AND
    <temp1 filter>

SELECT
    ...
INTO
    #temp2
FROM
    ...
WHERE
    <repeated filter>
    AND
    <temp2 filter>

SELECT
    ...
FROM
    #temp1
        join
    #temp2
WHERE
    <repeated filter>

And with the CTEs:

WITH Filter AS (
    SELECT
        ...
    FROM
        ...
    WHERE
        ...
), Query1 AS (
    SELECT
        ...
    FROM
        ...
            INNER JOIN
        Filter
    WHERE
        <query1 filter>
), Query2 AS (
    SELECT
        ...
    FROM
        ...
            INNER JOIN
        Filter
    WHERE
        <query2 filter>
)
SELECT
    ...
FROM
    Filter
        LEFT JOIN
    Query1
        LEFT JOIN
    Query2

Best Answer

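The key thing to realize is that a common table expression (CTE) is not a table; it's an expression. In SQL Server at least, a CTE is not materialized: the query is not run once with its results reused when the CTE appears more than once in the main query. Instead, each reference to the CTE is expanded into the overall query, so the CTE's query is generally re-executed for every reference. In your example, Filter is referenced three times (in Query1, in Query2, and in the final SELECT), so its query may run three times.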
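To see this for yourself, a minimal sketch along the following lines may help. The table #Records, its columns, and the sample rows are made up for illustration; only the pattern of referencing one CTE twice matters.

```sql
-- Hypothetical sample data; the table and column names are assumptions.
CREATE TABLE #Records (RecordID int PRIMARY KEY, Region varchar(10));
INSERT INTO #Records (RecordID, Region)
VALUES (1, 'East'), (2, 'West'), (3, 'East');

-- Report per-table I/O in the Messages output for the next statement.
SET STATISTICS IO ON;

-- The Filter CTE is referenced twice in the same statement.
WITH Filter AS (
    SELECT RecordID
    FROM #Records
    WHERE Region = 'East'
)
SELECT a.RecordID, b.RecordID AS MatchingID
FROM Filter AS a
    INNER JOIN Filter AS b
        ON b.RecordID = a.RecordID;

SET STATISTICS IO OFF;
DROP TABLE #Records;
```

If you run this with the actual execution plan enabled, you would typically expect #Records to appear once per reference to Filter rather than once overall, although the optimizer is free to choose a different plan shape.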

So, you may want to consider a hybrid approach.

I would build a temporary table with the basic filtered data - what's currently returned by the Filter CTE. I would also expand it slightly. Currently you say it only returns an ID value. If there are any other values that you'll need in the final query (especially if they come from tables that aren't involved in the intermediate queries), include them in the temp table - otherwise, you'll just have to go back and get them later.

Note that you can index a temporary table. Unless the temp table rows are wider than I'd expect, I'd put a clustered index on the IDs. This may help with other joins. (Note also that populating a table using an ORDER BY may happen to insert the data in a particular order, but it guarantees nothing that the DB engine will recognize - without an index, it will still treat the data as unsorted.)

Now, use the temporary Filter table in your list of CTEs. It will not have to be reconstructed for every query it's used in, which should improve things quite a bit.
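A rough sketch of this hybrid approach might look like the following. All object names here (#Records, #Orders, #Notes, #Filter, the columns, and the Region filter) are hypothetical stand-ins for your real tables and your repeated filter, and the aggregates are placeholders for whatever each of your queries computes.

```sql
-- Hypothetical source tables standing in for the real ones.
CREATE TABLE #Records (RecordID int PRIMARY KEY, Region varchar(10), CustomerName varchar(50));
CREATE TABLE #Orders  (OrderID int PRIMARY KEY, RecordID int, Amount decimal(10, 2));
CREATE TABLE #Notes   (NoteID int PRIMARY KEY, RecordID int, NoteText varchar(100));

INSERT INTO #Records (RecordID, Region, CustomerName)
VALUES (1, 'East', 'Acme'), (2, 'West', 'Globex'), (3, 'East', 'Initech');
INSERT INTO #Orders (OrderID, RecordID, Amount)
VALUES (10, 1, 25.00), (11, 1, 15.50), (12, 2, 99.99);
INSERT INTO #Notes (NoteID, RecordID, NoteText)
VALUES (100, 3, 'Follow up'), (101, 3, 'Called'), (102, 2, 'Closed');

-- 1. Materialize the shared filter once, including extra columns
--    needed by the final query (CustomerName here).
SELECT RecordID, CustomerName
INTO #Filter
FROM #Records
WHERE Region = 'East';   -- stands in for the repeated filter

-- 2. Index the temp table on the join key.
CREATE UNIQUE CLUSTERED INDEX IX_Filter_RecordID ON #Filter (RecordID);

-- 3. Keep the remaining queries as CTEs that join to #Filter.
WITH Query1 AS (
    SELECT o.RecordID, SUM(o.Amount) AS TotalAmount
    FROM #Orders AS o
        INNER JOIN #Filter AS f
            ON f.RecordID = o.RecordID
    GROUP BY o.RecordID
), Query2 AS (
    SELECT n.RecordID, COUNT(*) AS NoteCount
    FROM #Notes AS n
        INNER JOIN #Filter AS f
            ON f.RecordID = n.RecordID
    GROUP BY n.RecordID
)
SELECT f.RecordID, f.CustomerName, q1.TotalAmount, q2.NoteCount
FROM #Filter AS f
    LEFT JOIN Query1 AS q1
        ON q1.RecordID = f.RecordID
    LEFT JOIN Query2 AS q2
        ON q2.RecordID = f.RecordID;

DROP TABLE #Filter, #Notes, #Orders, #Records;
```

The filter logic still lives in exactly one place (the SELECT ... INTO #Filter statement), so changing it still cascades to every query that joins to #Filter, and new queries only need to join to the same temp table.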

If there are other CTEs in your list that are used more than once, you may want to consider making them temporary tables as well. However, depending on the complexity of the query, two or three uses may not cost enough performance to outweigh the benefit of maintainability.