Sql-server – Improve OR inside INNER JOIN

performancesql serversql-server-2008-r2

I have this query. this is the second query. the first one was using the left/or outside the subquery, and the query plan was horrible.( Edited the question with the full sintax ):

https://www.brentozar.com/pastetheplan/?id=HJEioh56N

I have a nested loop (inner join) with 97% in the query plan.

I'm sure the problem is the OR inside the second join because I changed something here and there and I could get rid of them but I would like to be sure what would be the best way to deal with data like these. also all those tables has millions of rows.

Table Definition:

CREATE TABLE [DBO].[TABLE1](
    [F1] [int] NOT NULL,
    [F1] [varchar](16) NOT NULL,
    [F3] [money] NOT NULL,
    [F4] [money] NOT NULL,

I know created this index:

CREATE NONCLUSTERED INDEX IX_TB1 ON DBO.TABLE1
(
F1,
F2)

That index seek is now 14%, but got a HASH MATCH with 82% cost.

Best Answer

A common query rewrite that helps with OR predicates looks like this:

SELECT tTitulo.CdContaCartao,
       tTitulo.CdStatus,
       MAX(DiariaMaxima)
INTO   #DiariaMaxima
FROM   Sistema.Titulo AS tTitulo
CROSS APPLY
       (   
           SELECT MAX(DiariaMaxima)
               FROM (
               SELECT     tTIPM.DtDiaria
               FROM       Sistema.TaxaIndice_PagamentoMensal_ContaCartao AS tTIPM_Cartao
               INNER JOIN Sistema.TaxaIndice_PagamentoMensal AS tTIPM
                   ON tTIPM.CdTaxaIndice_PagamentoMensal = tTIPM_Cartao.CdTaxaIndice_PagamentoMensal
               WHERE      tTIPM_Cartao.CdContaCartao = tTitulo.CdContaCartao

               UNION ALL

               SELECT     tTIPM.DtDiaria
                          FROM       Sistema.TaxaIndice_PagamentoMensal_ContaCartao AS tTIPM_Cartao
                          INNER JOIN Sistema.TaxaIndice_PagamentoMensal AS tTIPM
                              ON tTIPM.CdTaxaIndice_PagamentoMensal = tTIPM_Cartao.CdTaxaIndice_PagamentoMensal
                          WHERE      tTIPM_Cartao.CdContaCartao = tTitulo.CdContaCartao_Visa
               ) AS x (DiariaMaxima)        
           ) AS DiariaMaxima (DiariaMaxima)

Apply is not always the best method for this, though I've often had success with it over using a regular derived JOIN.

Some background on similar problems here:

Related Solutions

Sql-server – Stream Aggregate for Sort

To answer your first question:

Why is there a Stream Aggregate in the query when I dont have a group by. I am guess it has something to do with the join being merge and its doing a sort?

SQL Server is effectively rewriting your query. This query:

select Object1.Column1
FROM Object2 Object3 WITH (ROWLOCK)
INNER JOIN Object4 Object1 WITH (ROWLOCK) ON Object3.Column2 = Object1.Column1
WHERE Object3.Column2 in (SELECT Column2 FROM Object5)
or Object3.Column3 in (SELECT Column2 FROM Object5);

Can also be written like this:

select Object1.Column1
FROM
(
select Object3.Column2
FROM Object2 Object3 WITH (ROWLOCK)
WHERE Object3.Column2 in (SELECT Column2 FROM Object5)

UNION

select Object3.Column2
FROM Object2 Object3 WITH (ROWLOCK)
WHERE Object3.Column3 in (SELECT Column2 FROM Object5)
) Object3 
INNER JOIN Object4 Object1 WITH (ROWLOCK) ON Object3.Column2 = Object1.Column1;

That's why you have two different access methods on Object2 in the query plan. The Merge Join (Concatenation) operation isn't actually a merge join. It's just implementing a UNION ALL and combining the results. The stream aggregate that you mentioned groups by Object3.Column2. This both removes duplicates from the Merge Join (Concatenation) and sorts the data so that it can be used in the following MERGE JOIN to Object4.

To answer your second question:

Secondly and more importantly why does an estimate of 13 going into the stream aggregate comes out as an estimate of 24,595,900? This is causing the secondary problem of Object4 getting Clustered index scanned instead of a nested loop. I had to split the query into two queries instead of using a OR and the join turns into a nested loop and seeks into Object4.

It looks like a bug. After some searching I found an article by Paul White explaining the issue. The bug is reported to Connect if you want to vote for it or add yourself as affected.

In short, the cardinality estimate is 24595900 because 27328800 (table cardinality of Object2) * 0.9 = 24595900 after rounding. You're on SQL Server 2008 so the legacy CE calculation is used:

A similar issue occurs with the pre-2014 cardinality estimator, but the final estimate is instead fixed at 90% of the estimated semi join input (for entertaining reasons related to a inversed fixed 10% predicate estimate that is too much of a diversion to get into).

I recommend reading through the entire article. The issue is a little difficult to summarize, but I will attempt to do so by quoting part of the text near the end:

The cardinality estimator uses the Fixed Join calculator with 100% selectivity. As a consequence, the estimated output cardinality of the semi join is the same as its input, meaning all 113443 rows from the history table are expected to qualify.

The exact nature of the bug is that the semi join selectivity computation misses any predicates positioned beyond a union all in the input tree. In the illustration below, the lack of predicates on the semi join itself is taken to mean every row will qualify; it ignores the effect of predicates below the concatenation (union all).

A similar issue occurs with the pre-2014 cardinality estimator, but the final estimate is instead fixed at 90% of the estimated semi join input (for entertaining reasons related to a inversed fixed 10% predicate estimate that is too much of a diversion to get into).

Splitting up the query as you did should prevent the issue. You also might have some luck by rewriting the query using UNION (see Example 4 in the article).

Sql-server – Tuning a query with temp table join

Why does the inner join to a one record temp table make the query take so much longer time?

Without the join, the optimizer is smart enough to work out that it can find the minimum value by reading one row from the end of the index.

Unfortunately, it is not currently equipped to apply the same sort of logic when the query is more complicated (with a join or grouping clause, for example). To work around this limitation, you can rewrite the query to compute local minimums per row in the temporary table, then find the global minimum.

Perhaps the easiest way to express this in T-SQL is to use the APPLY operator:

SELECT
    -- Global minimum
    @tenor_from = MIN(MinMaturityPerCurveID.maturity_date)
FROM #source_price_curve_list AS SPCL
CROSS APPLY
(
    -- Minimum maturity_date per price_curve_id
    SELECT TOP (1) 
        SPC.maturity_date
    FROM  dbo.source_price_curve AS SPC
    WHERE
        SPC.source_curve_def_id = SPCL.price_curve_id
         and as_of_date >= @as_of_date_from 
    ORDER BY
        SPC.maturity_date ASC
) AS MinMaturityPerCurveID;

Good performance relies on there being many rows per price_curve_id. You may need an index of the form:

CREATE NONCLUSTERED INDEX
    [IX dbo.source_price_curve source_curve_def_id, maturity_date, as_of_date]
ON dbo.source_price_curve 
(
    source_curve_def_id,
    maturity_date,
    as_of_date
);

Best Answer

Related Solutions

Sql-server – Stream Aggregate for Sort

Sql-server – Tuning a query with temp table join

Related Question