Sql-server – How to structure T-SQL query with many left joined “attributes”

sql servert-sql

I am writing a T-SQL query for a complex report, with a unique key, and have run into a bit of a performance problem. I am not sure where it's coming from, but I thought I'd ask how it should be done.

One issue that may become apparent is that I am doing lots of left joins on the same attribute/lookup tables.

Basically, I the query is structured into a "trunk" and "leaves", like so:

select *
  -- many many fields, probably about 150
  -- I explicitly list them in the real query
  from 
  -- the "trunk" of the query, all of which are necessary, 
  -- and which resolve to a single row
       foo inner join bar on foo.id_foo = bar.id_foo
           inner join baz on baz.id_bar = bar.id_bar
           inner join bon on bon.id_bar = bar.id_bar

   -- the "leaves", where i do a bunch of lookups:
           left join attribute height on height.CD_NM = 'height'
                                     and height.id_foo = foo.id_foo
           left join attribute weight on weight.CD_NM = 'weight'
                                     and weight.id_foo = foo.id_foo
       --    etc, for maybe 50 different attributes, etc
  where foo.id_foo = @ReportId

Currently, I am applying the bisection method to figure out if there is some number of attributes that the database starts choking on. I commented out half the left joins, and un-commented until it started slowing down.

Execution time appears to increase non-linearly, and quickly, in the number of lookups. At one point, the execution time jumped from <1s to 14s, and now each additional look up appears to add a fixed amount of time.

Strangely, the "live query statistics" say that the key lookups on my core tables (which have indexes, which are getting used) are taking up the bulk of the time. I would have presumed this would continue to take less than 1s, regardless of the number of additional lookups.

I'm not even done doing lookups for this query and I'm already closing in on 20 seconds.

I am just a lowly analyst, not a database engineer, so I'm not in a place to do much to tune the database itself, but anything I can include in an SSRS dataset is fair game. (Theoretically, I can make a tuning change happen, but I'd need something concrete to take to the database people)

Is splitting this query up into multiple queries really my best option?


Best Answer

In the original Plan you have few Merge Join and then Nested Loops.

When the things slow down, you can see that plan is changed to: Hash Match (and parallel).

You can try to enforce using old plan by replacing inner join with inner MERGE join and inner LOOP join.

The documentation may help you dive into differences between them: Hints (Transact-SQL) - Join

Take note of the warning:

If a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query, based on the position of the ON keywords.

If you do not wish to force the join order for all joined tables in the query, consider using OPTION (LOOP JOIN, MERGE JOIN) instead. This will allow the optimizer freedom to reorder joins and choose between nested loops and merge join on the basis of estimated cost.