Sql-server – Forcing Join Order

join;performancesql serversql-server-2005

I have a table with just over a million rows. These rows can have a parent record in the same table which is found by joining to itself on 6 different columns (i.e. there is no single ParentID column). Every child has exactly 1 parent based on these joins, and every record is either a parent or a child record (i.e. there are no grandparent records).

SELECT  *
FROM    TheTable AS ChildRecords
JOIN    TheTable AS ParentRecords
ON      ChildRecords.Column1 = ParentRecords.Column1
AND     ChildRecords.Column2 = ParentRecords.Column2
AND     ChildRecords.Column3 = ParentRecords.Column3
AND     ChildRecords.Column4 = ParentRecords.Column4
AND     ChildRecords.Column5 = ParentRecords.Column5
AND     ChildRecords.Column10 = ParentRecords.Column6

Note column 10 joins to column 6, but this column does not find a unique parent by itself – there could be multiple "parents" with column10 = column6.

This normally works fine, however if we have it as part of a bigger query, SQL Server often tries to resolve this join first before it resolves the others. This is particularly true when it is in a CTE or joining with a CTE. It's often the first join that happens in the query plan. This often results in tens of thousands of joins before it is then filtered down to the 100 or so records that I'm interested in. When this happens the query takes minutes to run.

I've noticed that I can affect the query plan by making it a left join. This makes sense because if it's a left join then SQL Server doesn't know that every child has 1 parent so it always has to find the child records first.

SELECT  *
FROM    TheTable AS ChildRecords
LEFT JOIN TheTable AS ParentRecords
ON      ChildRecords.Column1 = ParentRecords.Column1
AND     ChildRecords.Column2 = ParentRecords.Column2
AND     ChildRecords.Column3 = ParentRecords.Column3
AND     ChildRecords.Column4 = ParentRecords.Column4
AND     ChildRecords.Column5 = ParentRecords.Column5
AND     ChildRecords.Column10 = ParentRecords.Column6

When it runs the query this way it reduces the time from minutes to < 2 seconds.

Because each child has a parent, the left join gives the same result as the inner join, however it feels wrong – it should be an inner join.

I have checked that the indexes are set up properly on this table and I've tried adding and editing the ones that we have but it doesn't change the query plan. The time seems to be because it is querying to get every child/parent combo before it limits it down to just the ones I want.

Can I force SQL Server to run the joins in the order I specified, rather than letting it try to re-order the query?

Best Answer

It seems that I can specify OPTION (FORCE ORDER) at the end of the query and that will make the joins happen in the right order. There are lots of people warning not to do this because it stops SQL from optimising my query so I deffinately will be using it sparingly (and watching those queries closely).