Sql-server – Improving performance when ordering by a column of a joined table

query-performancesql serversql server 2014

I have a parent table that contains a foreign key to a lookup table (simplified example):

CREATE TABLE [dbo].[Parent] (
    [Id] [uniqueidentifier] NOT NULL,
    [LookupId] [uniqueidentifier] NULL
)

CREATE TABLE [dbo].[Lookup] (
    [Id] [uniqueidentifier] NOT NULL,
    [Name] [nvarchar](64) NOT NULL
)

In this case, the Parent table has over 10 million rows and the Lookup table has around 5,000. The real Parent implementation has several such foreign key references to other tables and each of those columns may contain NULLs.

Both example tables have unique clustered indexes for their Id columns, Parent has a non-clustered index for LookupId and Lookup has a non-clustered index for Name.

I'm running a paged query where I want to include the lookup value in the results:-

SELECT
    P.Id,
    L.Name
FROM Parent P
LEFT JOIN Lookup L ON P.LookupId = L.Id 
ORDER BY P.Id
OFFSET 500000 ROWS FETCH NEXT 50 ROWS ONLY

This runs quickly, as does ordering by P.LookupId.

If, however, I try to order by Name (or even L.Id), the query runs considerably slower:

SELECT
    P.Id,
    L.Name
FROM Parent P
LEFT JOIN Lookup L ON P.LookupId = L.Id 
ORDER BY L.Name
OFFSET 500000 ROWS FETCH NEXT 50 ROWS ONLY

The query plan for the second query is here: https://www.brentozar.com/pastetheplan/?id=Sk3SIOvMD

Other seemingly related questions seem to involve ordering by columns in the first table which could be resolved using an appropriate index.

I tried creating an indexed view for this query, however, SQL Server won't allow me to index the view because it contains a LEFT JOIN which I require because LookupId may be NULL and if I use an INNER JOIN those records would be excluded.

Is there a way to optimise this situation?

EDIT

Rob Farley's answer (thanks!) is great and works perfectly for the question as I originally asked it, in which I implied I was joining a single table.

As it is, I have multiple such tables and I was unable to reconcile all using INNER JOINs in order to use that solution.

For the moment I have worked around this by adding a "NULL" row to the lookup tables so I can use an INNER JOIN without losing any rows on the left.

In my case I use uniqueidentifier identities, so I create an indexed view like this:

CREATE VIEW [dbo].[ParentView]
WITH SCHEMABINDING
AS
SELECT
    P.Id,
    L.Name
FROM [dbo].Parent P
INNER JOIN [dbo].Lookup L ON ISNULL(P.LookupId, '00000000-0000-0000-0000-000000000000') = L.Id

I then add a row to the Lookup table with a value of 00000000-0000-0000-0000-000000000000 for Id so there is always a match on the right of the join.

I can then create indexes on that view as needed.

Also, as I'm not using Enterprise, I found I needed to use the NOEXPAND hint to ensure those indexes are used:

SELECT *
FROM [ParentView]
WITH (NOEXPAND)
ORDER BY Name
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY

Best Answer

Let's start by thinking about that first query.

You're joining between Parent and Lookup, but it's an outer join, so Parents are never removed from the results. I'm going to guess that Lookup.Id is unique, so therefore, no Parent is going to have multiple Lookups that it joins to.

Therefore, the 50000th row in Parent (ordered by Parent.Id) is going to be the 50000th row in the results if we don't have the OFFSET clause.

Therefore, the query can move past the 50000 rows for the offset, look at the next 50 rows, and use this to join to the Lookup table. It doesn't matter if the join doesn't find anything, it's a left outer join and it'll just return NULL.

If you order by a different column in Parent, and that's indexed, it can move past those 50000 rows just as quickly.

Now let's consider the second query.

You want the 50000 rows that you ignore (by the offset) to be the first 50000 based on the results of the join. Those 50000 rows may include some that are NULL, where the Parent.LookupId value doesn't exist in the Lookup table. Even if you have a nice index on Parent.LookupId, you'll probably need to involve most of the rows, because unless you find 50050 rows that don't join successfully, you'll need to keep going. Even 50050 is way more than the 50 rows you join on in the first query.

Now, if you have a foreign key in place then things might be a little different. Then, the SQL engine ought to know that if it has a value at all, then Lookup.Name isn't going to be null. So it could theoretically start by finding the ones that are null, to see if there are 50000 of them. But that's still a bit of a stretch, and the SQL engine is unlikely to produce a plan like this.

But you could.

So to solve the performance of the second query, I would do a few things.

Start by considering the ones that aren't null. That means the rows which are part of an inner join. You can make an indexed view on this, so that you can have an index which is in the order you want.

But you will also need the ones where Parent.LookupID is null - except that for these ones, you don't need the join at all.

If you do a UNION ALL across these two sets (and maybe include a constant column in both, to make sure that the NULL rows appear before the NOT NULL ones in your order by), you should be able to see some improvement.

Something like this:

SELECT ID, Name
FROM 
(
  SELECT i.ID, i.Name, 2 as SetNumber
  FROM dbo.MyIndexedView i
  UNION ALL
  SELECT p.ID, NULL, 1 as SetNumber
  FROM dbo.Parent p
  WHERE p.LookupID IS NULL
) u
ORDER BY u.SetNumber, u.Name
OFFSET 50000 ROWS FETCH NEXT 50 ROWS ONLY;

Hopefully your plan will include a Merge Join (Concatenation) operator, so that it only pulls the rows that it needs from an Index Scan on the indexed view (in Name order) and an Index Seek on Parent (for the LookupIDs).

Related Solutions

Mysql – Performance impact of including result column in indexed columns

Based on the information given, I will take a shot at an answer...

You stated

If I add identifiedtex as the last column indexed in the index, will this speed up a lookup of the texture since the result is part of the index?

Let me say that you have the following query:

SELECT identifiedtex FROM texels
WHERE texelx = 210938
AND texely = 378432
AND texelhue = 23;

What would be the effect of having an index (call it Index3)

texelx, texely, texelhue

versus this index (call it Index4)

texelx, texely, texelhue, identifiedtex

When you retrieve identifiedtex using only Index3, it will require an additional table lookup to ascertain the identifiedtex from the table.

When you retrieve identifiedtex using only Index4, it will an index-only lookup because all mentioned columns are in the index. It is known as a covering index.

I mentioned covering indexes in my past posts:

Mar 12, 2012 : How to index this table (a_level, b_level, item_id)
Oct 17, 2012 : Combining columns in index
Nov 13, 2012 : Must an index cover all selected columns for it to be used for ORDER BY? (See My Conclusion)

So to answer your original question, yes adding identifiedtex will index make for a faster query.

Sql-server – Forcing Join Order

It seems that I can specify OPTION (FORCE ORDER) at the end of the query and that will make the joins happen in the right order. There are lots of people warning not to do this because it stops SQL from optimising my query so I deffinately will be using it sparingly (and watching those queries closely).

Best Answer

Related Solutions

Mysql – Performance impact of including result column in indexed columns

Sql-server – Forcing Join Order

Related Question