SQL Server – Indexing Referenced Table Columns

index-tuningsql serverstring-searching

I would like to do a search across columns of joined tables. What would the best index configuration be for the query below?

Table A
------------
Id
ValueA
-- Table A has many more columns

Table B
------------
Id
AId
ValueB
-- Table B has many more columns

At the moment, I have clustered indexes on the Primary Keys and non-clustered indexes (which don't include any columns) for columns AId, ValueA, and ValueB.

SELECT *
FROM TableA a
INNER JOIN TableB b ON a.Id = b.AId
WHERE 
    a.ValueA LIKE 'SearchTerm%' OR
    b.ValueB LIKE 'SearchTerm%'

Update
Table A is likely to run into the low millions while Table B will only have low tens of thousands of entries.

In the real world Table A will be a list of transactions, where the value is some sort of a transaction reference. Table B will be a list of users where the value could be anything such as a name or any other descriptive information.

At the moment we use an ORM which will essentially do a SELECT *. Naturally this could be changed, but there would be a fairly large impact on the application side.

Best Answer

Your query is a tricky one for the optimizer, for two reasons.

  1. The optimizer doesn't currently have logic to transform a disjunction (OR) across tables into a union (related Q & A).
  2. The optimizer doesn't reason about deferring lookups (dealing with keys only until the last moment). This relates to the SELECT * component of the given query.

Together, these limitations mean there is very little indexing can do to assist the query as written (ORM generated), leaving aside the possibility of a (nonclustered) column store index.

If performance is inadequate, you may have to bite the bullet and write a custom query. In that case, modifying the existing index on ValueB to include AId would be useful, coupled with a rewrite like the following:

SELECT TA.*, TB.* 
FROM 
(
    -- Disjunction as union
    SELECT A.Id, b.Id
    FROM TableA a 
    INNER JOIN TableB b 
        ON b.AId = a.Id
    WHERE 
        a.ValueA LIKE 'SearchTerm%'

    UNION

    SELECT A.Id, b.Id
    FROM TableB b 
    INNER JOIN TableA a 
        ON a.Id = b.AId
    WHERE 
        b.ValueB LIKE 'SearchTerm%'
) AS M (A_Id, B_Id)
-- Lookups
JOIN dbo.TableA AS TA ON TA.Id = M.A_Id
JOIN dbo.TableB AS TB ON TB.Id = M.B_Id;

The target plan shape is:

execution plan

How much better this is in practice depends on several factors, including how selective the LIKE predicates are (and that they are always prefix searches).


Guessed schema:

DROP TABLE IF EXISTS dbo.TableB, dbo.TableA;

CREATE TABLE dbo.TableA 
(
    Id integer NOT NULL PRIMARY KEY, 
    ValueA varchar(256) NULL INDEX iValueA,
    Padding char(4000) NOT NULL DEFAULT '', -- other columns

    /*INDEX i (ValueA) INCLUDE (Id)*/
);

CREATE TABLE dbo.TableB 
(
    Id integer NOT NULL PRIMARY KEY,
    AId integer NOT NULL REFERENCES dbo.TableA (Id) INDEX iAId,
    ValueB varchar(256) NULL /*INDEX iValueB*/,
    Padding char(2000) NOT NULL DEFAULT '', -- other columns

    INDEX i (ValueB) INCLUDE (AId)
);

UPDATE STATISTICS dbo.TableA WITH ROWCOUNT = 3000000, PAGECOUNT = 3000000;
UPDATE STATISTICS dbo.TableB WITH ROWCOUNT = 30000, PAGECOUNT = 30000;