Sql-server – SQL Query optimization – Long running query

execution-planoptimizationperformancequery-performancesql server

I have a database with the following tables :

CREATE TABLE [Document](
    [IdDocument] [int] NOT NULL,
    [DocumentType] [int] NOT NULL
    CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (IdDocument)
)

CREATE TABLE [ExternalKey](
    [IdExternalKey] [int] NOT NULL,
    [RefDocument] [int] NOT NULL,
    [EntityType] [int] NOT NULL,
    [Value] [int] NOT NULL, 
    CONSTRAINT PK_ExternalKey PRIMARY KEY CLUSTERED (IdExternalKey)
    CONSTRAINT FK_RefDocument FOREIGN KEY (RefDocument) REFERENCES Document(IdDocument),
    CONSTRAINT UC_ExternalKey UNIQUE (RefDocument, EntityType, Value)
)
  • Documents are mapped to physical files on a drive. Each document has a type (eg : 2 = IDENTITY_CARD)
  • Those documents are linked with external entities with a one to many relation.
    Each external key has an entity type (eg : 50 = PERSON) and a ID (eg : 213235)

Database is quite big : Document is 10M records. ExternalKey 40M.

Example of data inside ExternalKey:

IdExternalKey  RefDocument EntityType  Value
1              1           50          3421
2              1           50          9524
3              1           60          7893
4              2           50          1752
5              2           50          8979 

I want to filter documents based on their type and the entities they are linked with (simplified query) :

WHERE (DocumentType IN (10, 20, ...) 
       AND ExternalKeys ANY (EntityType = 50 AND Value IN (4,5,6,7,8,9,...)))    
   OR (DocumentType IN (80, 90, ...) 
       AND ExternalKeys ANY (EntityType = 60 AND Value IN (110,120,130,...)))

The best i could come up is this:

SELECT TOP 50 IdDocument
FROM Document d     
WHERE
    (d.DocumentType IN (SELECT Id FROM @listA1) AND d.IdDocument IN (SELECT ex.RefDocument FROM ExternalKey ex WHERE ex.EntityType = 60 AND ex.Value IN (SELECT Id FROM @listB1))) OR
    (d.DocumentType IN (SELECT Id FROM @listA2) AND d.IdDocument IN (SELECT ex.RefDocument FROM ExternalKey ex WHERE ex.EntityType = 61 AND ex.Value IN (SELECT Id FROM @listB2))) OR
    ...
    (d.DocumentType IN (SELECT Id FROM @listA3) AND d.IdDocument IN (SELECT ex.RefDocument FROM ExternalKey ex WHERE ex.EntityType = 59 AND ex.Value IN (SELECT Id FROM @listB3)))
ORDER BY IdDocument

Table value parameters contains a lot of IDs (about 100K in total)
Table definition is:

CREATE TYPE int_list_type AS TABLE(Id int NOT NULL PRIMARY KEY)

The query is slow and performance vary a lot depending the sort (from a dozen of seconds to several minutes). For example, sorting documents on IdDocument is way faster than sorting them on DocumentType.
What I have tried so far:

  • Creating indexes on RefDocument, EntityType, Value (mulitple column index). I have tried different column orders.
  • Replacing the table value parameters (eg : @listA1) by hardcoded values in the query : parsing time explode (as expected).
  • Replacing IN with EXISTS : performance is similar. Replace IN with JOIN : a lot slower

Is there some more efficient ways to perform the filter, any tips ?

Here is the query plan for 2 OR clauses (each OR clause is in RED)
enter image description here

Best Answer

EDITED after question got more details.
I get a Clustered Index Scan and an Index Scan. But I don't have a lot of rows in my table.

CREATE TABLE [Document](
[IdDocument] [int] NOT NULL,
[DocumentType] [int] NOT NULL,
CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (IdDocument)
)

CREATE TABLE [ExternalKey](
[IdExternalKey] [int] NOT NULL,
[RefDocument] [int] NOT NULL,
[EntityType] [int] NOT NULL,
[Value] [int] NOT NULL, 
CONSTRAINT PK_ExternalKey PRIMARY KEY CLUSTERED (IdExternalKey),
CONSTRAINT FK_RefDocument FOREIGN KEY (RefDocument) REFERENCES Document(IdDocument),
CONSTRAINT UC_ExternalKey UNIQUE (RefDocument, EntityType, Value)
)

-- Insert a few values
DECLARE @IdDocument INT, @DocumentType INT, @IdExternalKey INT;
Set @IdDocument=0;
SET @DocumentType = 10;
select @IdExternalKey=MAX(IdExternalKey)+1 from ExternalKey ;


set nocount on
WHILE (@IdDocument < 10000)
BEGIN
    INSERT INTO Document(IdDocument, DocumentType) VALUES (@IdDocument, @DocumentType);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey, @IdDocument, 50, 3421);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey+1, @IdDocument, 50, 9524);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey+2, @IdDocument, 60, 7893);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey+3, @IdDocument, 50, 1752);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey+4, @IdDocument, 50, 8979);
    INSERT INTO [ExternalKey](IdExternalKey,RefDocument,EntityType,Value) VALUES (@IdExternalKey+5, @IdDocument, 60, 7822);

    SET @DocumentType=CASE WHEN @DocumentType > 100 THEN 10 ELSE @DocumentType + 10 END;
    SET @IdDocument = @IdDocument + 1;
    SET @IdExternalKey = @IdExternalKey+ 6;
END

SELECT TOP 50 IdDocument
FROM Document d
JOIN ExternalKey e on e.RefDocument=d.IdDocument 
WHERE
(d.DocumentType IN (10,20,30) AND e.EntityType = 60 AND e.Value IN (4,5,6,7,8)) OR
(d.DocumentType IN (50) AND e.EntityType = 61 AND e.Value IN (110,120,130) OR
(d.DocumentType IN (80,90) AND e.EntityType = 59 AND e.Value IN (1,2,3)))
ORDER BY IdDocument

I would add that if the ORDER BY is not necessary and can be done in the program instead, it can help accelerate the query.

enter image description here

What you really want to do to improve your query would be to fix the "Index Scan" of the ExternalKey table or the scan of the Document table. For example, since the ExternalKey is unique by RefDocument, entityType, Value, maybe consider using it as clustered index?


The thing is right now, with the keys and indexes, SQL Server has no choice but to scan the whole table to get the rows you need. So if you have a million row, it has to scan every row even if you only get 10 as a result.