In our application we have a slow performing reporting query that needs to be optimized.
The performance loss is mainly because there are up to 11 left joins on the same table and we can't put in foreign keys. There already are indexes on the typeid as they are foreign keys to a Type table. The situation can be reduced and abstracted to the following:
USE tempdb;
IF OBJECT_ID('person') IS NOT NULL
DROP TABLE person;
IF OBJECT_ID('AdditionalColumnValue') IS NOT NULL
DROP TABLE AdditionalColumnValue;
IF OBJECT_ID('AdditionalColumn') IS NOT NULL
DROP TABLE AdditionalColumn;
CREATE TABLE person (
Id INT PRIMARY KEY
);
CREATE TABLE AdditionalColumn (
Id INT PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE AdditionalColumnValue (
Id INT PRIMARY KEY,
EntityId INT,
AdditionalColumnId INT FOREIGN KEY REFERENCES AdditionalColumn (Id),
[value] NVARCHAR(4000)
);
INSERT INTO person VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
INSERT INTO AdditionalColumn
VALUES
(1, 'EatsStoneFruit'),
(2, 'EatsBerry'),
(3, 'EatsCitrus'),
(4, 'FavoriteAnimal');
INSERT INTO AdditionalColumnValue
VALUES
(1, 1, 'mango'),
(1, 1, 'lychee'),
(1, 2, 'blueBerry'),
(1, 2, 'raspberry'),
(1, 3, 'lemon'),
(1, 3, 'orange'),
(1, 3, 'blood orange'),
(1, 3, 'grapefruit'),
(1, 4, 'earthworm'),
(1, 4, 'raven'),
(2, 1, 'mango'),
(2, 2, 'raspberry'),
(2, 3, 'lemon'),
(2, 3, 'grapefruit'),
(2, 4, 'tardigrade');
-- Query
SELECT person.Id, fruits.value fruits, animals.value animals
FROM person
LEFT JOIN AdditionalColumnValue fruits on fruits.EntityId = person.Id and fruits.AdditionalColumnId in (1,2,3)
LEFT JOIN AdditionalColumnValue animals on animals.EntityId = person.Id and animals.AdditionalColumnId in (4)
-- up to 9 more of these
I have the idea I should be able to optimize by somehow partitioning on the typeId's and then only do the costly join once.
But I can't figure out how.
As I expect people to wonder why I'm not solving this problem on a DML level:
This table allows our consultants to extend any table with non-standard fields on request.
As they can be linked to any table the link between the tables is on convention, not on foreign key.
It probably was a good solution when it was used sparingly, but things evolved. The business value of this system's flexibility is considerable and it will be a hard sell to change anything about it. If I can't get this to work reasonably fast setting up a batch job to to denormalize the data would be our plan B.
UPDATE
in my effort to boil the problem down to it's core I made some mistakes. Updated the query and added buildup sql.
Best Answer
The
LEFT
joins are actuallyINNER
joins, as all the tables in the right sides have columns that appear inWHERE
clause. So you could change them - just to be precise, there is no efficiency gain.For the performance problems, my suggestion would be to add various composite indexes. EAV designs need queries with multiple joins and without appropriate indexes, they are often inefficient.
For the specific query, you could try indexes either on
(Entity, TypeID, EntityID) INCLUDE (Value)
or on(Entity, EntityID, TypeID) INCLUDE (Value)
. Then check the execution plans to see which of the two is used.If the conditions
TypeId IN (1,2,3)
, etc are fairly common and stable, you could use even more targeted indexes, like this: