SQL Server – Better Way for Multiple Joins on Same Table with Different ID Ranges

sql server

In our application we have a slow performing reporting query that needs to be optimized.
The performance loss is mainly because there are up to 11 left joins on the same table and we can't put in foreign keys. There already are indexes on the typeid as they are foreign keys to a Type table. The situation can be reduced and abstracted to the following:

USE tempdb;

IF OBJECT_ID('person') IS NOT NULL
  DROP TABLE person;
IF OBJECT_ID('AdditionalColumnValue') IS NOT NULL
  DROP TABLE AdditionalColumnValue;
IF OBJECT_ID('AdditionalColumn') IS NOT NULL
  DROP TABLE AdditionalColumn;

CREATE TABLE person (
  Id INT PRIMARY KEY
);
CREATE TABLE AdditionalColumn (
  Id   INT PRIMARY KEY,
  Name VARCHAR(100)
);
CREATE TABLE AdditionalColumnValue (
  Id                 INT PRIMARY KEY,
  EntityId           INT,
  AdditionalColumnId INT FOREIGN KEY REFERENCES AdditionalColumn (Id),
  [value]            NVARCHAR(4000)
);

INSERT INTO person VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
INSERT INTO AdditionalColumn
VALUES
  (1, 'EatsStoneFruit'),
  (2, 'EatsBerry'),
  (3, 'EatsCitrus'),
  (4, 'FavoriteAnimal');
INSERT INTO AdditionalColumnValue
VALUES
  (1, 1, 'mango'),
  (1, 1, 'lychee'),
  (1, 2, 'blueBerry'),
  (1, 2, 'raspberry'),
  (1, 3, 'lemon'),
  (1, 3, 'orange'),
  (1, 3, 'blood orange'),
  (1, 3, 'grapefruit'),
  (1, 4, 'earthworm'),
  (1, 4, 'raven'),

  (2, 1, 'mango'),
  (2, 2, 'raspberry'),
  (2, 3, 'lemon'),
  (2, 3, 'grapefruit'),
  (2, 4, 'tardigrade');

-- Query
SELECT person.Id, fruits.value fruits, animals.value animals
FROM person
LEFT JOIN AdditionalColumnValue fruits on fruits.EntityId = person.Id and fruits.AdditionalColumnId in (1,2,3)
LEFT JOIN AdditionalColumnValue animals on animals.EntityId = person.Id and animals.AdditionalColumnId in (4)
-- up to 9 more of these

I have the idea I should be able to optimize by somehow partitioning on the typeId's and then only do the costly join once.
But I can't figure out how.

As I expect people to wonder why I'm not solving this problem on a DML level:
This table allows our consultants to extend any table with non-standard fields on request.
As they can be linked to any table the link between the tables is on convention, not on foreign key.
It probably was a good solution when it was used sparingly, but things evolved. The business value of this system's flexibility is considerable and it will be a hard sell to change anything about it. If I can't get this to work reasonably fast setting up a batch job to to denormalize the data would be our plan B.

UPDATE
in my effort to boil the problem down to it's core I made some mistakes. Updated the query and added buildup sql.

Best Answer

The LEFT joins are actually INNER joins, as all the tables in the right sides have columns that appear in WHERE clause. So you could change them - just to be precise, there is no efficiency gain.

For the performance problems, my suggestion would be to add various composite indexes. EAV designs need queries with multiple joins and without appropriate indexes, they are often inefficient.

For the specific query, you could try indexes either on (Entity, TypeID, EntityID) INCLUDE (Value) or on (Entity, EntityID, TypeID) INCLUDE (Value). Then check the execution plans to see which of the two is used.

If the conditions TypeId IN (1,2,3), etc are fairly common and stable, you could use even more targeted indexes, like this:

 (Entity, EntityID) INCLUDE (Value) WHERE TypeID IN (1,2,3)

Related Solutions

Sql-server – How tondex a JOIN result of two or more tables in order to improve the performance in SQL server

A foreign key does not have an index by default. You have to create one.

In this case, I'd suggest either of these. It depends on relative selectivity of the 2 columns

CREATE INDEX IX_table2fk ON
              table1 (t2_col1, t1_col3)
              INCLUDE (any columns needed for the query)

CREATE INDEX IX_table2fk ON
              table1 (t1_col3, t2_col1)
              INCLUDE (any columns needed for the query)

The INCLUDE will make the index covering to avoid key/bookmark lookups

Sql-server – Update many-to-many relationship table

The first thing you'll want to do is make sure there is an alternate key (usually via a unique constraint) defined in the taskmembers table, that consists of the task id and member id. This key will drive all data manipulation, and the surrogate key won't be involved at all.

Once you have that key in place, I suggest using the MERGE statement to perform data manipulation. This will avoid having to delete a subset of data and re-insert it. Instead, by matching the source and target data based on the alternate key, incremental insert/updates can take place. The MERGE statement also allows you to delete from the target where records don't exist in the source (i.e., a member of the task was removed).

The DELETE/INSERT pattern is really wasteful of resources because of the extra amount of logging involved, and the unnecessary locks that are required while the data manipulation occurs. If this is a "hot" table, this kind of pattern can easily lead to deadlocks (if not using snapshot isolation), and as you're finding out, wasted key space usage. With the MERGE pattern, new keys would only be required when members are actually added to tasks.

Best Answer

Related Solutions

Sql-server – How tondex a JOIN result of two or more tables in order to improve the performance in SQL server

Sql-server – Update many-to-many relationship table

Related Question