Sql-server – Tuning performance of finding entities with fully matching attribute values

performancequery-performancesql serversql-server-2016

How to tackle performance tuning of finding matches of entities with the same attributes and values, in an OLTP entity-attribute-value table?

The query needs to run hundreds of times per minute from several application threads in parallel, and the performance needs to be quick and without bringing down the whole instance (ideally under 50ms duration, with accordingly low CPU and IO) — but calculating all the matches is aggregating across at least hundreds of thousands of rows.

What approaches exist to tackle this type of problem?

Repro to test:

Simplified repro

Best Answer

Right now the query that you have looks pretty good. It seems like performance scales with the number of attributes on the entity that you're checking against. For example, if I test with the following parameters:

DECLARE @search_entity_id INT = 2896182;
DECLARE @current_count INTEGER = 8;

The query that you have with MAXDOP 1 takes about 188 ms CPU time and 188 ms elapsed time on my desktop. If I change the parameters to these:

DECLARE @search_entity_id INT = 1324226;
DECLARE @current_count INTEGER = 4;

Then CPU time drops to 93 ms CPU and elapsed time drops to 90 ms.

One trick that can improve performance is to take advantage of batch mode execution. Batch mode execution processes rows up in batches up to 900 rows instead of row by row to cut down on metadata and access costs. It was designed to work with CCIs but can also work with data from rowstore tables. There are a few different tricks to get batch mode with rowstore queries. Here, I'll just create an empty NCCI against the temp table:

CREATE NONCLUSTERED COLUMNSTORE INDEX X_CCI ON #base (entity_id)
WHERE entity_id = -1 AND entity_id = -2;

As of SQL Server 2016, batch mode processing is supported for MAXDOP 1 queries and for the sort operator. Both of those enhancements are helpful for this query. After creating the index we can see the difference in the query plan:

batch mode

The previous 8 row example now executes in about 110 ms of CPU time and 115 ms of elapsed time. The 4 row example executes in about 62 ms of CPU time and 60 ms of elapsed time. I got a savings of around 33% of my machine, but I don't know if that's good enough for your workload's requirements.