SQL Server – How to make this nested query more efficient

performance, query, query-performance, sql-server

I have 3 tables: Room, Conference, and Participant. Room has many Conferences, and Conference has many Participants. I need my query to return the fields from Room, along with the number of Conferences associated with each Room and the total number of Participants across those Conferences. Here's a cut-down version of the SELECT query I wrote to get this information; first, I just selected the room ID:

SELECT TOP(1000)
  rm.[Id]
FROM
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    MIN(conf.[Name]) AS [ConferenceName],
    MIN(conf.[RoomId]) AS [RoomId],
    COUNT(part.[Id]) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  LEFT JOIN
    [Participant] part on part.[ConferenceId] = conf.[Id]
  GROUP BY
    conf.[Id]
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]

This was very fast, as SQL Server was able to just pull the data from Room and pretty much ignore the subquery (see Trials 1–4 in the image below). Then I added the ConferenceName field from the subquery, as well as a count of the conferences per room:

SELECT TOP(1000)
  rm.[Id],
  COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
  MIN(confData.[ConferenceName])
FROM
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    MIN(conf.[Name]) AS [ConferenceName],
    MIN(conf.[RoomId]) AS [RoomId],
    COUNT(part.[Id]) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  LEFT JOIN
    [Participant] part on part.[ConferenceId] = conf.[Id]
  GROUP BY
    conf.[Id]
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]

This slowed the query down considerably, by a factor of about 100 (see Trials 5–7 in the image below). I then added the participant count from the subquery, meaning there were two levels of aggregate functions in use:

SELECT TOP(1000)
  rm.[Id],
  COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
  MIN(confData.[ConferenceName]),
  SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    MIN(conf.[Name]) AS [ConferenceName],
    MIN(conf.[RoomId]) AS [RoomId],
    COUNT(part.[Id]) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  LEFT JOIN
    [Participant] part on part.[ConferenceId] = conf.[Id]
  GROUP BY
    conf.[Id]
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]

This slowed the query down by a further factor of about 4 (see Trials 8–10 in the image below). Here are the client statistics for the 10 trials:

[Image: client statistics for Trials 1–10]

Here's the query plan of the slow query: https://www.brentozar.com/pastetheplan/?id=SJpyeec5Q

Is there a way I can make this kind of query – where I calculate an aggregate of a subquery's aggregate – more efficient?

Best Answer

I mocked up data by looking at the row counts in your tables, giving them an even data distribution, and making guesses about the schema:

DROP TABLE IF EXISTS [Room];

CREATE TABLE [Room] (
    [Id] BIGINT NOT NULL,
    FILLER VARCHAR(200) NOT NULL,
    PRIMARY KEY ([Id])
);

INSERT INTO [Room] WITH (TABLOCK)
SELECT TOP (3088) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), REPLICATE('Z', 200)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS [Conference];

CREATE TABLE [Conference] (
    [Id] BIGINT NOT NULL,
    [Name] VARCHAR(30) NOT NULL,
    [RoomId] BIGINT NOT NULL,
    FILLER VARCHAR(200) NOT NULL,
    PRIMARY KEY ([Id])
);


INSERT INTO [Conference] WITH (TABLOCK)
SELECT RN
, 'MY FAVORITE MEETING ROOM'
, 1 + RN % 3088
, REPLICATE('Z', 200)
FROM
(
    SELECT TOP (97413) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) q
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS [Participant];

CREATE TABLE [Participant] (
    [Id] BIGINT NOT NULL,
    [ConferenceId] BIGINT NOT NULL,
    FILLER VARCHAR(200) NOT NULL,
    PRIMARY KEY ([Id])
);


INSERT INTO [Participant] WITH (TABLOCK)
SELECT RN
, 1 + RN % 97413
, REPLICATE('Z', 200)
FROM
(
    SELECT TOP (235323) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) q
OPTION (MAXDOP 1);


CREATE INDEX NCI_Part ON [Participant] ([ConferenceId]) INCLUDE (Id);

The most important assumption that I made about the schema is that the Id column is the primary key of the [Conference] table. This seemed reasonable given the query plan and the index names involved.

On my machine I get the same query plan as you, but my starting query takes only 163 ms of CPU. I assume the difference comes down to hardware, data distribution, and the fact that I'm not returning data to the client.
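
One simple way to avoid returning data to the client while benchmarking (a sketch of my own, not something taken from the trials above) is to assign the output to a variable so the rows stay on the server:

-- Sketch only: assigning the column to a variable keeps the result set on the
-- server, so client transfer time is excluded from the measurement.
DECLARE @Id BIGINT;

SELECT TOP (1000) @Id = rm.[Id]
FROM [Room] rm;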

The first thing that jumped out at me is the unnecessary GROUP BY in your confData derived table. Id is the primary key of the table, so you don't need all of those aggregates. With the right indexes (which you already have for this particular case), subqueries aren't necessarily a bad thing. Rewriting what you have to remove the GROUP BY:

SELECT TOP(1000)
  rm.[Id],
  COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
  MIN(confData.[ConferenceName]),
  SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    conf.[Name] AS [ConferenceName],
    conf.[RoomId] AS [RoomId],
    (
        SELECT COUNT(part.[Id])
        FROM [Participant] part
        WHERE part.[ConferenceId] = conf.[Id]
    ) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

This results in the stream aggregate getting pushed down further into the plan:

[Image: query plan for the subquery rewrite]

The uploaded plan takes 113 ms of CPU. The same operators are present, but some of them process fewer rows, which saves time. You may be able to make this query more efficient by defining a covering index on the [Conference] table with Id as the index key. This may seem like an odd thing to do, but your clustered index scan takes 10% of the overall query time and likely includes columns that you don't need.
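
A sketch of what that covering index might look like (the name and the INCLUDE list are my guesses, covering just the columns this query touches):

-- Hypothetical covering index: Id as the key, carrying only the columns the
-- query needs, so the scan reads narrower rows than the clustered index.
CREATE INDEX NCI_Conf_Covering ON [Conference] ([Id]) INCLUDE ([Name], [RoomId]);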

If you want to make the query faster you could also consider an indexed view. Why perform the aggregation every time when you can define a simple indexed view to do it for you?

CREATE VIEW IndexedViewOnParticipant WITH SCHEMABINDING
AS
SELECT [ConferenceId], COUNT_BIG([Id]) CntId, COUNT_BIG(*) Cnt
FROM dbo.[Participant]
GROUP BY [ConferenceId];

GO

CREATE UNIQUE CLUSTERED INDEX CI ON IndexedViewOnParticipant ([ConferenceId]);

This will use a little more space and add a little overhead to DML against the table. Overall I'd say it's a good use case for an indexed view. Rewriting the query again:

SELECT TOP(1000)
  rm.[Id],
  COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
  MIN(confData.[ConferenceName]),
  SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM 
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    conf.[Name] AS [ConferenceName],
    conf.[RoomId] AS [RoomId],
    (
        SELECT CntId
        FROM IndexedViewOnParticipant part WITH (NOEXPAND)
        WHERE part.[ConferenceId] = conf.[Id]
    ) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

SQL Server agrees with my assessment that it's a good idea and CPU time falls to 78 ms.

On my machine I was able to make the query even faster, but this starts to get into optimizations that are somewhat risky because they may require a LOOP JOIN hint. That hint may stop being a good idea as your query or the data in your tables changes, and it may not be a good fit for your hardware. The idea behind this approach is to create a suitable index on [Conference] and to take full advantage of the TOP with a plan that only does nested loop joins. Here is the index that I added:

CREATE INDEX NCI_Conf ON [Conference] ([RoomId]) INCLUDE ([Name]);
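
For completeness, here is a sketch of how the hint would be applied; only the OPTION clause changes from the previous query:

SELECT TOP(1000)
  rm.[Id],
  COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
  MIN(confData.[ConferenceName]),
  SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM
  [Room] rm
LEFT JOIN (
  SELECT
    conf.[Id] AS [ConferenceId],
    conf.[Name] AS [ConferenceName],
    conf.[RoomId] AS [RoomId],
    (
        SELECT CntId
        FROM IndexedViewOnParticipant part WITH (NOEXPAND)
        WHERE part.[ConferenceId] = conf.[Id]
    ) AS CalcConferenceParticipantCount
  FROM
    [Conference] conf
  ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
  rm.[Id]
-- LOOP JOIN forces every join in the plan to be a nested loop join.
OPTION (LOOP JOIN, USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));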

Running the same query as before with a LOOP JOIN hint gave me the following plan:

[Image: query plan with the LOOP JOIN hint]

That query took only 58 ms of CPU time. It's worth mentioning that requesting the actual plan adds quite a bit of relative overhead at this point. All of the other possible optimizations that come to mind aren't safe for production, so I'll stop here.
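
As an aside, if the overhead of capturing the actual plan is a concern, SET STATISTICS TIME is a lighter-weight way to get CPU and elapsed time (a general note, not tied to the numbers above):

-- Reports CPU time and elapsed time for each statement in the Messages tab,
-- without the cost of capturing the actual execution plan.
SET STATISTICS TIME ON;

SELECT TOP (1000) rm.[Id]
FROM [Room] rm;   -- the query under test goes here

SET STATISTICS TIME OFF;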

As a final thought, do you really want to return 1000 arbitrary rows and the minimum conference name? Is that information useful to your end users?