SQL Server – Querying a table with 200+ million rows by filtering on the clustered index AND an additional column is super slow

execution-plan, query-performance, sql-server, sql-server-2019

We have a table with a clustered index on two columns: the first is NVARCHAR(50) and the second is DATETIME2(3).

There are a few other columns. When we query the table with a WHERE clause on the clustered index columns, the results come back immediately, even when the result set is tens of thousands of rows.

However, if we add one more predicate to the WHERE clause, the query is super slow. After ten seconds I just stop it, because that is not useful to us.

The table has 200+ million rows.
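For context, here is a minimal sketch of the table as described, using the placeholder names from the queries below (the BIT type for BooleanColumn, the index name, and the bracketed table name are assumptions; the real table has a few more columns):

-- Sketch of the table and its two-column clustered index as described above.
CREATE TABLE dbo.[Table]
(
    Col1          NVARCHAR(50) NOT NULL,   -- first clustered key column
    Col2          DATETIME2(3) NOT NULL,   -- second clustered key column
    BooleanColumn BIT          NOT NULL    -- extra column filtered on later
    -- ... a few other columns ...
);

CREATE CLUSTERED INDEX CIX_Table ON dbo.[Table] (Col1, Col2);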

This is the query that is super fast:

SELECT *
FROM Table
WHERE Col1 = 'blah' AND Col2 BETWEEN 'date1' AND 'date2'

This is the query that is super slow:

SELECT *
FROM Table
WHERE Col1 = 'blah' AND Col2 BETWEEN 'date1' AND 'date2' AND BooleanColumn = 1

In my mind, the second query should use the clustered index to seek for the matching rows and then simply filter that result set on the additional column.
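For what it's worth, that seek-plus-residual-filter shape can be requested explicitly with the FORCESEEK table hint. A sketch using the same placeholder names (bracketed because Table is a reserved word); it only constrains the plan shape and is not claimed to fix the slowness described here:

-- Force a seek on an index (here the clustered index on Col1, Col2);
-- BooleanColumn = 1 is then applied as a residual filter on the rows read.
SELECT *
FROM dbo.[Table] WITH (FORCESEEK)
WHERE Col1 = 'blah' AND Col2 BETWEEN 'date1' AND 'date2' AND BooleanColumn = 1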

Is it possible to (somehow) make the second query fast without creating an additional nonclustered index that includes the other columns we need to filter on?

Here is the query plan for the second (slow) query:

https://www.brentozar.com/pastetheplan/?id=BJj28mjvL

After the first run, the query is now fast for different combinations of values in the WHERE clause, which means it's probably not that the results are cached, but that the query plan has been generated and optimized.
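One way to separate plan caching from everything else is to force a fresh compilation and see whether the query stays fast; a sketch with the same placeholder names:

-- OPTION (RECOMPILE) discards any cached plan for this statement and compiles a new one.
-- If the query is still fast, the post-first-run speedup is not plan reuse but something
-- persistent, e.g. statistics that now exist or data already sitting in the buffer pool.
SELECT *
FROM dbo.[Table]
WHERE Col1 = 'blah' AND Col2 BETWEEN 'date1' AND 'date2' AND BooleanColumn = 1
OPTION (RECOMPILE)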

Best Answer

Every time you add a predicate on a column that does not yet have statistics, the optimizer builds (sampled) statistics on that column to estimate selectivity. That is where the time goes: building the statistics. On a large table, that can take a while, even with a low sampling rate.
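One way to confirm this, and to avoid paying the cost at query time, is to look for the auto-created statistics after the slow first run and to create the column statistics yourself in advance. A sketch, with the statistics name and bracketed placeholder table name assumed:

-- Statistics on the table: auto_created = 1 rows (named _WA_Sys_...) were built
-- on the fly by the optimizer; last_updated and rows_sampled show when and how.
SELECT s.name, s.auto_created, sp.last_updated, sp.[rows], sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Table')

-- Pre-create the column statistics during a quiet period so the first query
-- does not have to (default sampling; add WITH FULLSCAN if you can afford it).
CREATE STATISTICS Stats_BooleanColumn ON dbo.[Table] (BooleanColumn)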