Given the table description, I see
- 66 bytes per row of data
- 4 bytes per row for the primary key
- 7 bytes per row for the country code index
  - 3 bytes for the country code
  - 4 bytes for the clustered key attached to the country code
- Total of 77 bytes of data and keys
- This does not factor in housekeeping for BTREEs or tablespace fragmentation
For a million rows, that would be 77,000,000 bytes (73.43 MB).
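As a quick sanity check, you can let MySQL do that arithmetic for you (the byte counts are just the per-row estimates from above):

SELECT FORMAT((66 + 4 + 3 + 4) * 1000000 / POWER(1024,2), 2) AS estimated_mb;  -- roughly 73.43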
As for measuring actual table size: for a given table mydb.mytable, you can run this query:
SELECT
CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',SUBSTR(units,pw1*2+1,2)) DATSIZE,
CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',SUBSTR(units,pw2*2+1,2)) NDXSIZE,
CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',SUBSTR(units,pw3*2+1,2)) TBLSIZE
FROM
(
SELECT DAT,NDX,TBL,IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
FROM
(
SELECT data_length DAT,index_length NDX,data_length+index_length TBL,
FLOOR(LOG(IF(data_length=0,1,data_length))/LOG(1024)) px,
FLOOR(LOG(IF(index_length=0,1,index_length))/LOG(1024)) py,
FLOOR(LOG(data_length+index_length)/LOG(1024)) pz
FROM information_schema.tables
WHERE table_schema='mydb'
AND table_name='mytable'
) AA
) A,(SELECT 'B KBMBGBTB' units) B;
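If you only need the raw byte counts without the unit formatting, a plain query against information_schema is enough (same mydb.mytable placeholders as above):

SELECT data_length AS data_bytes,
       index_length AS index_bytes,
       data_length + index_length AS total_bytes
FROM information_schema.tables
WHERE table_schema = 'mydb'
AND table_name = 'mytable';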
To measure all tables grouped by Database and Storage Engine, run this query:
SELECT
IF(ISNULL(DB)+ISNULL(ENGINE)=2,'Database Total',
CONCAT(DB,' ',IFNULL(ENGINE,'Total'))) "Reported Statistic",
LPAD(CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',
SUBSTR(units,pw1*2+1,2)),17,' ') "Data Size",
LPAD(CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',
SUBSTR(units,pw2*2+1,2)),17,' ') "Index Size",
LPAD(CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',
SUBSTR(units,pw3*2+1,2)),17,' ') "Total Size"
FROM
(
SELECT DB,ENGINE,DAT,NDX,TBL,
IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
FROM
(SELECT *,
FLOOR(LOG(IF(DAT=0,1,DAT))/LOG(1024)) px,
FLOOR(LOG(IF(NDX=0,1,NDX))/LOG(1024)) py,
FLOOR(LOG(IF(TBL=0,1,TBL))/LOG(1024)) pz
FROM
(SELECT
DB,ENGINE,
SUM(data_length) DAT,
SUM(index_length) NDX,
SUM(data_length+index_length) TBL
FROM
(
SELECT table_schema DB,ENGINE,data_length,index_length FROM
information_schema.tables WHERE table_schema NOT IN
('information_schema','performance_schema','mysql')
AND ENGINE IS NOT NULL
) AAA GROUP BY DB,ENGINE WITH ROLLUP
) AAA) AA) A,(SELECT ' BKBMBGBTB' units) B;
Run these queries and you can track changes in database/engine disk usage.
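One way to do that tracking (just a sketch; the dbstats schema and size_history table are made-up names, and you would have to create the schema and schedule the INSERT yourself, e.g. from cron or a MySQL event) is to snapshot the per-database/per-engine totals into a history table:

CREATE TABLE IF NOT EXISTS dbstats.size_history
(
    sampled_at  DATETIME     NOT NULL,
    db          VARCHAR(64)  NOT NULL,
    engine      VARCHAR(64)  NOT NULL,
    data_bytes  BIGINT UNSIGNED,
    index_bytes BIGINT UNSIGNED,
    PRIMARY KEY (sampled_at, db, engine)
);

INSERT INTO dbstats.size_history
SELECT NOW(), table_schema, ENGINE,
       SUM(data_length), SUM(index_length)
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','performance_schema','mysql')
AND ENGINE IS NOT NULL
GROUP BY table_schema, ENGINE;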
Give it a Try !!!
Explanation
My question is: why does this not use the index amplifier_saturation_start?

Even with 30,000,000 rows and only 3,500 in the date range, it can be faster to read tuples from the top of the index amplifier_saturation_lddate on lddate. The first row that passes the filter on start can be returned as is. No sort step needed. With a perfectly random distribution, a little below 9000 index tuples would have to be checked on average.

Using amplifier_saturation_start, Postgres would still have to determine max(lddate) after fetching all 3,500 qualifying rows. Close call. The decision depends on gathered statistics and your cost settings. Depending on data distribution and other details, one or the other will be faster, and one or the other will be expected to be faster.
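If you want to see which of the two plans the planner actually picks for your data, and at what cost, you can compare estimates and actual run times with something like this (using the date range from your query):

EXPLAIN (ANALYZE, BUFFERS)
SELECT max(lddate)
FROM   bss.amplifier_saturation
WHERE  start >= '1987-12-31 00:00:00'
AND    start <  '1988-04-09 00:00:00';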
Better index
This can be substantially faster with a multicolumn index on (lddate, start), like you found yourself already. This way Postgres can use an index-only scan and not touch the heap (the table) at all.
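A minimal version of that index could look like this (the index name is only an example, not something that already exists in your schema):

CREATE INDEX amplifier_saturation_lddate_start
ON bss.amplifier_saturation (lddate, start);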
But there is another minor thing you can improve. Did you wonder about this detail in your EXPLAIN output?
Index Cond: ((lddate IS NOT NULL) AND ...
Why does Postgres have to exclude NULL values? Because NULL sorts after the greatest value in ASCENDING order, or before it in DESCENDING order. The maximum non-null value, which is what the aggregate function max() returns, is not at the beginning / end of the index if there are NULL values. Adding NULLS LAST | FIRST adjusts the sort order to the characteristic of max() (and makes the opposite min() more expensive). Since we are mostly interested in the latest timestamp, DESC NULLS LAST is the better choice.
CREATE INDEX tmp_as ON bss.amplifier_saturation (lddate DESC NULLS LAST, start);
Now, your table column lddate obviously doesn't have NULL values, being defined NOT NULL. The effect on performance will be negligible in this particular case. Still worth mentioning for cases that can have NULL.
The other index option would be on (start, lddate), basically a pimped amplifier_saturation_start index, which also would allow index-only scans. Depending on data distribution and actual parameter values in your query, one or the other will be faster.
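Again as a sketch, with a made-up index name:

CREATE INDEX amplifier_saturation_start_lddate
ON bss.amplifier_saturation (start, lddate);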
Two notes concerning timestamp
- Your table columns are timestamptz, but your query predicates use timestamp literals. Postgres derives the time zone from your current timezone setting and adjusts accordingly. This may or may not be as intended. It certainly makes the query volatile, depending on a setting of your session. It would be problematic for a call that could be made from different time zones (with differing session settings). Then you'd rather use an explicit offset or the AT TIME ZONE construct to make it stable.
- You typically would want to exclude the upper bound for correctness: < instead of <=.
select max(lddate)
from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST'
and start < '1988-04-09 00:00:00 PST'::timestamptz; -- shorter
PST (Pacific Standard Time) being a random example time zone.
Best Answer
To "dive" is to drill down the BTree from the 'root' down to the leaf node for a particular key.
Let's say you have a non-unique INDEX(x). Further, let's say that there are 100 rows with x=123. For WHERE x = 123, the two dives would be to "find the first index entry for x=123" and to find the "last" such row.

His first use of "accurate" (in the article) was to say that the dives are more accurate than using statistics that may be stale.
I suspect his use of "accurate" is an exaggeration in the other uses. I say this because I cannot believe that counting the rows between the first and last dive is anything more than an estimation. To get the exact number would require fetching each block -- this is too costly.
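A tiny illustration (a hypothetical table, not anything from the article) of where the two dives come into play:

CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    x  INT,
    INDEX idx_x (x)   -- non-unique index
);

-- The "rows" estimate in the plan comes from two dives into idx_x:
-- one to the first entry for x=123 and one to the last.
EXPLAIN SELECT * FROM t WHERE x = 123;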
Note that the 100 rows might be in the same block, or might (if the rows are wide) be spread across multiple blocks. For the purposes of EXPLAIN and query planning, an estimate is considered "good enough". The exact ("accurate") number can be quickly counted if the start and end are in the same block, but, if spread across multiple blocks, I think it would then only estimate the row counts for the intermediate blocks.

The "dives", even if they involve a disk read, are usually not wasted. This is because you will (presumably) then proceed to perform the query and need those block(s) anyway.
Side note: Since Jørgen Løland wrote that article, the Optimizer has employed a cost-based algorithm. However, the "dives" are still important in it.