Mysql – Why FORCING INDEX making this Query much slower on smaller datasets

MySQLmysql-5.6performanceperformance-tuningquery-performance

I am using MySQL 5.6 on Ubuntu Server. The SQL involves 2 tables, collection_notes and notes. collection_notes is a relationship table to connect many-to-many relation between collections and notes

The structures're like:

collection_notes
---------
Field         Type      Null  Key Default Extra
id            int(11)   NO    PRI NULL    auto_increment
collection_id int(11)   YES   MUL NULL  
note_id       int(11)   YES   MUL NULL  
created_at    datetime  NO        NULL  
updated_at    datetime  NO        NULL  
user_id       int(11)   YES   MUL NULL  

notes
---------
Field                 Type          Null  Key
id                    int(11)       NO    PRI
created_at            datetime      NO              
updated_at            datetime      NO              
notebook_id           int(11)       YES   MUL 
shared                tinyint(1)    NO    MUL
slug                  varchar(100)  NO    UNI 
shared_at             datetime      YES   MUL 
seq_in_nb             int(11)       YES       
note_type             varchar(255)  YES      
locked                tinyint(1)    NO       
locked_at             datetime      YES       
likes_count           int(11)       YES      
image_file_name       varchar(255)  YES       
generating_image      int(11)       YES      
image_generated_at    int(11)       NO       
commentable           tinyint(1)    YES      
last_stopped_share_at datetime      YES       
bookmarks_count       int(11)       YES      
last_compiled_at      int(11)       NO       
first_shared_at       datetime      YES   MUL 
title_image           varchar(190)  YES       
deleted_at            datetime      YES   MUL 
comment_updated_at    int(11)       YES   MUL

notes has more columns.

Indexes from collection_notes:

Table             Non_unique  Key_name                                            Seq_in_index  Column_name    Collation   Cardinality Sub_part  Packed  Null  Index_type
collection_notes  0           PRIMARY                                             1             id             A           3874847     NULL      NULL           BTREE   
collection_notes  0           index_collection_notes_on_collection_id_and_note_id 1             collection_id  A           138387      NULL      NULL    YES    BTREE   
collection_notes  0           index_collection_notes_on_collection_id_and_note_id 2             note_id        A           3874847     NULL      NULL    YES    BTREE   
collection_notes  1           index_collection_notes_on_collection_id             1             collection_id  A           184516      NULL      NULL    YES    BTREE   
collection_notes  1           index_collection_notes_on_note_id                   1             note_id        A           3874847     NULL      NULL    YES    BTREE   
collection_notes  1           index_collection_notes_on_user_id                   1             user_id        A           82443       NULL      NULL    YES    BTREE   

Indexes from notes:

Table Non_unique  Key_name                                  Seq_in_index  Column_name         Collation Cardinality Sub_part  Packed  Null  Index_type
notes 0           PRIMARY                                   1             id                  A         5434512     NULL      NULL          BTREE   
notes 0           index_notes_on_slug                       1             slug                A         5434512     NULL      NULL          BTREE   
notes 1           index_notes_on_notebook_id                1             notebook_id         A         5434512     NULL      NULL    YES   BTREE   
notes 1           index_notes_on_shared_at                  1             shared_at           A         5434512     NULL      NULL    YES   BTREE   
notes 1           index_notes_on_shared_and_locked          1             shared              A         2           NULL      NULL          BTREE   
notes 1           index_notes_on_shared_and_locked          2             locked              A         6           NULL      NULL          BTREE   
notes 1           index_notes_on_first_shared_at            1             first_shared_at     A         5434512     NULL      NULL    YES   BTREE   
notes 1           index_notes_on_notebook_id_and_seq_in_nb  1             notebook_id         A         5434512     NULL      NULL    YES   BTREE   
notes 1           index_notes_on_notebook_id_and_seq_in_nb  2             seq_in_nb           A         5434512     NULL      NULL    YES   BTREE   
notes 1           index_notes_on_deleted_at                 1             deleted_at          A         452876      NULL      NULL    YES   BTREE   
notes 1           index_notes_on_comment_updated_at         1             comment_updated_at  A         1358628     NULL      NULL    YES   BTREE   

notes table has ~6 million rows, in which ~3 million matchs deleted_at IS NULL and shared = 1 and locked = 0

collection_notes table has ~ 4 million rows.

SQL is:

SELECT `notes`.* FROM `notes` FORCE INDEX(`index_notes_on_comment_updated_at`) 
INNER JOIN `collection_notes` ON `notes`.`id` = `collection_notes`.`note_id` 
WHERE `notes`.`deleted_at` IS NULL 
AND `collection_notes`.`collection_id` = ? 
AND `notes`.`shared` = 1 
AND `notes`.`locked` = 0
ORDER BY `notes`.`comment_updated_at` DESC LIMIT 15 OFFSET 0

When collection_id = 47 which is relatively a very large collection, has more than 54000 notes, and it takes only 30ms, returns 15 rows, the first page.(Without FORCING this index index_notes_on_comment_updated_at, it takes around 1 second, this is why I am trying to optimize it)

When collection_id = 270014 which is a very small collection, only has two notes, and it takes 101 seconds! returns only 2 rows, which are the only two notes this collection has.(Without FORCING this index index_notes_on_comment_updated_at, it takes 30ms, which is very fast)

Why it behaves so differently and it's even slower when the data is small?

Following are the EXPLAINs:

The EXPLAIN without FORCE INDEX with collection_id = 47:

id  select_type table             type    possible_keys                                                                                                                 key                                                 key_len ref                                 rows    Extra
1   SIMPLE      collection_notes  ref     index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 5       const                               256478  Using where; Using index; Using temporary; Using filesort
1   SIMPLE      notes             eq_ref  PRIMARY,index_notes_on_shared_and_locked,index_notes_on_deleted_at                                                            PRIMARY                                             4       maleskine.collection_notes.note_id  1       Using where

The EXPLAIN without FORCE INDEX with collection_id = 270014:

id  select_type table             type    possible_keys                                                                                                                 key                                                 key_len ref                                 rows  Extra
1   SIMPLE      collection_notes  ref     index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 5       const                               8     Using where; Using index; Using temporary; Using filesort
1   SIMPLE      notes             eq_ref  PRIMARY,index_notes_on_shared_and_locked,index_notes_on_deleted_at                                                            PRIMARY                                             4       maleskine.collection_notes.note_id  1     Using where

The EXPLAIN with FORCE INDEX with collection_id = 270014:

id  select_type table             type  possible_keys                                                                                                                 key                                                 key_len ref                       rows  Extra
1   SIMPLE      notes             index NULL                                                                                                                          index_notes_on_comment_updated_at                   5       NULL                      15    Using where
1   SIMPLE      collection_notes  ref   index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 10      const,maleskine.notes.id  1     Using index

The EXPLAIN with FORCE INDEX with collection_id = 47:

id  select_type table             type  possible_keys                                                                                                                 key                                                 key_len ref                       rows  Extra
1   SIMPLE      notes             index NULL                                                                                                                          index_notes_on_comment_updated_at                   5       NULL                      15    Using where
1   SIMPLE      collection_notes  ref   index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 10      const,maleskine.notes.id  1     Using index

My question is:

  1. Why this FORCE INDEX(index_notes_on_comment_updated_at) accelerates query with collection_id = 47(has 50000+ notes) so much(1s -> 30ms) and meantime, makes query with collection_id = 270014(has only 2 notes) so slower(30ms -> 101s)

  2. Without FORCE INDEX(index_notes_on_comment_updated_at), how could I make the original query faster for big collection (like collection_id = 47) now it takes 1s to finish.

Original SQL:

SELECT `notes`.* FROM `notes` 
INNER JOIN `collection_notes` ON `notes`.`id` = `collection_notes`.`note_id` 
WHERE `notes`.`deleted_at` IS NULL 
AND `collection_notes`.`collection_id` = ? 
AND `notes`.`shared` = 1 
AND `notes`.`locked` = 0
ORDER BY `notes`.`comment_updated_at` DESC LIMIT 15 OFFSET 0

Best Answer

In Your case, index

index_notes_on_comment_updated_at, include single column - comment_updated_at

which is used nowhere in Your query in JOIN and/or WHERE part:

SELECT `notes`.* FROM `notes` FORCE INDEX(`index_notes_on_comment_updated_at`) 
INNER JOIN `collection_notes` ON `notes`.`id` = `collection_notes`.`note_id` 
WHERE `notes`.`deleted_at` IS NULL 
AND `collection_notes`.`collection_id` = ? 
AND `notes`.`shared` = 1 
AND `notes`.`locked` = 0
ORDER BY `notes`.`comment_updated_at` DESC LIMIT 15 OFFSET 0

so, force MySQL to use this index - same as FORCE it to use FULL SCAN

'shared' and 'locked' - have very low cardinality,

so not reduce number of rows in this case best choice let mysql use default key for JOIN and filter of rows will be by

`collection_notes`.`collection_id` =

which is included in 2 indexes in Your structure

plus change order in index - index_collection_notes_on_collection_id_and_note_id for

(note_id, collection_id) same as You use in query

Edited after adding all plans

let look for collection_id = 47:

original query:

select 256478 records from collection_notes, than make lookup in notes Using where; Using index; Using temporary; Using filesort Using where

forced query:

filter using FORCED INDEX 15 first records, than make lookup Using where Using index

for second case: with force index it choose index

index_notes_on_comment_updated_at

and lookup for note_id

Add example for Your question - why with similar plan FORCED INDEX query work better for bigger collection: enter image description here

Data sorted by date (yellow column) and with scan mysql must fetch first NNN records rom green columns

I not try explain - how indexes work, but in our case - mysql stop work after first any equal records and number of this first records dramatically less than total number of correct records

this is simple visual example, real data could have different frequency, but logic will be same - with big data-set and small number of equal data, query could (or could not) work faster then on smaller.

There are many other parameters could give affect, but this is one of possible scenarios what happens when index wrong.