I am using MySQL 5.6 on Ubuntu Server. The SQL involves 2 tables, collection_notes
and notes
. collection_notes
is a relationship table to connect many-to-many relation between collections
and notes
The structures're like:
collection_notes
---------
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
collection_id int(11) YES MUL NULL
note_id int(11) YES MUL NULL
created_at datetime NO NULL
updated_at datetime NO NULL
user_id int(11) YES MUL NULL
notes
---------
Field Type Null Key
id int(11) NO PRI
created_at datetime NO
updated_at datetime NO
notebook_id int(11) YES MUL
shared tinyint(1) NO MUL
slug varchar(100) NO UNI
shared_at datetime YES MUL
seq_in_nb int(11) YES
note_type varchar(255) YES
locked tinyint(1) NO
locked_at datetime YES
likes_count int(11) YES
image_file_name varchar(255) YES
generating_image int(11) YES
image_generated_at int(11) NO
commentable tinyint(1) YES
last_stopped_share_at datetime YES
bookmarks_count int(11) YES
last_compiled_at int(11) NO
first_shared_at datetime YES MUL
title_image varchar(190) YES
deleted_at datetime YES MUL
comment_updated_at int(11) YES MUL
notes
has more columns.
Indexes from collection_notes
:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type
collection_notes 0 PRIMARY 1 id A 3874847 NULL NULL BTREE
collection_notes 0 index_collection_notes_on_collection_id_and_note_id 1 collection_id A 138387 NULL NULL YES BTREE
collection_notes 0 index_collection_notes_on_collection_id_and_note_id 2 note_id A 3874847 NULL NULL YES BTREE
collection_notes 1 index_collection_notes_on_collection_id 1 collection_id A 184516 NULL NULL YES BTREE
collection_notes 1 index_collection_notes_on_note_id 1 note_id A 3874847 NULL NULL YES BTREE
collection_notes 1 index_collection_notes_on_user_id 1 user_id A 82443 NULL NULL YES BTREE
Indexes from notes
:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type
notes 0 PRIMARY 1 id A 5434512 NULL NULL BTREE
notes 0 index_notes_on_slug 1 slug A 5434512 NULL NULL BTREE
notes 1 index_notes_on_notebook_id 1 notebook_id A 5434512 NULL NULL YES BTREE
notes 1 index_notes_on_shared_at 1 shared_at A 5434512 NULL NULL YES BTREE
notes 1 index_notes_on_shared_and_locked 1 shared A 2 NULL NULL BTREE
notes 1 index_notes_on_shared_and_locked 2 locked A 6 NULL NULL BTREE
notes 1 index_notes_on_first_shared_at 1 first_shared_at A 5434512 NULL NULL YES BTREE
notes 1 index_notes_on_notebook_id_and_seq_in_nb 1 notebook_id A 5434512 NULL NULL YES BTREE
notes 1 index_notes_on_notebook_id_and_seq_in_nb 2 seq_in_nb A 5434512 NULL NULL YES BTREE
notes 1 index_notes_on_deleted_at 1 deleted_at A 452876 NULL NULL YES BTREE
notes 1 index_notes_on_comment_updated_at 1 comment_updated_at A 1358628 NULL NULL YES BTREE
notes
table has ~6 million rows, in which ~3 million matchs deleted_at IS NULL and shared = 1 and locked = 0
collection_notes
table has ~ 4 million rows.
SQL is:
SELECT `notes`.* FROM `notes` FORCE INDEX(`index_notes_on_comment_updated_at`)
INNER JOIN `collection_notes` ON `notes`.`id` = `collection_notes`.`note_id`
WHERE `notes`.`deleted_at` IS NULL
AND `collection_notes`.`collection_id` = ?
AND `notes`.`shared` = 1
AND `notes`.`locked` = 0
ORDER BY `notes`.`comment_updated_at` DESC LIMIT 15 OFFSET 0
When collection_id = 47
which is relatively a very large collection, has more than 54000 notes, and it takes only 30ms, returns 15 rows, the first page.(Without FORCING this index index_notes_on_comment_updated_at
, it takes around 1 second, this is why I am trying to optimize it)
When collection_id = 270014
which is a very small collection, only has two notes, and it takes 101 seconds! returns only 2 rows, which are the only two notes this collection has.(Without FORCING this index index_notes_on_comment_updated_at
, it takes 30ms, which is very fast)
Why it behaves so differently and it's even slower when the data is small?
Following are the EXPLAINs:
The EXPLAIN without FORCE INDEX with collection_id = 47
:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE collection_notes ref index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 5 const 256478 Using where; Using index; Using temporary; Using filesort
1 SIMPLE notes eq_ref PRIMARY,index_notes_on_shared_and_locked,index_notes_on_deleted_at PRIMARY 4 maleskine.collection_notes.note_id 1 Using where
The EXPLAIN without FORCE INDEX with collection_id = 270014
:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE collection_notes ref index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 5 const 8 Using where; Using index; Using temporary; Using filesort
1 SIMPLE notes eq_ref PRIMARY,index_notes_on_shared_and_locked,index_notes_on_deleted_at PRIMARY 4 maleskine.collection_notes.note_id 1 Using where
The EXPLAIN with FORCE INDEX with collection_id = 270014
:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE notes index NULL index_notes_on_comment_updated_at 5 NULL 15 Using where
1 SIMPLE collection_notes ref index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 10 const,maleskine.notes.id 1 Using index
The EXPLAIN with FORCE INDEX with collection_id = 47
:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE notes index NULL index_notes_on_comment_updated_at 5 NULL 15 Using where
1 SIMPLE collection_notes ref index_collection_notes_on_collection_id_and_note_id,index_collection_notes_on_collection_id,index_collection_notes_on_note_id index_collection_notes_on_collection_id_and_note_id 10 const,maleskine.notes.id 1 Using index
My question is:
-
Why this
FORCE INDEX(
index_notes_on_comment_updated_at)
accelerates query withcollection_id = 47
(has 50000+ notes) so much(1s -> 30ms) and meantime, makes query withcollection_id = 270014
(has only 2 notes) so slower(30ms -> 101s) -
Without
FORCE INDEX(
index_notes_on_comment_updated_at)
, how could I make the original query faster for big collection (like collection_id = 47) now it takes 1s to finish.
Original SQL:
SELECT `notes`.* FROM `notes`
INNER JOIN `collection_notes` ON `notes`.`id` = `collection_notes`.`note_id`
WHERE `notes`.`deleted_at` IS NULL
AND `collection_notes`.`collection_id` = ?
AND `notes`.`shared` = 1
AND `notes`.`locked` = 0
ORDER BY `notes`.`comment_updated_at` DESC LIMIT 15 OFFSET 0
Best Answer
In Your case, index
which is used nowhere in Your query in JOIN and/or WHERE part:
so, force MySQL to use this index - same as FORCE it to use FULL SCAN
so not reduce number of rows in this case best choice let mysql use default key for JOIN and filter of rows will be by
which is included in 2 indexes in Your structure
plus change order in index - index_collection_notes_on_collection_id_and_note_id for
Edited after adding all plans
let look for collection_id = 47:
original query:
forced query:
for second case: with force index it choose index
and lookup for note_id
Add example for Your question - why with similar plan FORCED INDEX query work better for bigger collection:
Data sorted by date (yellow column) and with scan mysql must fetch first NNN records rom green columns
I not try explain - how indexes work, but in our case - mysql stop work after first any equal records and number of this first records dramatically less than total number of correct records
this is simple visual example, real data could have different frequency, but logic will be same - with big data-set and small number of equal data, query could (or could not) work faster then on smaller.
There are many other parameters could give affect, but this is one of possible scenarios what happens when index wrong.