Is it indeed necessary for all selected columns to be indexed in order for MySQL to choose to use the index?
This is a loaded question because there are factors that determine whether an index is worth using.
FACTOR #1
For any given index, what is the key population? In other words, what is the cardinality (distinct count) of all tuples recorded in the index?
FACTOR #2
What storage engine are you using? Are all needed columns accessible from an index?
WHAT'S NEXT ???
Let's take a simple example: a table that holds two values (Male and Female)
Let create such a table with a test for index usage
USE test
DROP TABLE IF EXISTS mf;
CREATE TABLE mf
(
id int not null auto_increment,
gender char(1),
primary key (id),
key (gender)
) ENGINE=InnODB;
INSERT INTO mf (gender) VALUES
('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
('M'),('M'),('M'),('M'),('F'),('F'),('M'),('M'),
('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
('F'),('M'),('M'),('M'),('M'),('M'),('M'),('M');
ANALYZE TABLE mf;
EXPLAIN SELECT gender FROM mf WHERE gender='F';
EXPLAIN SELECT gender FROM mf WHERE gender='M';
EXPLAIN SELECT id FROM mf WHERE gender='F';
EXPLAIN SELECT id FROM mf WHERE gender='M';
TEST InnoDB
mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS mf;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE mf
-> (
-> id int not null auto_increment,
-> gender char(1),
-> primary key (id),
-> key (gender)
-> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO mf (gender) VALUES
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('F'),('F'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('F'),('M'),('M'),('M'),('M'),('M'),('M'),('M');
Query OK, 40 rows affected (0.06 sec)
Records: 40 Duplicates: 0 Warnings: 0
mysql> ANALYZE TABLE mf;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.mf | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT gender FROM mf WHERE gender='F';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 3 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT gender FROM mf WHERE gender='M';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 37 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM mf WHERE gender='F';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 3 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM mf WHERE gender='M';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 37 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql>
TEST MyISAM
mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS mf;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE mf
-> (
-> id int not null auto_increment,
-> gender char(1),
-> primary key (id),
-> key (gender)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)
mysql> INSERT INTO mf (gender) VALUES
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('F'),('F'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('M'),('M'),('M'),('M'),('M'),('M'),('M'),('M'),
-> ('F'),('M'),('M'),('M'),('M'),('M'),('M'),('M');
Query OK, 40 rows affected (0.00 sec)
Records: 40 Duplicates: 0 Warnings: 0
mysql> ANALYZE TABLE mf;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.mf | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT gender FROM mf WHERE gender='F';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 3 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT gender FROM mf WHERE gender='M';
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 36 | Using where; Using index |
+----+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM mf WHERE gender='F';
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
| 1 | SIMPLE | mf | ref | gender | gender | 2 | const | 3 | Using where |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM mf WHERE gender='M';
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | mf | ALL | gender | NULL | NULL | NULL | 40 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql>
Analysis for InnoDB
When the data was loaded as InnoDB, please note that all four EXPLAIN
plans used the gender
index. The third and fourth EXPLAIN
plans used the gender
index even though the requested data was id
. Why? Because id
is in the PRIMARY KEY
and all secondary indexes have reference pointers back to the PRIMARY KEY
(via the gen_clust_index).
Analysis for MyISAM
When the data was loaded as MyISAM, please note that the first three EXPLAIN
plans used the gender
index. In the fourth EXPLAIN
plan, the Query Optimizer decided not to use an index at all. It opted for a full table scan instead. Why?
Regardless of DBMS, Query Optimizers operate on a very simple rule-of-thumb: If an index is being screened as a candidate to be used for performing the lookup and Query Optimizer computes that it must lookup more than 5% of the total number of rows in the table:
- a full index scan is done if all needed columns for retrieval are in the selected index
- a full table scan otherwise
CONCLUSION
If you do not have proper covering indexes, or if the key population for any given tuple is more than 5% of the table, six things must happen:
- Come to the realization that you must profile the queries
- Find all
WHERE
, GROUP BY
, and ORDER BY` clauses from those Queries
- Formulate indexes in this order
WHERE
clause columns with static values
GROUP BY
columns
ORDER BY
columns
- Avoid Full Table Scans (Queries lacking a sensible
WHERE
clause)
- Avoid Bad Key Populations (or at least cache those Bad Key Populations)
- Decide on the best MySQL Storage Engine (InnoDB or MyISAM) for the Tables
I have written about this 5% rule of thumb in the past:
UPDATE 2012-11-14 13:05 EDT
I took a look back at your question and at the original SO post. Then, I thought about my Analysis for InnoDB
I mentioned before. It coincides with the person
table. Why?
For both tables mf
and person
- Storage Engine is InnoDB
- Primary Key is
id
- Table access is by secondary index
- If table was MyISAM, we would see a completely different
EXPLAIN
plan
Now, look at the query from the SO question : select * from person order by age\G
. Since there is no WHERE
clause, you explicitly demanded a full table scan. The default sort order of the table would be by id
(PRIMARY KEY) because of its auto_increment and the gen_clust_index (aka Clustered Index) is ordered by internal rowid. When you ordered by the index, keep in mind that InnoDB secondary indexes have the rowid attached to each index entry. This produces the internal need for full row access each time.
Setting up ORDER BY
on an InnoDB table can be a rather daunting task if you ignore these facts about how InnoDB indexes are organized.
Going back to that SO query, since you explicitly demanded a full table scan, IMHO the MySQL Query Optimizer did the correct thing (or at least, chose the path of least resistance). When it comes to InnoDB and the SO query, it is far easier to perform a full table scan and then some filesort
rather than doing a full index scan and a row lookup via the gen_clust_index for each secondary index entry.
I am not an advocate of using Index Hints because it ignores the EXPLAIN plan. Notwithstanding, if you really know your data better than InnoDB, you will have to resort to Index Hints, especially with queries that have no WHERE
clause.
UPDATE 2012-11-14 14:21 EDT
According to the book Understanding MySQL Internals
Page 202 Paragraph 7 says the following:
The data is stored in a special structure called a clustered index, which is a B-tree with the primary key acting as the key value, and the actual record (rather than a pointer) in the data part. Thus, each InnoDB table must have a primary key. If one is not supplied, a special row ID column not normally visible to the user is added to act as a primary key. A secondary key will store the value of the primary key that identifies the record. The B-tree code can be found in innobase/btr/btr0btr.c.
This is why I stated earlier : it is far easier to perform a full table scan and then some filesort rather than doing a full index scan and a row lookup via the gen_clust_index for each secondary index entry. InnoDB is going to do a double index lookup every time. That sounds kind of brutal, but that's just the facts. Again, take into consideration the lack of WHERE
clause. This, in itself, is the hint to the MySQL Query Optimizer to do a full table scan.
YOUR QUERY
SELECT post.postid, post.attach FROM newbb_innopost AS post WHERE post.threadid = 51506;
At first glance, that query should only touches 1.1597% (62510 out of 5390146) of the table. It should be fast given the key distribution of threadid 51506.
REALITY CHECK
No matter which version of MySQL (Oracle, Percona, MariaDB) you use, none of them can fight to one enemy they all have in common : The InnoDB Architecture.
CLUSTERED INDEX
Please keep in mind that the each threadid entry has a primary key attached. This means that when you read from the index, it must do a primary key lookup within the ClusteredIndex (internally named gen_clust_index). In the ClusteredIndex, each InnoDB page contains both data and PRIMARY KEY index info. See my post Best of MyISAM and InnoDB for more info.
REDUNDANT INDEXES
You have a lot of clutter in the table because some indexes have the same leading columns. MySQL and InnoDB has to navigate through the index clutter to get to needed BTREE nodes. You should reduced that clutter by running the following:
ALTER TABLE newbb_innopost
DROP INDEX threadid,
DROP INDEX threadid_2,
DROP INDEX threadid_visible_dateline,
ADD INDEX threadid_visible_dateline_index (`threadid`,`visible`,`dateline`,`userid`)
;
Why strip down these indexes ?
- The first three indexes start with threadid
threadid_2
and threadid_visible_dateline
start with the same three columns
threadid_visible_dateline
does not need postid since it's the PRIMARY KEY and it's embedded
BUFFER CACHING
The InnoDB Buffer Pool caches data and index pages. MyISAM only caches index pages.
Just in this area alone, MyISAM does not waste time caching data. That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.
TABLE LAYOUT
You could shave of some space from the row by considering importthreadid
and importpostid
. You have them as BIGINTs. They take up 16 bytes in the ClusteredIndex per row.
You should run this
SELECT importthreadid,importpostid FROM newbb_innopost PROCEDURE ANALYSE();
This will recommend what data types these columns should be for the given dataset.
CONCLUSION
MyISAM has a lot less to contend with than InnoDB, especially in the area of caching.
While you revealed the amount of RAM (32GB
) and the version of MySQL (Server version: 10.0.12-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4002
), there are still other pieces to this puzzle you have not revealed
- The InnoDB settings
- The Number of Cores
- Other settings from
my.cnf
If you can add these things to the question, I can further elaborate.
UPDATE 2014-08-28 11:27 EDT
You should increase threading
innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M
I would consider disabling the query cache (See my recent post Why query_cache_type is disabled by default start from MySQL 5.6?)
query_cache_size = 0
I would preserve the Buffer Pool
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
Increase purge threads (if you do DML on multiple tables)
innodb_purge_threads = 4
GIVE IT A TRY !!!
Best Answer
The quotes define the expression as a string, whereas without the single quote it is evaluated as a number. This means that MySQL is forced to perform a Type Conversion to convert the number to a
CHAR
to do a proper comparison.As the doc above says,
However, the inverse of that is not true and while the index can be used, using a string as a value causes a poor execution plan (as illustrated by jkavalik's sqlfiddle) where
using where
is used instead of the fasterusing index condition
. The main difference between the two is that the former requires a row lookup and the latter can get the data directly from the index.You should definitely modify the column data type (assuming it truly is only meant to contain numbers) to the appropriate data type ASAP, but make sure that no queries are actually using single quotes, otherwise you'll be back where you started.