Mysql – Optimizing indexes

indexindex-tuningMySQLrdbms

I just began optimizing my schemas with indexes (pretty late I guess).
I never dealt with huge dataset but I'll be soon coping with about 1 to 5 millions rows.

Here is the schema I'm using right now for testing:

CREATE TABLE `object`
(
  `otype` VARCHAR(255),
  `id` BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY NOT NULL,
  `domain` VARCHAR(255),
  `created_at` INT,
  `updated_at` INT,
  `deleted_at` INT,
  `category` VARCHAR(31), /* genre, type, category */
  `status` VARCHAR(15)
)ENGINE=InnoDB;
ALTER TABLE `object` ADD UNIQUE KEY `object` ( `otype`, `id` );
CREATE INDEX `id` ON `object` ( `otype`, `id` );
CREATE INDEX `object_id` ON `object` ( `id` );
CREATE INDEX `object_domain` ON `object` ( `domain` );
CREATE INDEX `object_category` ON `object` ( `category` );
CREATE INDEX `object_status` ON `object` ( `status` );

CREATE TABLE `offer`
(
id BIGINT UNSIGNED,
`category` VARCHAR(31), /* genre, type, category */
amount DECIMAL(12,3)
)ENGINE=InnoDB;
ALTER TABLE `offer` ADD FOREIGN KEY ( `id` ) REFERENCES object( `id` ) ON UPDATE CASCADE ON DELETE CASCADE;
CREATE INDEX `offer_id` ON `offer` ( `id` );
CREATE INDEX `offer_category` ON `offer` ( `category` );

And this is the query I'm trying to optimize: I'm reporting it twice because I was experimenting if using indexes filtering just the first table was faster than filtering both the tables and I guess I was right: aren't I?

mysql> select SQL_NO_CACHE count(*) from object, offer where object.id = offer.id and object.category = "bid" and status = "pending" order by created_at;
+----------+
| count(*) |
+----------+
|   499510 |
+----------+
1 row in set (2.06 sec)

mysql> explain extended select SQL_NO_CACHE count(*) from object, offer where object.id = offer.id and object.category = "bid" and status = "pending" order by created_at;
+----+-------------+--------+------+-------------------------------------------------+-----------------+---------+------------------+--------+----------+------------------------------------+
| id | select_type | table  | type | possible_keys                                   | key             | key_len | ref              | rows   | filtered | Extra                              |
+----+-------------+--------+------+-------------------------------------------------+-----------------+---------+------------------+--------+----------+------------------------------------+
|  1 | SIMPLE      | object | ref  | PRIMARY,object_id,object_category,object_status | object_category | 96      | const            | 498729 |   100.00 | Using index condition; Using where |
|  1 | SIMPLE      | offer  | ref  | offer_id                                        | offer_id        | 9       | testme.object.id |      1 |   100.00 | Using index                        |
+----+-------------+--------+------+-------------------------------------------------+-----------------+---------+------------------+--------+----------+------------------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> select SQL_NO_CACHE count(*) from object, offer where object.id = offer.id and offer.category = "bid" and status = "pending" order by created_at;
+----------+
| count(*) |
+----------+
|   499510 |
+----------+
1 row in set (3.78 sec)

mysql> explain extended select SQL_NO_CACHE count(*) from object, offer where object.id = offer.id and offer.category = "bid" and status = "pending" order by created_at;
+----+-------------+--------+--------+---------------------------------+----------------+---------+-----------------+--------+----------+------------------------------------+
| id | select_type | table  | type   | possible_keys                   | key            | key_len | ref             | rows   | filtered | Extra                              |
+----+-------------+--------+--------+---------------------------------+----------------+---------+-----------------+--------+----------+------------------------------------+
|  1 | SIMPLE      | offer  | ref    | offer_id,offer_category         | offer_category | 96      | const           | 498792 |   100.00 | Using index condition; Using where |
|  1 | SIMPLE      | object | eq_ref | PRIMARY,object_id,object_status | PRIMARY        | 8       | testme.offer.id |      1 |   100.00 | Using where                        |
+----+-------------+--------+--------+---------------------------------+----------------+---------+-----------------+--------+----------+------------------------------------+
2 rows in set, 1 warning (0.00 sec)

I studied different articles and the mysql index page which states:

They are used only for equality comparisons that use the = or <=> operators (but are very fast) […] The optimizer cannot use a hash index to speed up ORDER BY operations

  1. Since I'll probably move to mysql cluster, does it make sense to switch from InnoDB to NDB engine and use HASH indexes on "category" and "status" columns?

  2. I read how btrees stores data. If most of my queries involve a "WHERE category = x AND status = y", should I add 3 different indexes: one on category, one on status, and one on the combination of both?

  3. "show warnings" doesn't show anything useful about what mysql is trying to warn me about: what's wrong with my query?

    mysql> show warnings \G
    *********************** 1. row ***********************
    Level: Note
    Code: 1003
    Message: /* select#1 */ select sql_no_cache count(0) AS count(*) from testme.object join testme.offer where ((testme.object.id = testme.offer.id) and (testme.object.category = 'bid') and (testme.object.status = 'pending'))
    1 row in set (0.00 sec)

4 Can the following:

CREATE INDEX `object_category_status` ON `object` ( `category`, `status` );
CREATE INDEX `object_category` ON `object` ( `category` );
CREATE INDEX `object_status` ON `object` ( `status` );

be reduced with the following for the very same result?

CREATE INDEX `object_category_status` ON `object` ( `category`, `status` );
CREATE INDEX `object_status` ON `object` ( `status` );

Thanks for your time!

Best Answer

I will answer each the the three questions

QUESTION #1

Since I'll probably move to mysql cluster, does it make sense to switch from InnoDB to NDB engine and use HASH indexes on "category" and "status" columns?

ANSWER TO QUESTION #1

Hash indexes are for one-to-one lookups. Hash indexes are only available for the MEMORY Storage Engine (See my May 17, 2011 post Why does MySQL not have hash indices on MyISAM or InnoDB?)

QUESTION #2

I read how btrees stores data. If most of my queries involve a "WHERE category = x AND status = y", should I add 3 different indexes: one on category, one on status, and one on the combination of both?

ANSWER TO QUESTION #2

You do not want single column indexes. MySQL could still use them if there are no compound index present. It will make MySQL work harder generating results by merging lookups from two separate indexes,

You are way better off with a compound index, an index on both category and status.

In your particular case

  • the index could be (status,category) if you order categories within a status.
  • the index could be (category,status) if you order status values under a category (if the number of status values is high)
  • Since you do an ORDER BY within a (status,category) combination, your compound would benefit even further from one or both of these combined indexes
    • (status,category,created_at)
    • (category,status,created_at)
  • Please see my posts on using compound indexes

QUESTION #3

"show warnings" doesn't show anything useful about what mysql is trying to warn me about: what's wrong with my query?

ANSWER TO QUESTION #3

Without seeing the actual warning message, I can't tell you. Note this: You ran EXPLAIN twice on the same query and got two slightly different results. This was due to the choose of keys. InnoDB tends to take guess by passing through index pages inside the BTREE nodes of the indexes and chooses based on cardinality.

In your case, you should run just once

OPTIMIZE TABLE object;

This will defrag the table and generate a clean set of index statistics.

You could slightly improve the query by writing it as an INNER JOIN

select SQL_NO_CACHE count(*) from object
inner join offer on object.id = offer.id
where object.category = "bid"
and status = "pending"
order by created_at;

You should also think about the fact that offer no PRIMARY KEY.

You should run this query

SELECT COUNT(1),id,category from offer
GROUP BY id,category HAVING COUNT(1) > 1;

If you get no results back, then that can be your PRIMARY KEY. Thus, the offer table should be

CREATE TABLE `offer`
(
    id BIGINT UNSIGNED,
    `category` VARCHAR(31), /* genre, type, category */
    amount DECIMAL(12,3),
    PRIMARY KEY (id,category)
) ENGINE=InnoDB;