MySQL – Best Indexing Strategy for Query with Equality[A], Range[B], Group By[C], AND Order By[count(P)]

index, index-tuning, mariadb, MySQL, performance, query-performance

I have a poorly performing query:

SELECT  user_id, count(item_id) as count
FROM items
WHERE category = 'magazine'
AND created_at > 1384754400
GROUP BY user_id
ORDER BY count(item_id) desc
LIMIT 100

What's the optimal indexing strategy for this query?

Table Details

500 million records with the following structure / cardinalities:

PRIMARY KEY (item_id) – cardinality: 500 M
user_id – cardinality: ~ 25 M
category – cardinality: ~ 2.5 M
created_at – cardinality: ~ 150 M

Indexing:

I have individual indexes on each of the user_id, category and created_at fields.

I also have the following composite indexes:

(category, user_id) – this is the one the query optimizer defaults to when running explain
(category, created_at)
(category, created_at, user_id) – this is one I attempted to create in order to optimize this query, however, it doesn't seem to be working very well.

Best Answer

If you ONLY want to optimise for this query, this is the best index:

ALTER TABLE items ADD INDEX (category, created_at, user_id)

This makes the most of the filters, which reduces the total amount of data you touch. By adding user_id at the end of the index (item_id is the primary key, which InnoDB appends to every secondary index), you make the index covering and it saves you a lookup into the primary index.

We can assume that item_id is NOT NULL (as it is the PRIMARY index).
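
One way to check that the index really is used as a covering index is to run EXPLAIN on the query. The sketch below swaps COUNT(item_id) for COUNT(*), which is safe here because item_id is NOT NULL. What the plan shows on your server is an assumption on my part, since it depends on the version and the index statistics.

```sql
-- Sketch: inspect the plan for the original query.
-- COUNT(*) equals COUNT(item_id) because item_id is NOT NULL.
EXPLAIN
SELECT user_id, COUNT(*) AS count
FROM items
WHERE category = 'magazine'
  AND created_at > 1384754400
GROUP BY user_id
ORDER BY count DESC
LIMIT 100;

-- What to look for (not guaranteed; depends on version and statistics):
--   key   : the (category, created_at, user_id) index
--   Extra : "Using index" means no lookups into the primary index
```

You will probably still see "Using temporary; Using filesort" in Extra, because the range scan returns rows in created_at order rather than user_id order, so the grouping and the final sort need extra work.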

However, because the MySQL optimiser is pretty stupid, you may need to rewrite like this:

SELECT  user_id, SUM(count) AS count
FROM
(
  SELECT category, created_at, user_id, COUNT(*) as count
  FROM items
  WHERE category = 'magazine'
  AND created_at > 1384754400
  GROUP BY category, created_at, user_id
) AS d
GROUP BY user_id
ORDER BY count DESC
LIMIT 100

QUESTION #1

Since I'll probably move to mysql cluster, does it make sense to switch from InnoDB to NDB engine and use HASH indexes on "category" and "status" columns?

ANSWER TO QUESTION #1

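Hash indexes are for one-to-one (equality) lookups. InnoDB and MyISAM do not offer them; the MEMORY Storage Engine does (See my May 17, 2011 post Why does MySQL not have hash indices on MyISAM or InnoDB?). NDB accepts USING HASH as well, though as far as I know only on primary and unique keys, which category and status are not.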
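
For illustration, this is roughly what a hash index looks like on a MEMORY table. The table, column, and index names are made up for the example and are not part of your schema.

```sql
-- Hypothetical lookup table; all names are invented for this example.
CREATE TABLE category_lookup
(
    category VARCHAR(31) NOT NULL,
    label    VARCHAR(64) NOT NULL,
    INDEX idx_category (category) USING HASH
) ENGINE=MEMORY;

-- A hash index helps equality lookups like this one...
SELECT label FROM category_lookup WHERE category = 'bid';

-- ...but it cannot serve range conditions (category > 'b') or an ORDER BY.
```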

QUESTION #2

I read how btrees stores data. If most of my queries involve a "WHERE category = x AND status = y", should I add 3 different indexes: one on category, one on status, and one on the combination of both?

ANSWER TO QUESTION #2

You do not want single-column indexes. MySQL can still use them if no compound index is present, but it has to work harder to generate results by merging lookups from two separate indexes.

You are way better off with a compound index, an index on both category and status.
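
As a sketch, assuming both columns live in the object table from your query, and using an index name I made up:

```sql
-- Assumption: category and status are both columns of `object`.
-- The index name idx_category_status is invented for this example.
ALTER TABLE object ADD INDEX idx_category_status (category, status);

EXPLAIN
SELECT COUNT(*)
FROM object
WHERE category = 'bid'
  AND status = 'pending';
-- With only single-column indexes, the plan may show type = index_merge.
-- With the compound index, expect type = ref and key = idx_category_status.
```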

In your particular case:

- the index could be (status,category) if you order categories within a status
- the index could be (category,status) if you order status values under a category (if the number of status values is high)

Since you do an ORDER BY within a (status,category) combination, your query would benefit even further from one or both of these three-column indexes:
- (status,category,created_at)
- (category,status,created_at)

Please see my posts on using compound indexes:
- Sep 18, 2012 : How are multiple indexes used in a query by MySQL? (Discusses INDEX_MERGE briefly)
- Apr 11, 2014 : Why is MySQL not using the index with the higher cardinality?
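
A sketch of the (category,status,created_at) variant, again assuming all three columns are in object and with an invented index name. With equality conditions on the two leading columns, the matching rows come out of the index already ordered by created_at, so MySQL should be able to skip the sort:

```sql
-- Assumption: category, status and created_at are all columns of `object`.
-- The index name idx_cat_status_created is invented for this example.
ALTER TABLE object ADD INDEX idx_cat_status_created (category, status, created_at);

EXPLAIN
SELECT id, created_at
FROM object
WHERE category = 'bid'
  AND status = 'pending'
ORDER BY created_at;
-- If the index is used for the ordering, Extra should not contain "Using filesort".
```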

QUESTION #3

"show warnings" doesn't show anything useful about what mysql is trying to warn me about: what's wrong with my query?

ANSWER TO QUESTION #3

Without seeing the actual warning message, I can't tell you. Note this: you ran EXPLAIN twice on the same query and got two slightly different results. This was due to the choice of keys. InnoDB estimates index statistics by sampling index pages inside the BTREE of each index, and the optimizer picks a key based on the estimated cardinality, so the choice can differ from one run to the next.

In your case, you should run this just once:

OPTIMIZE TABLE object;

This will defrag the table and generate a clean set of index statistics.
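
If a full rebuild is more than you want on a large table, ANALYZE TABLE is a lighter option: it refreshes the index statistics without defragmenting the table. A minimal sketch:

```sql
-- Recalculate index statistics only (no table rebuild).
ANALYZE TABLE object;

-- Inspect the Cardinality column that the optimizer works from.
SHOW INDEX FROM object;
```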

You could slightly improve the query by writing it as an INNER JOIN

select SQL_NO_CACHE count(*) from object
inner join offer on object.id = offer.id
where object.category = 'bid'
and status = 'pending'
order by created_at;

You should also think about the fact that offer has no PRIMARY KEY.

You should run this query

SELECT COUNT(1),id,category from offer
GROUP BY id,category HAVING COUNT(1) > 1;

If you get no results back, then (id,category) can be your PRIMARY KEY. Thus, the offer table should be

CREATE TABLE `offer`
(
    id BIGINT UNSIGNED,
    `category` VARCHAR(31), /* genre, type, category */
    amount DECIMAL(12,3),
    PRIMARY KEY (id,category)
) ENGINE=InnoDB;
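
If the offer table already exists and the duplicate check came back empty, you could add the key in place instead of recreating the table. A sketch, assuming no row has a NULL id or category:

```sql
-- Fails with a duplicate-key error if any (id, category) pair repeats.
ALTER TABLE offer ADD PRIMARY KEY (id, category);

-- Confirm the new definition.
SHOW CREATE TABLE offer;
```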
