If I were you, I would run a trace limited to hits on that table. It shouldn't be overly intensive, since you are restricting it to just the queries against that table from your application. Capture just the minimum needed by the DTA (Database Tuning Advisor). Run it for a day here, a day there, and make sure you get some end-of-week days and some end-of-month days. Then run the whole lot through the DTA.
Here is why: I'm willing to bet that you have specific combinations of columns that are going to come up more often than not. You can create more complex indexes based on that information. You might also find that you can create some correlated statistics, that is, statistics that cover more than one column. For example, creating a statistic on City and State together may improve queries that filter on those two columns.
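As a rough sketch of the City and State example, a multi-column statistic in SQL Server could be created like this. The table name dbo.Customers and the statistic name are hypothetical; substitute your own.

```sql
-- Hypothetical table and statistic names, for illustration only.
-- Creates one statistics object covering the City + State combination.
CREATE STATISTICS stat_Customers_City_State
    ON dbo.Customers (City, State);

-- Inspect what was collected (header, density vector, histogram).
DBCC SHOW_STATISTICS ('dbo.Customers', stat_Customers_City_State);
```

The histogram is built on the leading column only; the density information is what describes the column combination.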
However, make sure you don't create too many indexes. On a table that large I'm guessing you do a fair number of writes, and every additional index will slow them down. Of course, you may do most of your writes during a batch process.
Also make sure that you put an automatic process in place to update your statistics periodically. With that many rows, the statistics aren't going to update on their own very often: only once 500 + 20% of the rows have changed, and at 500 million rows that's a LOT.
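To put a number on that threshold, and as a minimal sketch of what a scheduled job (for example a SQL Server Agent job) might run, assuming the same hypothetical dbo.Customers table name:

```sql
-- Threshold from the rule above: 500 + 20% of the rows.
-- 500 + 0.20 * 500,000,000 = 100,000,500 changed rows before an automatic update.

-- Hypothetical table name; refresh every statistic on it by reading all rows.
UPDATE STATISTICS dbo.Customers WITH FULLSCAN;

-- Alternative: refresh statistics across the whole current database.
-- EXEC sp_updatestats;
```

FULLSCAN on a table this size is expensive, so a sampled update may be the more practical choice depending on your maintenance window.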
I will answer each of the three questions
QUESTION #1
Since I'll probably move to mysql cluster, does it make sense to switch from InnoDB to NDB engine and use HASH indexes on "category" and "status" columns?
ANSWER TO QUESTION #1
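Hash indexes are for one-to-one (exact-match) lookups. Outside of MySQL Cluster, hash indexes are only available for the MEMORY Storage Engine. NDB itself uses hash indexes only for primary and unique keys, so they would not apply to non-unique columns such as category and status. (See my May 17, 2011 post Why does MySQL not have hash indices on MyISAM or InnoDB?)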
QUESTION #2
I read how btrees stores data. If most of my queries involve a "WHERE category = x AND status = y", should I add 3 different indexes: one on category, one on status, and one on the combination of both?
ANSWER TO QUESTION #2
You do not want single-column indexes. MySQL could still use them if no compound index is present, but it would have to work harder to generate results, merging lookups from two separate indexes.
You are way better off with a compound index, an index on both category and status.
In your particular case
- the index could be (status,category) if you order categories within a status.
- the index could be (category,status) if you order status values under a category (if the number of status values is high)
- Since you do an ORDER BY within a (status,category) combination, your query would benefit even further from one or both of these compound indexes: (status,category,created_at) and (category,status,created_at)
- Please see my posts on using compound indexes
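As a sketch of the last option, here is how one of those three-column indexes could be added and checked. This assumes that category, status and created_at all live in the object table, and the index name is made up for illustration.

```sql
-- Assumption: category, status and created_at are all columns of `object`.
-- The index name is hypothetical.
ALTER TABLE object
    ADD INDEX idx_category_status_created (category, status, created_at);

-- Check the plan: with both equality filters matched by the leading columns,
-- rows come out of the index already ordered by created_at.
EXPLAIN
SELECT id
FROM object
WHERE category = 'bid'
  AND status = 'pending'
ORDER BY created_at;
```

If the index is picked up, the key column of the EXPLAIN output should name it, and the Extra column should no longer mention a filesort.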
QUESTION #3
"show warnings" doesn't show anything useful about what mysql is trying to warn me about: what's wrong with my query?
ANSWER TO QUESTION #3
Without seeing the actual warning message, I can't tell you. Note this: You ran EXPLAIN twice on the same query and got two slightly different results. This was due to the choice of keys. InnoDB estimates index statistics by sampling index pages inside the BTREE nodes of the indexes, and keys are chosen based on the resulting cardinality, so the choice can vary between runs.
In your case, you should run this just once
OPTIMIZE TABLE object;
This will defrag the table and generate a clean set of index statistics.
You could slightly improve the query by writing it as an INNER JOIN
SELECT SQL_NO_CACHE COUNT(*) FROM object
INNER JOIN offer ON object.id = offer.id
WHERE object.category = 'bid'
AND status = 'pending'
ORDER BY created_at;
You should also think about the fact that offer has no PRIMARY KEY.
You should run this query
SELECT COUNT(1),id,category from offer
GROUP BY id,category HAVING COUNT(1) > 1;
If you get no results back, then (id,category) can be your PRIMARY KEY. Thus, the offer table should be
CREATE TABLE `offer`
(
id BIGINT UNSIGNED,
`category` VARCHAR(31), /* genre, type, category */
amount DECIMAL(12,3),
PRIMARY KEY (id,category)
) ENGINE=InnoDB;
Best Answer
If you ONLY want to optimise for this query, this is the best index:
This optimises the value of the filters, which reduces the total amount of data you touch. By adding user_id,item_id at the end of the index, you make the index covering, which saves you a lookup into the primary index. We can assume that item_id is NOT NULL (as it is the PRIMARY index). However, because the MySQL optimiser is pretty stupid, you may need to rewrite like this: