I will answer each of the three questions.
QUESTION #1
Since I'll probably move to mysql cluster, does it make sense to switch from InnoDB to NDB engine and use HASH indexes on "category" and "status" columns?
ANSWER TO QUESTION #1
Hash indexes are for one-to-one lookups. InnoDB and MyISAM do not let you create them; they are available for the MEMORY Storage Engine (see my May 17, 2011 post "Why does MySQL not have hash indices on MyISAM or InnoDB?").
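As a minimal sketch of where a hash index does fit, here is a MEMORY table used for exact-match lookups. The table and column names (session_lookup, token, user_id) are hypothetical and do not come from the question.

```sql
-- Hypothetical lookup table; all names here are illustrative only.
CREATE TABLE session_lookup
(
    token CHAR(32) NOT NULL,
    user_id BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY USING HASH (token)
) ENGINE=MEMORY;

-- Exact-match lookup: the kind of predicate a hash index can serve.
SELECT user_id
FROM session_lookup
WHERE token = '0123456789abcdef0123456789abcdef';

-- Range predicate: a hash index cannot help here, so the table is scanned.
SELECT user_id
FROM session_lookup
WHERE token > '0';
```

Columns such as category and status are normally filtered together and sorted by another column, which is work a BTREE index is suited to.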
QUESTION #2
I read how btrees stores data. If most of my queries involve a "WHERE category = x AND status = y", should I add 3 different indexes: one on category, one on status, and one on the combination of both?
ANSWER TO QUESTION #2
You do not want single-column indexes. MySQL could still use them if no compound index is present, but it would have to work harder to generate results by merging lookups from two separate indexes.
You are far better off with a compound index, that is, one index on both category and status.
In your particular case:
- the index could be (status,category) if you order categories within a status.
- the index could be (category,status) if you order status values under a category (if the number of status values is high).
- Since you do an ORDER BY within a (status,category) combination, your query would benefit even further from one or both of these compound indexes: (status,category,created_at) and (category,status,created_at).
- Please see my posts on using compound indexes
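A minimal sketch of the compound index suggested above, assuming the table is named object and that category, status, and created_at are all columns of that table. The index name is made up for illustration.

```sql
-- Assumption: category, status and created_at all live in the object table.
-- The index name idx_category_status_created is hypothetical.
ALTER TABLE object
    ADD INDEX idx_category_status_created (category, status, created_at);

-- Check which key the optimizer picks for the typical filter.
-- With both leading columns fixed by equality, rows can be read from the
-- index already ordered by created_at.
EXPLAIN
SELECT id
FROM object
WHERE category = 'bid'
  AND status = 'pending'
ORDER BY created_at;
```

If the key column of the EXPLAIN output names the new index and the Extra column no longer mentions a filesort, the index is doing its job for this query.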
QUESTION #3
"show warnings" doesn't show anything useful about what mysql is trying to warn me about: what's wrong with my query?
ANSWER TO QUESTION #3
Without seeing the actual warning message, I can't tell you. Note this: you ran EXPLAIN twice on the same query and got two slightly different results. This was due to the choice of keys. InnoDB tends to estimate index statistics by sampling index pages inside the BTREE nodes of the indexes, and the optimizer chooses a key based on the resulting cardinality.
In your case, you should run this just once:
OPTIMIZE TABLE object;
This will defrag the table and generate a clean set of index statistics.
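To see the effect on the statistics, you could compare the cardinality estimates before and after the rebuild. This is only a sketch; the numbers you get depend entirely on your data.

```sql
-- Cardinality estimates per index before the rebuild.
SHOW INDEX FROM object;

-- For InnoDB this is carried out as a table recreate followed by an analyze.
OPTIMIZE TABLE object;

-- Cardinality estimates after the rebuild.
SHOW INDEX FROM object;
```

If you only want fresh statistics without rebuilding the table, ANALYZE TABLE object; is the lighter option.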
You could slightly improve the query by writing it as an INNER JOIN
select SQL_NO_CACHE count(*) from object
inner join offer on object.id = offer.id
where object.category = "bid"
and status = "pending"
order by created_at;
You should also think about the fact that the offer table has no PRIMARY KEY.
You should run this query
SELECT COUNT(1),id,category from offer
GROUP BY id,category HAVING COUNT(1) > 1;
If you get no results back, then (id,category) can be your PRIMARY KEY. Thus, the offer table should be
CREATE TABLE `offer`
(
id BIGINT UNSIGNED,
`category` VARCHAR(31), /* genre, type, category */
amount DECIMAL(12,3),
PRIMARY KEY (id,category)
) ENGINE=InnoDB;
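If the offer table already holds data, you do not need to recreate it. A sketch of adding the key in place, assuming the duplicate check returned no rows and neither column contains NULL values:

```sql
-- Assumption: (id, category) is unique and has no NULLs in the existing rows;
-- otherwise this statement fails.
ALTER TABLE offer
    ADD PRIMARY KEY (id, category);

-- Confirm that the key is now part of the table definition.
SHOW CREATE TABLE offer;
```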
According to this answer:
This is a known problem in MySQL. Rumors say it will be fixed.
The problem is due to a missing optimization when a subquery using IN is incorrectly identified as a dependent subquery instead of an independent subquery.
This causes the subquery to be run a huge number of times, which is unnecessary.
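One way to check whether a query is affected is to look at the select_type column of its EXPLAIN output. The IN form below is a hypothetical reconstruction, since the original query is not shown here; only the table names object and offer and the filter values come from the text above.

```sql
-- Hypothetical IN form of the query; the original statement is not shown.
-- If the row for the subquery reports select_type = DEPENDENT SUBQUERY,
-- the subquery is being re-evaluated for each candidate outer row.
EXPLAIN
SELECT COUNT(*)
FROM object
WHERE object.category = 'bid'
  AND object.status = 'pending'
  AND object.id IN (SELECT offer.id FROM offer);
```

Comparing this plan with the plan of the equivalent INNER JOIN shows whether the rewrite helps on your MySQL version.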
Best Answer
To locate the rows you want to see, you must do two things:
Why Change the Way You Create the Table ???
When you did this
You create the tgt table without any indexes. You can verify this by running
and you will see src with its indexes and tgt without them.
Consequently, with no indexes around, the table load would be the fastest. If you at least had a primary key, you could navigate the table by some numeric id column.
So, to create the table with the indexes in place, do this:
You could then run
and see that they are the same in structure.
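The commands referred to in this section are not shown in the text. As a rough sketch of what the steps might look like, assuming tgt was originally created with CREATE TABLE ... SELECT (an assumption, since the original statement is not shown):

```sql
-- Assumption: tgt was created from a query. This copies columns and rows
-- but none of the indexes defined on src.
CREATE TABLE tgt SELECT * FROM src;

-- Compare the two definitions: src lists its keys, tgt does not.
SHOW CREATE TABLE src;
SHOW CREATE TABLE tgt;

-- Start over, this time copying column definitions and indexes from src.
-- The new table is empty.
DROP TABLE tgt;
CREATE TABLE tgt LIKE src;

-- Both definitions should now match in structure.
SHOW CREATE TABLE src;
SHOW CREATE TABLE tgt;
```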
Why Change the Way You Load Data into the Table ???
You could just start loading like this
When the load is done, you could run
You could get a glimpse of what the row looks like, but the created_timestamp may not be populated. If it is populated correctly, then the problem is essentially solved.
Another Way to See Which Row Without Loading Data
You could isolate which rows have an issue with the created_timestamp by doing something else
This will show you the rows that would have been inserted.
This is all the help I can suggest since I do not know what the table structures are.