First, what you need to focus on are the three fields in the query
The index you have (created_at,person_id)
will make the query do an index scan across all the days of created_at
after CURRENT_DATE
looking for the person_id
.
SUGGESTION #1 : You will definitely need a different index
MyISAM
If login_events
is MyISAM, this is the index you need
ALTER TABLE login_events ADD INDEX person_date_ndx (person_id,created_at,id);
This changes the query because the query will look for the specific person_id
and scan all days for person_id
1 only. The reason id
is included in the index ? The query will retrieve the id
from the index only file rather than the table. That way, all 3 fields are retrieved from the index file instead of 2 fields from the index and 1 from the table.
InnoDB
If login_events
is InnoDB, this is the index you need
ALTER TABLE login_events ADD INDEX person_date_ndx (person_id,created_at);
The reason I recommend this is the same, but you do not need to include id
. Why? All index pages include an index point back to the clustered index so retrieval of an index will intrinsically access the row anyway, thus accessing id. Adding id
to the index would simply be redundant.
SUGGESTION #2 : Change the Date Comparison
From the expression
DATE(created_at) >= DATE(CURRENT_DATE)
I can tell that created_at
is either DATETIME
or TIMESTAMP
.
The expression forces the query to convert every row's DATETIME value of created_on
into a DATE.
Therefore, instead of
SELECT login_events.id
FROM login_events
WHERE (
DATE(created_at) >= DATE(CURRENT_DATE)
AND person_id = 1
)
LIMIT 1
express the date comparison as a time comparison starting from midnight of today
SELECT login_events.id
FROM login_events
WHERE (
created_at >= (DATE(NOW()) + INTERVAL 0 SECOND)
AND person_id = 1
)
LIMIT 1
CAVEAT
Since the table is so small, either storage engine would be fine. I would give the edge to MyISAM.
Give it a Try !!!
In case it is better to start with stats
:
stats: INDEX(category, TW_ID) -- "covering"
posts: INDEX(TW_ID, mention_or_post, in_reply_to, created_time)
In case create_time
is a better filter than eng_rate
:
posts: INDEX(mention_or_post, in_reply_to, created_time)
stats: INDEX(TW_ID, category) -- "covering"
For posts
, start with the columns compared by '=' (in any order), then move on to one "range" -- either eng_rate
or created_time
"Flags" (mention_or_post
) are almost never worth indexing by themselves. If it is a flag, use either TINYINT
or ENUM
, not a bulky VARCHAR(7)
.
"To me this tells me that the index 'scraper_index' is so inefficient that it's nearly as slow as not having a index at all." -- Not necessarily. It could be that a bunch of index and/or data blocks were cached, thereby invalidating the conclusion. Run each query twice and take the second timing. (And either have the Query cache turned off, or include SQL_NO_CACHE
.)
(Please provide SHOW CREATE TABLE
, it is more descriptive than DESCRIBE
.)
More on creating indexes.
What the heck is in stats
? "1506 rows, 32Mb data" implies a lot of stuff.
When will you be moving to InnoDB.
Best Answer
You have the best index there is. It is in the right order, and the EXPLAIN says "Using index", which means that it read the index to get the answer, and did not have to reach into the data.
(To further address all the comments...)
Note that it needed to read about 200K rows (of the index) to do the count. That many rows takes time.
INDEX(offer_id, created_at)
versusINDEX(offer_id, created_at, tx_id)
-- Apparently you are using InnoDB and tx_id is the PRIMARY KEY. The PK is included in every secondary key, so these two index specifications are virtually identical.Order of the columns in an INDEX usually matters. And it does matter here. The fields must be in this order: (1) all the the "=" conditions (offer_id), (2) one range (created_id), and (3) all the other fields to make it "Using index", in any order (tx_id).
If you did not have
offer_id = 5
, follow the above pattern and get (1) (empty set), (2) (created_id), and (3) (tx_id) -- That is,INDEX(created_at, tx_id)
. Note that neither index works well for the other query.No kind of PARTITIONing would help performance at all. You don't need a 2-dimensional index (as in two ranges); you have "=" and "range", so a 'compound index' works best.
I suspect that "Using MRR" (Multi-Range Read Optimization) effectively replaces "Using temporary" and "Using filesort" would might normally be used for
DISTINCT
.