An index can seek by a subset of characters, as long as you're searching from the left. E.g., "Inter%" can seek, "%net" will not.
However, the first character is not necessarily the character under which the article would be sorted: "The Internet" should go under "I", not "T". You probably need two fields, DisplayTitle and SortTitle; a single-character index on the latter may be worthwhile, but most likely a full-length index will be just fine.
Indexes are typically B-trees, and a seek will jump to the right location about equally quickly whether you have 10 or 100 entries per page. Scans are another matter, but I'd start with the simplest solution and add an extra index only if performance proves inadequate in practice.
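A minimal sketch of the two-field approach (table and column names here are invented for illustration, not taken from the question):

```sql
-- Hypothetical schema: one column for display, one for sorting/seeking.
CREATE TABLE articles (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    display_title VARCHAR(255) NOT NULL,   -- e.g. 'The Internet'
    sort_title    VARCHAR(255) NOT NULL    -- e.g. 'Internet' (leading article stripped)
);

-- Full-length index; a prefix index like sort_title(1) would be the
-- single-character variant mentioned above.
CREATE INDEX idx_articles_sort ON articles (sort_title);

-- Left-anchored pattern: can seek via the index.
SELECT display_title FROM articles WHERE sort_title LIKE 'Inter%';

-- Not left-anchored: forces a scan.
SELECT display_title FROM articles WHERE sort_title LIKE '%net';
```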
Not a DBA or MySQL expert here, but let's try :).
So let's take your second query - a bit smaller than the first one - and simplify the table names.
We have something like this (LO = logs, CL = cloud_logs, CAV = client_application_versions, CA = client_applications):
SELECT LO.* FROM LO
INNER JOIN CL ON CL.id = LO.cloud_log_id
INNER JOIN CAV ON CAV.id = CL.client_application_version_id
INNER JOIN CA ON CA.id = CAV.client_application_id
WHERE (LO.deleted_at IS NULL)
AND (CA.account_id = '3')
AND (CA.id = '5')
ORDER BY timestamp DESC LIMIT 100 OFFSET 0
And so you say it takes about 100 seconds, correct?
You say:
I have indexes on all applicable fields already.
Yet I believe that's where the flaw is. You don't have that many joins, and whether you have 7 billion rows or just 700, this should perform well if the indexing is thought out correctly. I suspect it's the ORDER BY / LIMIT that is hurting your performance because of poor indexing.
1/ Have you tried:
SELECT LO.* FROM LO WHERE (LO.deleted_at IS NULL)
or
SELECT * FROM CA WHERE (CA.account_id = '3') AND (CA.id = '5')
See how long these queries take on their own; is everything OK with these two tables?
2/ Have you indexed timestamp as well? Indexing the column you ORDER BY on is crucial too.
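For instance (assuming the timestamp column lives on the logs table, which the query suggests):

```sql
CREATE INDEX idx_logs_timestamp ON logs (`timestamp`);

-- Then verify the optimizer can use it for the sort:
EXPLAIN
SELECT * FROM logs
WHERE deleted_at IS NULL
ORDER BY `timestamp` DESC
LIMIT 100;
```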
In fact, you should also think about your data and how many distinct values each column you query on will hold (its cardinality). This is very well explained here: http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/ and will certainly help you.
3/ From what I've read on MySQL a few minutes ago, you could also try mysqlcheck to see whether everything is OK with your tables, if you think your indexing is fine: http://dev.mysql.com/doc/refman/5.0/en/mysqlcheck.html. I know that in older versions of Oracle we had to compute statistics after creating indexes; maybe something similar applies here?
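For example (ANALYZE TABLE is MySQL's rough counterpart to Oracle's statistics computation; table names are taken from the query above):

```sql
-- Verify table and index integrity:
CHECK TABLE logs, cloud_logs, client_application_versions, client_applications;

-- Refresh the index statistics the optimizer relies on:
ANALYZE TABLE logs, cloud_logs, client_application_versions, client_applications;
```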
Hope this helps.
[EDIT : 12/01/13 After comments ]
OK, glad to see you've already divided the time by 4, but indeed 25s is still way too long.
1/ Have you tried to play with indexes by creating one that matches the query, as Peter explains here (http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/)? Note a single index can't span tables, so think along the lines of (account_id, id) on CA plus an index on the logs timestamp, etc.
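A hedged sketch of that idea, split per table since MySQL indexes cannot cross tables (column placement is assumed from the query above):

```sql
-- Covers the WHERE clause on client_applications:
CREATE INDEX idx_ca_account ON client_applications (account_id, id);

-- Covers the filter plus the sort on logs:
CREATE INDEX idx_logs_deleted_ts ON logs (deleted_at, `timestamp`);
```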
2/ How long does it take when you get rid of the order by / limit like below ?
SELECT LO.* FROM LO
INNER JOIN CL ON CL.id = LO.cloud_log_id
INNER JOIN CAV ON CAV.id = CL.client_application_version_id
INNER JOIN CA ON CA.id = CAV.client_application_id
WHERE (LO.deleted_at IS NULL)
AND (CA.account_id = '3')
AND (CA.id = '5')
This checks whether it is the ORDER BY / LIMIT that is messing up your performance.
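You can also ask the optimizer directly with EXPLAIN; "Using filesort" in the Extra column means the ORDER BY is not served by an index:

```sql
EXPLAIN
SELECT LO.* FROM logs LO
INNER JOIN cloud_logs CL ON CL.id = LO.cloud_log_id
INNER JOIN client_application_versions CAV
        ON CAV.id = CL.client_application_version_id
INNER JOIN client_applications CA ON CA.id = CAV.client_application_id
WHERE LO.deleted_at IS NULL
  AND CA.account_id = '3'
  AND CA.id = '5'
ORDER BY LO.`timestamp` DESC
LIMIT 100;
```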
3/ If 2 confirms it, you could try a late row lookup, something like:
SELECT LO.* FROM LO
INNER JOIN CL ON CL.id = LO.cloud_log_id
INNER JOIN CAV ON CAV.id = CL.client_application_version_id
INNER JOIN CA ON CA.id = CAV.client_application_id
INNER JOIN
(
SELECT LO.id FROM LO
INNER JOIN CL ON CL.id = LO.cloud_log_id
INNER JOIN CAV ON CAV.id = CL.client_application_version_id
INNER JOIN CA ON CA.id = CAV.client_application_id
WHERE (LO.deleted_at IS NULL)
AND (CA.account_id = '3')
AND (CA.id = '5')
ORDER BY timestamp DESC LIMIT 0,100
) AS PERF ON PERF.id = LO.id
where you replace LO.id with the column that uniquely identifies a row in logs (I suppose you have some sort of log id). This is based on: http://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
Note you can change LIMIT 0,100 to the equivalent LIMIT 100 OFFSET 0 and keep the OFFSET keyword if you need it (e.g., if PostgreSQL compatibility is required).
Best Answer
There are several approaches that can help.
If you just throw in more indexes, you are likely to speed up most simple queries, but also to introduce unused indexes. Unused indexes don't really hurt if the database is read-mostly, but they take up space. Unless disk space matters to you, adding single-column indexes is the fastest way to speed up table reads, at the cost of slowing down table writes. Knowing the schema, however, helps you avoid useless indexes.
You can enable logging slow queries or queries that don't use indexes:
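The exact settings were not shown in the original; one way to do it at runtime (logging to a table, which querying and truncating mysql.slow_log implies) is:

```sql
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                   -- seconds; tune to taste
SET GLOBAL log_queries_not_using_indexes = 'ON';
SET GLOBAL log_output = 'TABLE';                  -- writes to mysql.slow_log
```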
Wait a little for the log to fill up, then query the log with
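For example, using the standard columns of the mysql.slow_log table:

```sql
SELECT start_time, query_time, rows_examined, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 20;
```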
This should point to potential new indexes. Note that every language feature that is available for normal queries is also available here, so use what you need. Once you create a new index, you can reset and repopulate the log:
TRUNCATE mysql.slow_log;
Also read up [here] for the details of slow query logging.