Mysql – How to profile a particularly slow or inefficient query

MySQLoptimizationprofiler

I'm currently writing queries for my database but this is the first time I've dealt with large data sets of any sort, and I'm now finding that I need to improve the performance of my queries.

Scenario

I have a series of tables for purchases and customers as described below

users (user_id, forename, surname) 20k+ records

customers (customer_id, user_id, etc etc) 20k+ records

purchases (purchase_id, item_id, customer_id, price, timestamp, store_id) 150k+ records

deliveries (purchase_id, delivered_time etc) 150k+ records

store (store_id, name etc) <50 records

The query I am currently trying to perform should retrieve all customer names, the date of their last order, when that order was delivered and their total number of orders for all customers at a particular store whos most recent order was in the last 6 months. Here is what I'm currently using.

SELECT `customers`.`customer_id`, `forename`, `surname`, `delivered_time`, MAX(`purchases`.`timestamp`) as lastPurchase, COUNT(purchases.purchase_id) as totalPurchases FROM (`purchases`) 
JOIN `deliveries` ON `purchases`.`purchase_id` = `deliveries`.`purchase_id` 
JOIN `customers` ON `customers`.`customer_id` = `purchases`.`customer_id` 
JOIN `users` ON `users`.`user_id` = `customers`.`user_id` 
WHERE `store_id` = '1' GROUP BY `customer_id` HAVING lastPurchase >= '1372086055'

However, this is pretty slow and takes 1-2 seconds each time, and I can only assume this time would increase as the amount of data increases.

In an attempt to profile the query I have used EXPLAIN which seems to indicate it is searching through around 20k records in the deliveries table. What should my next steps be to optimize this query? How can I go about reducing the execution time?

Best Answer

You didn't provide CREATE TABLE scripts; thus, it's hard to give you a specific advice; I'll try to describe general approach....

If you want to speed up this query, first of all, make sure you have indexes on the columns that define join condition, used as a filter (WHERE), or column[s] you have in GROUP BY or ORDER BY. In general, a good sign that some indexes are missed is when EXPLAIN shows "NULL" in key,key_len and "ALL" in type column (not always though; for instance if you select all rows from the table, or table is small enough, so full scan is faster than index seek + lookup). Then you may want to tweak some indexes to make them covering for this query.

Side note. The query seems to me like a BI query which normally is not executed against OLTP database. When dealing with a big data, it makes sense to build a few cubes based on data from operational db and query them instead of original data. The nature of OLTP implies high level of normalization and optimization for INSERT/UPDATE/DELETE, not for SELECTs (still possible, but queries can be very long, not clear, and quite slow).