I'm currently writing queries for my database but this is the first time I've dealt with large data sets of any sort, and I'm now finding that I need to improve the performance of my queries.
Scenario
I have a series of tables for purchases and customers as described below
users (user_id, forename, surname) 20k+ records
customers (customer_id, user_id, etc etc) 20k+ records
purchases (purchase_id, item_id, customer_id, price, timestamp, store_id) 150k+ records
deliveries (purchase_id, delivered_time etc) 150k+ records
store (store_id, name etc) <50 records
The query I am currently trying to perform should retrieve all customer names, the date of their last order, when that order was delivered and their total number of orders for all customers at a particular store whos most recent order was in the last 6 months. Here is what I'm currently using.
SELECT `customers`.`customer_id`, `forename`, `surname`, `delivered_time`, MAX(`purchases`.`timestamp`) as lastPurchase, COUNT(purchases.purchase_id) as totalPurchases FROM (`purchases`)
JOIN `deliveries` ON `purchases`.`purchase_id` = `deliveries`.`purchase_id`
JOIN `customers` ON `customers`.`customer_id` = `purchases`.`customer_id`
JOIN `users` ON `users`.`user_id` = `customers`.`user_id`
WHERE `store_id` = '1' GROUP BY `customer_id` HAVING lastPurchase >= '1372086055'
However, this is pretty slow and takes 1-2 seconds each time, and I can only assume this time would increase as the amount of data increases.
In an attempt to profile the query I have used EXPLAIN which seems to indicate it is searching through around 20k records in the deliveries table. What should my next steps be to optimize this query? How can I go about reducing the execution time?
Best Answer
You didn't provide
CREATE TABLE
scripts; thus, it's hard to give you a specific advice; I'll try to describe general approach....If you want to speed up this query, first of all, make sure you have indexes on the columns that define join condition, used as a filter (
WHERE
), or column[s] you have inGROUP BY
orORDER BY
. In general, a good sign that some indexes are missed is whenEXPLAIN
shows "NULL" inkey,key_len
and "ALL" intype
column (not always though; for instance if you select all rows from the table, or table is small enough, so full scan is faster than index seek + lookup). Then you may want to tweak some indexes to make them covering for this query.Side note. The query seems to me like a BI query which normally is not executed against OLTP database. When dealing with a big data, it makes sense to build a few cubes based on data from operational db and query them instead of original data. The nature of OLTP implies high level of normalization and optimization for INSERT/UPDATE/DELETE, not for SELECTs (still possible, but queries can be very long, not clear, and quite slow).