Mysql – Reducing execution time for this gnarly SQL

MySQLperformancequery-performance

I'm creating an order system of some kind. There is a function that I use to get orders (in PHP) and retrieve the required information about the order. However, it seems that I am pulling a lot of information, including several left joins that could be the victim of executing for a period of more than 3 seconds.

The amount of rows in each table that is used.

  • Orders – 2,485
  • Order items – 27,321
  • Order payments – 3
  • Customers – 68
  • Customers' addresses – 153 (joins twice for invoice and delivery address)
  • Order statuses – 2,277
  • Status types – 5
  • Recurring orders data – 46

I know that you will need to use EXPLAIN <query> to debug the query. It seems that it does not use any of the indexes that I have placed on the tables (basically there is an index on every column that I am joining, apart from the PRIMARY KEY columns)

This is the debug:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    2302    Using filesort
2   DERIVED     <derived3>  ALL     NULL    NULL    NULL    NULL    2302    Using temporary; Using filesort
3   DERIVED     <derived4>  ALL     NULL    NULL    NULL    NULL    2484    
3   DERIVED     o   eq_ref  PRIMARY     PRIMARY     4   oi.order_id     1   Using where
3   DERIVED     <derived6>  ALL     NULL    NULL    NULL    NULL    3   
3   DERIVED     c   eq_ref  PRIMARY     PRIMARY     4   orders_system.o.customer_id     1   
3   DERIVED     <derived7>  ALL     NULL    NULL    NULL    NULL    153     
3   DERIVED     <derived8>  ALL     NULL    NULL    NULL    NULL    153     
3   DERIVED     <derived9>  ALL     NULL    NULL    NULL    NULL    13  
3   DERIVED     <derived10>     ALL     NULL    NULL    NULL    NULL    2277    
10  DERIVED     os  ALL     NULL    NULL    NULL    NULL    2277    Using temporary; Using filesort
10  DERIVED     st  index   PRIMARY     PRIMARY     1   NULL    5   Using where; Using index; Using join buffer
9   DERIVED     recurring_orders    ALL     NULL    NULL    NULL    NULL    46  Using where; Using filesort
8   DERIVED     customer_addresses  ALL     NULL    NULL    NULL    NULL    153     
7   DERIVED     customer_addresses  ALL     NULL    NULL    NULL    NULL    153     
6   DERIVED     order_payments  ALL     NULL    NULL    NULL    NULL    3   Using where; Using temporary; Using filesort
4   DERIVED     <derived5>  ALL     NULL    NULL    NULL    NULL    2484    Using temporary; Using filesort
5   DERIVED     oi  ALL     order_id    NULL    NULL    NULL    27321   Using where; Using temporary; Using filesort
5   DERIVED     o   eq_ref  PRIMARY     PRIMARY     4   orders_system.oi.order_id   1   

Visual:

enter image description here

I am not sure why the query is executing a full table scan, this is problematic if it is taking that long to run, especially if orders are about to increase daily.

This is the query that is used to execute, it's too big to view here.

What can I do to improve this query?

Best Answer

At the moment, I am using an instance of MySQL 5.6.19 which has the performance_schema enabled on a 2GB x64 Linux laptop (nothing stellar by any means!). My normal response time is approx. 0.5 s - not too bad. What's your setup (RAM, disk config, OS &c.)?

I must say that I'm surprised at 3 seconds on a server. Are you sure that it's not some sort of network issue?

I checked out I/O for the query and noticed something strange - every time the query executes, the order_items table is accessed 27,321 times - which is exactly the same as the number of records. This smells of a correlated subquery - but the SQL is horribly complex and I haven't been able to get to the bottom of it (not enough time). Obviously, you should only want to access the table once (or a few times maybe) at most. Order_status is only read once for example.

It's not really a surprise that your query doesn't use indexes - see this post here which gives some explanation as to why - basically if the optimiser thinks that a FTS (full table scan) is cheaper than using the index, then that's what it will do. This is a function of the size of the table(s) (yours are v. small ===> FTS) and/or percentage of records to be retrieved (yours 100%, again ==> FTS).

I notice that you are using MyISAM tables, for full text indexes? InnoDB now has full text indexes - see here and check the docco. Also (and more importantly), InnoDB allows you to use DRI (declarative referential integrity) which is very important - IMHO, there is no good reason to use MyISAM tables anymore.

Could you explain the business logic behind this query? It strikes me (and this is only an opinion) that it is overly complex for a relatively simple system with 8 tables - has it grown over time and been added to by different people? I am respectfully suggesting that the time may have come for a complete rewrite from scratch rather than tinkering at the edges.

Another problem is that for orders, you have ordered_date as an INT(12) - when I first saw the data, I thought that it was a timestamp. You are creating extra work for yourself if you don't use the correct data type for your columns. Error checking is more difficult.

[EDIT] in response to OP's comments (easy to hard)

1)

INT v TIMESTAMP v DATE/DATETIME

Take a look at these (1, 2, 3, 4) posts. I suppose that INT v TIMESTAMP is relatively trivial (however, see point 4 in the answer here), but INT/TIMESTAMP v DATE/DATETIME isn't. Check out the answer to post 4 - you should be using DATEs/DATETIMEs when you are using INT/TIMESTAMP.

2)

MySQL 5.6.19, but the server is using 5.5.37. Would upgrading improve the performance?

Maybe, maybe not - it's probably trivial in respect to the issues that I'll be going through in point 3 - see InnoDB/Full Text Indexing. If I had to guess, upgrading would decrease performance by no more than a couple of percent (some think that the performance schema adds overhead) and quite possibly be a significant improvement. These things can vary according to many parameters (disk, CPU, RAM, network, server load, phase of the moon...). Overall though, performance should improve release by release although YMMV.

3)

General remarks.

I changed the storage engine for your system from MyISAM to InnoDB. This resulted in the query time being consistently reduced from ~ .45s to ~ .37s (~20%) - a noticeable improvement - maybe you should try it? I can only speak for the testing (not v. thorough) I did on my 2GB x64 Linux laptop. You didn't answer my question about your own server specs. If I'm getting a response time of ~ .4s, I can't for the life of me understand why you're getting 3s? Unless it's (as I suggested) a network issue, or is the server under load when you run the query?

InnoDB now has full text indexing in 5.6 (see my point 2 above) - AFAIK, that will not work in 5.5 (i.e. your table declarations will fail for InnoDB) - another reason to upgrade.

I tried to put in FOREIGN KEY relationships in the tables, but there appears to be problems with your data - I'm getting different counts after reloading the schemas (without FK and with FK). This can mean you have orphaned records, which may or may not affect the result. Anomalies occurred with orders/order_items and customers/customer_addresses. e.g.

mysql> select i.item_id, i.order_id from order_items i where i.order_id not in (SELECT ORDER_ID from orders);
+---------+----------+
| item_id | order_id |
+---------+----------+
|   26892 |     2433 |
+---------+----------+

Worse, none of the order payments appear valid

mysql> select p.order_id from order_payments p where p.order_id not in (select order_id from orders);
+----------+
| order_id |
+----------+
|     2143 |
|     2214 |
|     2216 |
+----------+
3 rows in set (0.00 sec)

mysql> 
mysql> select * from order_payments;
+------------+----------+--------+------------+------------+
| payment_id | order_id | amount | date       | is_deleted |
+------------+----------+--------+------------+------------+
|          1 |     2143 |  99.34 | 1405900800 |          0 |
|          2 |     2216 |   5.09 | 1406505600 |          0 |
|          3 |     2214 |  55.10 | 1406505600 |          0 |
+------------+----------+--------+------------+------------+

I couldn't figure out the other relationships in the time available - maybe we can go over them. It may well be that a solid DB structure with overtly declared FOREIGN KEYs may perform better. As a general rule, the more information you give the optimiser, the better.

The point about the optimiser not using indexes when tables are too small and/or too high a percentage of records is accessed still stands - however, there is no harm in doing things correctly - and as your data volumes change/grow, the optimiser may choose different plans - and the more info, the better.

The orphaned record issue (forgetting about performance for the moment) is another GOOD reason to switch to InnoDB. With proper DRI (declarative referential integrity), you cannot have orphaned records, which can be the source of tricky to find bugs.

Finally, just a few words on the system generally. It seems to me that this query is like a runaway train - it has built up and built up momentum and is now overly complex for purpose. Maybe a step back is required? I got to thinking about triggers and using them to populate other tables (similar to a materialised view) and/or even the use of views themselves. You need to have a system that you are comfortable with and that you can change to suit your needs. Some behemoth sitting in the corner spewing out answers that nobody understands is not very conducive to flexibility, upgrading or peace of mind :-).

I would like to do three things (using InnoDB)

1) get all the data types in order (DATE/DATETIMEs when appropriate)

2) sort out the foreign key relationships (need input on that - please feel free to contact me off list for this - my email is in my profile).

3) break this juggernaut down into its constituent components and explore methods of making it better, faster, easier to understand and modify.

I am unable to spend 09:00 - 18:00 at this, but would like to have a look in the evening if that suits and you are interested.