At the moment, I am using an instance of MySQL 5.6.19 which has the performance_schema enabled on a 2GB x64 Linux laptop (nothing stellar by any means!). My normal response time is approx. 0.5 s - not too bad. What's your setup (RAM, disk config, OS &c.)?
I must say that I'm surprised at 3 seconds on a server. Are you sure that it's not some sort of network issue?
I checked out the I/O for the query and noticed something strange - every time the query executes, the order_items table is accessed 27,321 times, which is exactly the number of records it contains. This smells of a correlated subquery - but the SQL is horribly complex and I haven't been able to get to the bottom of it (not enough time). Obviously, you want to access the table only once (or at most a few times). Order_status, for example, is only read once.
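To show the shape of the fix (a sketch only - SQLite via Python's stdlib, with invented toy tables, since I haven't untangled the real SQL): a correlated subquery that probes order_items once per outer row can usually be rewritten as a single JOIN ... GROUP BY that scans it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders (order_id) VALUES (1), (2), (3);
    INSERT INTO order_items (order_id) VALUES (1), (1), (2);
""")

# Correlated form: the inner SELECT runs once per row of orders,
# so order_items is probed N times (the 27,321-accesses symptom).
correlated = conn.execute("""
    SELECT o.order_id,
           (SELECT COUNT(*) FROM order_items i WHERE i.order_id = o.order_id)
    FROM orders o
""").fetchall()

# Join form: order_items is read once, then grouped.
joined = conn.execute("""
    SELECT o.order_id, COUNT(i.item_id)
    FROM orders o
    LEFT JOIN order_items i ON i.order_id = o.order_id
    GROUP BY o.order_id
""").fetchall()

print(sorted(correlated) == sorted(joined))  # True - same answer, very different access pattern
```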
It's not really a surprise that your query doesn't use indexes - see this post here which gives some explanation as to why - basically, if the optimiser thinks that a FTS (full table scan) is cheaper than using the index, then that's what it will do. This is a function of the size of the table(s) (yours are v. small ==> FTS) and/or the percentage of records to be retrieved (yours is 100%, again ==> FTS).
I notice that you are using MyISAM tables - for the full text indexes? InnoDB now has full text indexes too - see here and check the docco. Also (and more importantly), InnoDB allows you to use DRI (declarative referential integrity), which is very important - IMHO, there is no good reason to use MyISAM tables any more.
Could you explain the business logic behind this query? It strikes me (and this is only an opinion) that it is overly complex for a relatively simple system with 8 tables - has it grown over time and been added to by different people? I am respectfully suggesting that the time may have come for a complete rewrite from scratch rather than tinkering at the edges.
Another problem is that for orders, you have ordered_date as an INT(12) - when I first saw the data, I thought that it was a timestamp. You are creating extra work for yourself if you don't use the correct data type for your columns. Error checking is more difficult.
[EDIT] in response to OP's comments (easy to hard)
1)
INT v TIMESTAMP v DATE/DATETIME
Take a look at these (1, 2, 3, 4) posts. I suppose that INT v TIMESTAMP is relatively trivial (however, see point 4 in the answer here), but INT/TIMESTAMP v DATE/DATETIME isn't. Check out the answer to post 4 - you should be using DATEs/DATETIMEs where you are currently using INT/TIMESTAMP.
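For instance, an INT value such as 1405900800 only becomes a human-readable date after a conversion step that a proper DATE/DATETIME column would make unnecessary - a quick check in Python (during a MySQL migration, FROM_UNIXTIME does the same job):

```python
from datetime import datetime, timezone

# Epoch-second INTs (like the ones stored in ordered_date) have to be
# converted before anyone can read them as calendar dates.
for epoch in (1405900800, 1406505600):
    print(epoch, "->", datetime.fromtimestamp(epoch, tz=timezone.utc).date())
# 1405900800 -> 2014-07-21
# 1406505600 -> 2014-07-28
```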
2)
MySQL 5.6.19, but the server is using 5.5.37. Would upgrading improve the performance?
Maybe, maybe not - it's probably trivial relative to the issues that I'll be going through in point 3 - see InnoDB/Full Text Indexing. If I had to guess, upgrading would cost you at worst a couple of percent (some think that the performance schema adds overhead) and could quite possibly bring a significant improvement. These things can vary according to many parameters (disk, CPU, RAM, network, server load, phase of the moon...). Overall though, performance should improve release by release, although YMMV.
3)
General remarks.
I changed the storage engine for your system from MyISAM to InnoDB. This resulted in the query time being consistently reduced from ~0.45s to ~0.37s (~20%) - a noticeable improvement - maybe you should try it? I can only speak for the (not very thorough) testing I did on my 2GB x64 Linux laptop - you didn't answer my question about your own server specs. If I'm getting a response time of ~0.4s, I can't for the life of me understand why you're getting 3s - unless it's (as I suggested) a network issue, or the server is under load when you run the query?
InnoDB now has full text indexing in 5.6 (see my point 2 above) - AFAIK, that will not work in 5.5 (i.e. your table declarations will fail for InnoDB) - another reason to upgrade.
I tried to put FOREIGN KEY relationships into the tables, but there appear to be problems with your data - I'm getting different record counts after reloading the schemas (without FKs and with FKs). This can mean you have orphaned records, which may or may not affect the results. Anomalies occurred with orders/order_items and customers/customer_addresses, e.g.
mysql> select i.item_id, i.order_id from order_items i where i.order_id not in (SELECT ORDER_ID from orders);
+---------+----------+
| item_id | order_id |
+---------+----------+
| 26892 | 2433 |
+---------+----------+
Worse, none of the order payments appears to be valid:
mysql> select p.order_id from order_payments p where p.order_id not in (select order_id from orders);
+----------+
| order_id |
+----------+
| 2143 |
| 2214 |
| 2216 |
+----------+
3 rows in set (0.00 sec)
mysql>
mysql> select * from order_payments;
+------------+----------+--------+------------+------------+
| payment_id | order_id | amount | date | is_deleted |
+------------+----------+--------+------------+------------+
| 1 | 2143 | 99.34 | 1405900800 | 0 |
| 2 | 2216 | 5.09 | 1406505600 | 0 |
| 3 | 2214 | 55.10 | 1406505600 | 0 |
+------------+----------+--------+------------+------------+
I couldn't figure out the other relationships in the time available - maybe we can go over them. It may well be that a solid DB structure with explicitly declared FOREIGN KEYs will perform better. As a general rule, the more information you give the optimiser, the better.
The point about the optimiser not using indexes when tables are too small and/or too high a percentage of records is accessed still stands - however, there is no harm in doing things correctly - and as your data volumes change/grow, the optimiser may choose different plans - and the more info, the better.
The orphaned record issue (forgetting about performance for the moment) is another GOOD reason to switch to InnoDB. With proper DRI (declarative referential integrity), you cannot have orphaned records, which can be the source of tricky to find bugs.
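As a quick illustration of what DRI buys you (a sketch in SQLite via Python's stdlib - in InnoDB the equivalent is a FOREIGN KEY ... REFERENCES clause in the table definition): with the constraint declared, the engine simply refuses to create the orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs enforcement switched on
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_payments (
        payment_id INTEGER PRIMARY KEY,
        order_id   INTEGER NOT NULL REFERENCES orders(order_id)
    );
    INSERT INTO orders (order_id) VALUES (1);
""")

conn.execute("INSERT INTO order_payments VALUES (1, 1)")  # fine - parent row exists
try:
    conn.execute("INSERT INTO order_payments VALUES (2, 2143)")  # no such order
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed
```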
Finally, just a few words on the system generally. It seems to me that this query is like a runaway train - it has built up and built up momentum and is now overly complex for purpose. Maybe a step back is required? I got to thinking about triggers and using them to populate other tables (similar to a materialised view) and/or even the use of views themselves. You need to have a system that you are comfortable with and that you can change to suit your needs. Some behemoth sitting in the corner spewing out answers that nobody understands is not very conducive to flexibility, upgrading or peace of mind :-).
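To make the trigger idea concrete, here is a minimal sketch (SQLite via Python's stdlib; the table names are invented, and MySQL's trigger syntax differs slightly) of keeping a pre-aggregated totals table up to date, so a reporting query reads one small table instead of re-summing everything each time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY,
                              order_id INTEGER, value REAL);
    CREATE TABLE order_totals (order_id INTEGER PRIMARY KEY, total REAL);

    -- Maintain a running total per order as line items arrive -
    -- a poor man's materialised view.
    CREATE TRIGGER items_ai AFTER INSERT ON order_items
    BEGIN
        INSERT OR IGNORE INTO order_totals (order_id, total)
            VALUES (NEW.order_id, 0);
        UPDATE order_totals SET total = total + NEW.value
            WHERE order_id = NEW.order_id;
    END;
""")

conn.executemany("INSERT INTO order_items (order_id, value) VALUES (?, ?)",
                 [(1, 10.0), (1, 2.5), (2, 7.0)])
print(conn.execute("SELECT * FROM order_totals ORDER BY order_id").fetchall())
# [(1, 12.5), (2, 7.0)]
```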
I would like to do three things (using InnoDB)
1) get all the data types in order (DATE/DATETIMEs when appropriate)
2) sort out the foreign key relationships (need input on that - please feel free to contact me off list for this - my email is in my profile).
3) break this juggernaut down into its constituent components and explore methods of making it better, faster, easier to understand and modify.
I am unable to spend 09:00 - 18:00 at this, but would like to have a look in the evening if that suits and you are interested.
I'm more of a SQL Server guy, but here is a stab at it to get you motivated. You can use a CASE expression inside SUM() to total orders of certain order types.
For example:
SELECT
idclient AS `Client ID`
,COUNT(value) AS `Total Orders`
,SUM(value) AS `Total Order Value`
,SUM(CASE WHEN type1 = 'some type' THEN value ELSE 0 END) AS `Order Combo 1`
,SUM(CASE WHEN type1 = 'some type' AND type2 = 'another type' THEN value ELSE 0 END) AS `Order Combo 2`
,SUM(CASE WHEN type1 = 'some type' AND type2 = 'another type' AND type3 = 'another type' THEN value ELSE 0 END) AS `Order Combo 3`
FROM `test`.`orders`
GROUP BY idclient
This will group the orders on idclient and SUM(value) over the rows that meet the conditions or combinations you specified.
So, if a customer has an order where type1 was Apples and type2 was Oranges, that order's value is summed under a new column called something like Apple/Orange Orders.
The same record can also count towards several combination columns at once. If that is a problem, just move the CASE into the GROUP BY to break each combination out into a separate record in the result set.
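A runnable version of the Apples/Oranges example (sketched with SQLite via Python's stdlib, using invented sample rows; the column names follow the query above) shows the pattern in action:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (idclient INTEGER, value REAL, type1 TEXT, type2 TEXT);
    INSERT INTO orders VALUES
        (1, 10.0, 'Apples',  'Oranges'),
        (1,  5.0, 'Apples',  'Pears'),
        (2,  8.0, 'Oranges', 'Apples');
""")

# Conditional aggregation: one pass over the table, with each SUM(CASE...)
# picking out only the rows that match its combination.
rows = conn.execute("""
    SELECT idclient,
           COUNT(value) AS total_orders,
           SUM(value)   AS total_value,
           SUM(CASE WHEN type1 = 'Apples' AND type2 = 'Oranges'
                    THEN value ELSE 0.0 END) AS apple_orange_value
    FROM orders
    GROUP BY idclient
""").fetchall()
print(rows)  # client 1: 15.0 in total, 10.0 of it from Apple/Orange orders
```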
Hope this helps!
Best Answer
Try the following:
Describing the logic in words: this will find all orders for the first product number, then require that there also be orders for the same customerNumber that has ordered the second product number. (This logic is NOT meant to imply the order in which the optimizer will choose the joins. It is only the logic used to construct the query, based on the requirements stated.)
I dropped the ORDER BY because GROUP BY includes an order by as part of what it does - although the optimizer likely already knows it is ordered, so it is just a syntax difference.
As for whether this is more efficient, I would benchmark it to see. For larger data sets I believe my solution will be better than an OR-based one, but again, always benchmark.
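To sketch the double-join approach described above (SQLite via Python's stdlib; the table and column names here are my guesses for illustration, not the poster's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    -- customer 1 has ordered both products; customer 2 only the first
    INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO order_items VALUES (10, 100), (11, 200), (20, 100);
""")

# One join per product, meeting on the customer: only customers with
# orders for BOTH products survive - no OR needed.
rows = conn.execute("""
    SELECT DISTINCT o1.customer_id
    FROM orders o1
    JOIN order_items i1 ON i1.order_id = o1.order_id AND i1.product_id = 100
    JOIN orders o2      ON o2.customer_id = o1.customer_id
    JOIN order_items i2 ON i2.order_id = o2.order_id AND i2.product_id = 200
""").fetchall()
print(rows)  # only customer 1 qualifies
```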