I have this query, it works, I just like to optimize, if possible, here's a fully working example where you can test it
- Two Items are considered duplicates if they have either the same bundle_id or package_id or service_id
- An item is considered expired and in grace period if
(expires + INTERVAL 5 DAY) >= NOW()
- An item is not expired if
expires >=NOW()
I want to do the following
- If duplicates exist, none of them expired, I want to select all of them
- If duplicate exist, some of them expired but others did not, I want to select the items which did not expire
- If duplicates exist, both expired are in the grace period, I'd like to select the most recent.
- I don't want to select items that long expired and are no longer in the grace period
- If an item has no duplicates and is either in the grace period or hasn't expired yet, select it.
- The query should run on MySQL 5.5 and above, the client who needs this query is in the process of upgrading his database and is not done yet.
I select the items in the grace period or haven't expired, order by expire
date to get the newest I use LIMIT
to force group by
to respect the order as per this issue. Then I use group by
to filter duplicates. And then I add to that, the items I know has not expired yet
SELECT * FROM (
SELECT
wtb_orders.orderid,
wtb_orders.duration,
wtb_orders.is_renew,
wtb_orders.bundle_id,
wtb_orders.service_id,
wtb_orders.object_name,
wtb_orders.pkg_id_fk,
expires,
wtb_tokens.disconnect_time
FROM wtb_orders
JOIN wtb_tokens ON wtb_orders.orderid=wtb_tokens.orderid
WHERE
wtb_orders.username= `96190000002` AND
(
(
((expires + INTERVAL 5 DAY) >= NOW())
OR
(expires=0 AND wtb_tokens.orderid IS NOT NULL)
) OR
expires=0
)
ORDER by expires DESC LIMIT 1000000
) as tmp
GROUP BY
tmp.pkg_id_fk,
tmp.bundle_id,
tmp.service_id
UNION
(
SELECT ord.orderid,ord.duration,ord.is_renew,ord.bundle_id,ord.service_id,ord.object_name,ord.pkg_id_fk,tk.expires,tk.disconnect_time FROM wtb_orders AS ord
JOIN wtb_tokens AS tk ON ord.orderid=tk.orderid
WHERE tk.username= ``96190000002``
AND (((tk.expires >= NOW())
OR (tk.expires=0 AND tk.orderid IS NOT NULL))
OR tk.expires=0)
)
I tried optimizing it using eversql which helps most of the time, It gave me this query, it doesn't run, I get an error on UNION DISTINCT (SELECT wtb_orders.orderid ...
Which says, Unrecognized statement near type DISTINCT
and tmp.pkg_id_fk' in 'group statement
you could check the error here, please scroll to the bottom
SELECT
*
FROM
(SELECT
wtb_orders_orderid,
wtb_orders_duration,
wtb_orders_is_renew,
wtb_orders_bundle_id,
wtb_orders_service_id,
wtb_orders_object_name,
wtb_orders_pkg_id_fk,
wtb_tokens_expires,
wtb_tokens_disconn_time
FROM
((SELECT
wtb_orders.orderid AS wtb_orders_orderid,
wtb_orders.duration AS wtb_orders_duration,
wtb_orders.is_renew AS wtb_orders_is_renew,
wtb_orders.bundle_id AS wtb_orders_bundle_id,
wtb_orders.service_id AS wtb_orders_service_id,
wtb_orders.object_name AS wtb_orders_object_name,
wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
wtb_tokens.expires AS wtb_tokens_expires,
wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
FROM
wtb_orders
JOIN
wtb_tokens
ON wtb_orders.orderid = wtb_tokens.orderid
WHERE
wtb_orders.username = '96190000002'
AND (
wtb_tokens.expires = 0
)
ORDER BY
wtb_tokens.expires DESC LIMIT 1000000)
UNION
DISTINCT (SELECT
wtb_orders.orderid AS wtb_orders_orderid,
wtb_orders.duration AS wtb_orders_duration,
wtb_orders.is_renew AS wtb_orders_is_renew,
wtb_orders.bundle_id AS wtb_orders_bundle_id,
wtb_orders.service_id AS wtb_orders_service_id,
wtb_orders.object_name AS wtb_orders_object_name,
wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
wtb_tokens.expires AS wtb_tokens_expires,
wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
FROM
wtb_orders
JOIN
wtb_tokens
ON wtb_orders.orderid = wtb_tokens.orderid
WHERE
wtb_orders.username = '96190000002'
AND (((wtb_tokens.expires = 0
AND wtb_tokens.orderid IS NOT NULL)))
ORDER BY
wtb_tokens.expires DESC LIMIT 1000000)
UNION
DISTINCT (SELECT
wtb_orders.orderid AS wtb_orders_orderid,
wtb_orders.duration AS wtb_orders_duration,
wtb_orders.is_renew AS wtb_orders_is_renew,
wtb_orders.bundle_id AS wtb_orders_bundle_id,
wtb_orders.service_id AS wtb_orders_service_id,
wtb_orders.object_name AS wtb_orders_object_name,
wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
wtb_tokens.expires AS wtb_tokens_expires,
wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
FROM
wtb_orders
JOIN
wtb_tokens
ON wtb_orders.orderid = wtb_tokens.orderid
WHERE
wtb_orders.username = '96190000002'
AND ((((wtb_tokens.expires + INTERVAL 5 DAY) >= NOW())))
ORDER BY
wtb_tokens.expires DESC LIMIT 1000000)
) AS union1
ORDER BY
union1.wtb_tokens_expires DESC LIMIT 1000000) AS tmp
GROUP BY
tmp.pkg_id_fk,
tmp.bundle_id,
tmp.service_id
UNION
(
SELECT
ord.orderid,
ord.duration,
ord.is_renew,
ord.bundle_id,
ord.service_id,
ord.object_name,
ord.pkg_id_fk,
tk.expires,
tk.disconnect_time
FROM
wtb_orders AS ord
JOIN
wtb_tokens AS tk
ON ord.orderid = tk.orderid
WHERE
tk.username = '96190000002'
AND (
tk.expires = 0
)
)
UNION
DISTINCT (SELECT
ord.orderid,
ord.duration,
ord.is_renew,
ord.bundle_id,
ord.service_id,
ord.object_name,
ord.pkg_id_fk,
tk.expires,
tk.disconnect_time
FROM
wtb_orders AS ord
JOIN
wtb_tokens AS tk
ON ord.orderid = tk.orderid
WHERE
tk.username = '96190000002'
AND (((tk.expires = 0
AND tk.orderid IS NOT NULL))))
UNION
DISTINCT (SELECT
ord.orderid,
ord.duration,
ord.is_renew,
ord.bundle_id,
ord.service_id,
ord.object_name,
ord.pkg_id_fk,
tk.expires,
tk.disconnect_time
FROM
wtb_orders AS ord
JOIN
wtb_tokens AS tk
ON ord.orderid = tk.orderid
WHERE
tk.username = '96190000002'
AND (((tk.expires >= NOW()))))
Even If I manage to fix the eversql query, I'm not convinced it's better than mine, okay it removes the OR and uses union, but I had to use LIMIT
to force group by to respect the order, not because it's a good thing. And I'm sure this is a bad practice but I needed a solution.
The optimized query uses 2 limits, so it creates 2 temporary tables for nothing.
Best Answer
For performance, don't try to do it all in one statement.
Avoid non-sargable expressions, such as
(expires + INTERVAL 5 DAY) >= NOW()
; instead say(expires >= NOW() - INTERVAL 5 DAY)
Probably it would be better to do 3 queries to test for the 3 possible dups (
bundle_id
, etc).If it would be OK to pick the newest expired
bundle_id
dup without seeing if the other two have newer dups, then do so. That is 3 separate queries, and stop when one is found. If all 3 fail to find a dup, then fall into the 4th query -- find all dups.If that is not OK, the probably a
UNION
of those 3 dups would be optimal:Since I am talking about multiple queries, you would need to either use a Stored Procedure so you can have IF statements or use client code to orchestrate the queries.
Once you have restructured the queries this way, I'll help you with optimal indexes. Please provide
SHOW CREATE TABLE
so we can see the datatypes, indexes, etc.(I left out issues about the grace period; perhaps that won't double the number of queries.)
5.5 is probably not too old to get this optimized.