MySQL Query optimization: select all duplicates if none expired, or the newest if one expired

mariadbMySQLmysql-5.5mysql-5.7optimization

I have this query, it works, I just like to optimize, if possible, here's a fully working example where you can test it

  • Two Items are considered duplicates if they have either the same bundle_id or package_id or service_id
  • An item is considered expired and in grace period if (expires + INTERVAL 5 DAY) >= NOW()
  • An item is not expired if expires >=NOW()

I want to do the following

  • If duplicates exist, none of them expired, I want to select all of them
  • If duplicate exist, some of them expired but others did not, I want to select the items which did not expire
  • If duplicates exist, both expired are in the grace period, I'd like to select the most recent.
  • I don't want to select items that long expired and are no longer in the grace period
  • If an item has no duplicates and is either in the grace period or hasn't expired yet, select it.
  • The query should run on MySQL 5.5 and above, the client who needs this query is in the process of upgrading his database and is not done yet.

I select the items in the grace period or haven't expired, order by expire date to get the newest I use LIMIT to force group by to respect the order as per this issue. Then I use group by to filter duplicates. And then I add to that, the items I know has not expired yet

          SELECT * FROM (
                    SELECT
                    wtb_orders.orderid,
                    wtb_orders.duration,
                    wtb_orders.is_renew,
                    wtb_orders.bundle_id,
                    wtb_orders.service_id,
                    wtb_orders.object_name,
                    wtb_orders.pkg_id_fk,
                    expires,
                    wtb_tokens.disconnect_time
                    FROM wtb_orders
                    JOIN wtb_tokens ON wtb_orders.orderid=wtb_tokens.orderid
                    WHERE
                        wtb_orders.username= `96190000002` AND
                        (
                            (
                                ((expires + INTERVAL 5 DAY) >= NOW())
                                OR
                                (expires=0 AND wtb_tokens.orderid IS NOT NULL)
                            ) OR
                            expires=0
                        )
                    ORDER by expires DESC LIMIT 1000000
                ) as tmp
                GROUP BY
            tmp.pkg_id_fk,
            tmp.bundle_id,
                tmp.service_id
                    UNION
                (
                    SELECT ord.orderid,ord.duration,ord.is_renew,ord.bundle_id,ord.service_id,ord.object_name,ord.pkg_id_fk,tk.expires,tk.disconnect_time FROM wtb_orders AS ord
                    JOIN wtb_tokens AS tk ON ord.orderid=tk.orderid 
                    WHERE tk.username= ``96190000002``
                    AND (((tk.expires >= NOW())
                    OR (tk.expires=0 AND tk.orderid IS NOT NULL))
                    OR tk.expires=0)
                )

I tried optimizing it using eversql which helps most of the time, It gave me this query, it doesn't run, I get an error on UNION DISTINCT (SELECT wtb_orders.orderid ... Which says, Unrecognized statement near type DISTINCT and tmp.pkg_id_fk' in 'group statement you could check the error here, please scroll to the bottom

 SELECT
        *
    FROM
        (SELECT
            wtb_orders_orderid,
            wtb_orders_duration,
            wtb_orders_is_renew,
            wtb_orders_bundle_id,
            wtb_orders_service_id,
            wtb_orders_object_name,
            wtb_orders_pkg_id_fk,
            wtb_tokens_expires,
            wtb_tokens_disconn_time
        FROM
            ((SELECT
                wtb_orders.orderid AS wtb_orders_orderid,
                wtb_orders.duration AS wtb_orders_duration,
                wtb_orders.is_renew AS wtb_orders_is_renew,
                wtb_orders.bundle_id AS wtb_orders_bundle_id,
                wtb_orders.service_id AS wtb_orders_service_id,
                wtb_orders.object_name AS wtb_orders_object_name,
                wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
                wtb_tokens.expires AS wtb_tokens_expires,
                wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
            FROM
                wtb_orders
            JOIN
                wtb_tokens
                    ON wtb_orders.orderid = wtb_tokens.orderid
            WHERE
                wtb_orders.username = '96190000002'
                AND (
                    wtb_tokens.expires = 0
                )
            ORDER BY
                wtb_tokens.expires DESC LIMIT 1000000)
        UNION
        DISTINCT (SELECT
            wtb_orders.orderid AS wtb_orders_orderid,
            wtb_orders.duration AS wtb_orders_duration,
            wtb_orders.is_renew AS wtb_orders_is_renew,
            wtb_orders.bundle_id AS wtb_orders_bundle_id,
            wtb_orders.service_id AS wtb_orders_service_id,
            wtb_orders.object_name AS wtb_orders_object_name,
            wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
            wtb_tokens.expires AS wtb_tokens_expires,
            wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
        FROM
            wtb_orders
        JOIN
            wtb_tokens
                ON wtb_orders.orderid = wtb_tokens.orderid
        WHERE
            wtb_orders.username = '96190000002'
            AND (((wtb_tokens.expires = 0
            AND wtb_tokens.orderid IS NOT NULL)))
        ORDER BY
            wtb_tokens.expires DESC LIMIT 1000000)
    UNION
    DISTINCT (SELECT
        wtb_orders.orderid AS wtb_orders_orderid,
        wtb_orders.duration AS wtb_orders_duration,
        wtb_orders.is_renew AS wtb_orders_is_renew,
        wtb_orders.bundle_id AS wtb_orders_bundle_id,
        wtb_orders.service_id AS wtb_orders_service_id,
        wtb_orders.object_name AS wtb_orders_object_name,
        wtb_orders.pkg_id_fk AS wtb_orders_pkg_id_fk,
        wtb_tokens.expires AS wtb_tokens_expires,
        wtb_tokens.disconnect_time AS wtb_tokens_disconn_time
    FROM
        wtb_orders
    JOIN
        wtb_tokens
            ON wtb_orders.orderid = wtb_tokens.orderid
    WHERE
        wtb_orders.username = '96190000002'
        AND ((((wtb_tokens.expires + INTERVAL 5 DAY) >= NOW())))
    ORDER BY
        wtb_tokens.expires DESC LIMIT 1000000)
) AS union1
ORDER BY
union1.wtb_tokens_expires DESC LIMIT 1000000) AS tmp
GROUP BY
tmp.pkg_id_fk,
tmp.bundle_id,
tmp.service_id
UNION
(
SELECT
    ord.orderid,
    ord.duration,
    ord.is_renew,
    ord.bundle_id,
    ord.service_id,
    ord.object_name,
    ord.pkg_id_fk,
    tk.expires,
    tk.disconnect_time
FROM
    wtb_orders AS ord
JOIN
    wtb_tokens AS tk
        ON ord.orderid = tk.orderid
WHERE
    tk.username = '96190000002'
    AND (
        tk.expires = 0
    )
)
UNION
DISTINCT (SELECT
ord.orderid,
ord.duration,
ord.is_renew,
ord.bundle_id,
ord.service_id,
ord.object_name,
ord.pkg_id_fk,
tk.expires,
tk.disconnect_time
FROM
wtb_orders AS ord
JOIN
wtb_tokens AS tk
ON ord.orderid = tk.orderid
WHERE
tk.username = '96190000002'
AND (((tk.expires = 0
AND tk.orderid IS NOT NULL))))
UNION
DISTINCT (SELECT
ord.orderid,
ord.duration,
ord.is_renew,
ord.bundle_id,
ord.service_id,
ord.object_name,
ord.pkg_id_fk,
tk.expires,
tk.disconnect_time
FROM
wtb_orders AS ord
JOIN
wtb_tokens AS tk
ON ord.orderid = tk.orderid
WHERE
tk.username = '96190000002'
AND (((tk.expires >= NOW()))))

Even If I manage to fix the eversql query, I'm not convinced it's better than mine, okay it removes the OR and uses union, but I had to use LIMIT to force group by to respect the order, not because it's a good thing. And I'm sure this is a bad practice but I needed a solution.

The optimized query uses 2 limits, so it creates 2 temporary tables for nothing.

Best Answer

For performance, don't try to do it all in one statement.

Avoid non-sargable expressions, such as (expires + INTERVAL 5 DAY) >= NOW(); instead say (expires >= NOW() - INTERVAL 5 DAY)

Probably it would be better to do 3 queries to test for the 3 possible dups (bundle_id, etc).

If it would be OK to pick the newest expired bundle_id dup without seeing if the other two have newer dups, then do so. That is 3 separate queries, and stop when one is found. If all 3 fail to find a dup, then fall into the 4th query -- find all dups.

If that is not OK, the probably a UNION of those 3 dups would be optimal:

SELECT
    FROM (
        ( SELECT id FROM ... ) -- find latest bundle_id dup, if any
        UNION DISTINCT
        ( SELECT id FROM ... ) -- find latest ... dup, if any
        UNION DISTINCT
        ( SELECT id FROM ... ) -- find latest ... dup, if any
         ) AS x
    JOIN ... to get other columns
    ORDER BY ...  LIMIT 1  -- Get the latest of the 3 or fewer

Since I am talking about multiple queries, you would need to either use a Stored Procedure so you can have IF statements or use client code to orchestrate the queries.

IF the above query returned no rows
THEN
    SELECT ... -- all dups
END IF

Once you have restructured the queries this way, I'll help you with optimal indexes. Please provide SHOW CREATE TABLE so we can see the datatypes, indexes, etc.

(I left out issues about the grace period; perhaps that won't double the number of queries.)

5.5 is probably not too old to get this optimized.