MySQL Query GroupBy and Having Too Slow – Performance Optimization

MySQLperformancequery-performance

I need to select the oldest(min) due datetime for each InternalOrder and then filter these against some datetime for example NOW().
As you can see in schema image, I designed the schema as one InternalOrder has multiple Deliveries and each Delivery has multiple DeliveryRev(revisions), as I need to have whole trace of inputed data. I always look at the latest revision, if that one is deleted the whole delivery should not be considered.

Table rows:

  • InternalOrder: 2970
  • Delivery: 3258
  • DeliveryRev: 12272

I select the newest DeliveryRev via joins and filter these, which have due datetime field filled and also are not deleted.

I have working query (first one), but it is not exactly, what I need, since I can get the result with InternalOrder, which have oldest(min) due under specified threshold(not selected) and another one, which is over (in the result). In this case I do not want that InternalOrder in the result, which I solved using having instead of where, but this query is taking ages.

Am I doing something wrong with HAVING, or how can I optimize/change to get right result?

schema

This query is fast (40ms), but not getting right result.

SELECT
    s9_.internal_order_id,
    MIN(s10_.due) AS minDue
FROM sales_internal_delivery s9_
    LEFT JOIN sales_internal_deliveryrev s10_
        ON s9_.id = s10_.delivery_id
           AND (s10_.due IS NOT NULL AND s10_.deleted = 0)
    LEFT JOIN sales_internal_deliveryrev s11_
        ON s10_.delivery_id = s11_.delivery_id
           AND s10_.timestamp < s11_.timestamp
WHERE
    s11_.delivery_id IS NULL
    AND s10_.due > NOW()
GROUP BY s9_.internal_order_id
ORDER BY minDue ASC

This query is slow (25s), but the result is alright.

SELECT
    s9_.internal_order_id,
    MIN(s10_.due) AS minDue
FROM sales_internal_delivery s9_
    LEFT JOIN sales_internal_deliveryrev s10_
        ON s9_.id = s10_.delivery_id
           AND (s10_.due IS NOT NULL AND s10_.deleted = 0)
    LEFT JOIN sales_internal_deliveryrev s11_
        ON s10_.delivery_id = s11_.delivery_id
           AND s10_.timestamp < s11_.timestamp
WHERE
    s11_.delivery_id IS NULL
GROUP BY s9_.internal_order_id
HAVING minDue > NOW()
ORDER BY minDue ASC

explain

Best Answer

I don't know that the query could be refactored much and still be logically correct. And since I don't have your dataset it is difficult to test possible solutions for performance.

This query should still be logically the same, but may be executed in a more efficient way.

Select sid.internal_order_id,
    Min(sidr.due) As minDue
  From (
    Select delivery_id,
        Max(timestamp) As MaxTimestamp
      From sales_internal_deliveryrev
      Group By delivery_id) As lr
  Join sales_internal_deliveryrev As sidr
    On lr.delivery_id = sidr.delivery_id
    And lr.MaxTimestamp = sidr.timestamp
  Join sales_internal_delivery As sid
    On sidr.delivery_id = sid.id  
  Where sidr.deleted = 0
    And sidr.due Is Not Null
  Group By sid.internal_order_id
  Having minDue > Now()
  Order By minDue Asc; 

Edit

As per discussion about the proper way to handle revisions with deleted = 0 this is an alternate version which does not consider a revision with deleted = 1 when looking for the latest revision.

The WHERE deleted = 0 has been moved into the inner subquery.

Select sid.internal_order_id,
    Min(sidr.due) As minDue
  From (
    Select delivery_id,
        Max(timestamp) As MaxTimestamp
      From sales_internal_deliveryrev
      Where deleted = 0
      Group By delivery_id) As lr
  Join sales_internal_deliveryrev As sidr
    On lr.delivery_id = sidr.delivery_id
    And lr.MaxTimestamp = sidr.timestamp
  Join sales_internal_delivery As sid
    On sidr.delivery_id = sid.id  
  Where sidr.due Is Not Null
  Group By sid.internal_order_id
  Having minDue > '2016-12-07'
  Order By minDue Asc;