WARNING! Depending on authors response, my answer could be way off. Awaiting his reponse.
reconbot, we need more info, because it seems like your entire subquery
SELECT DISTINCT ON (client_id) id
FROM orders
WHERE
start_date <= ship_date and (end_date is NULL OR end_date >= ship_date)
AND client_id = items_to_work.client_id
ORDER BY client_id, order_type DESC
is completely redundant. You appear to be re-checking for already checked conditions.
For example, try just running
SELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;
and you get the exact same results. See the SQL Fiddle here. If you're still concerned about distinct client_id
s, you could also use
SELECT *
FROM (
SELECT DISTINCT ON (client_id)
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work;
as in this SQL Fiddle.
So, I'm either misunderstanding, or your originally given data set doesn't reflect the significance of your sub-SELECT
in the original question.
IGNORE EVERYTHING BELOW HERE...
Until we get clarification from the author...
I think I've got your solution. At the very least, I've made a SQLFiddle with the results, and it appears that it will be much less costly.
LATERAL
sub-queries in PostgreSQL
I love chances when a LATERAL
sub-query can be used to save some time in your query. Unfortunately, I think I'm pretty bad at explaining when and where it should be used, and I'm only OK at recognizing instances of when to use it. It just doesn't come up too often in my particular query designs.
Take a look at the Postgres documentation on LATERAL
keyword for some ideas, and also I really like this SlideShare presentation by Markus Winand for helping to explain LATERAL
a bit better. In essence, it has a flavor of a "for each" statement in typical pseudo-coding vernacular.
The reason I looked into it for your case was: you were building the items_to_work
table, and then using the client_id
attribute of items_to_work
in your sub-SELECT
, where you checked if items_to_work.order_id
was IN
the distinct returned set of orders.id
values. Using items_to_work
attributes in the WHERE
clause of the sub-SELECT
was the red flag for me.
OK, so I realize the explanation of my motivation isn't so hot, sorry! :P On to the results...
New Query
Without further ado, here it is:
SELECT *
FROM (
SELECT
order_items.id item_id,
order_items.product_id,
order_items.quantity,
order_items.lead_days,
orders.id order_id,
orders.client_id client_id,
(DATE '2014-07-18' + order_items.lead_days) ship_date
FROM order_items
INNER JOIN orders ON orders.id = order_items.order_id
WHERE
-- find all order items that have active orders on the production_date + lead time
DATE '2014-07-18' >= (orders.start_date - order_items.lead_days)
AND (
DATE '2014-07-18' <= (orders.end_date - order_items.lead_days)
OR orders.end_date IS NULL
)
) items_to_work,
LATERAL(
SELECT id FROM orders
WHERE orders.client_id = items_to_work.client_id
AND start_date <= ship_date and (end_date is NULL OR end_date >= ship_date)
AND items_to_work.order_id = orders.id) lat;
Also, you can note that we get back identical results as those you are expecting, except for a single added column on the right side, as part of the LATERAL
selection, which is a replication of the orders.id
. I'm not super experienced with LATERAL
(it doesn't come up too often in my work), so if that's a problem for you, we can sort out a way to drop it. :P
EXPLAIN
results
So, we don't have the big data set which you have, so that we can really test out the results of the new query versus the old. I'm relying on the EXPLAIN
estimates here, but...
Using the Old SQLFiddle, we can see that the overall estimated cost is approximately 283,000 Postgres units. :P
Using the New SQLFiddle, we can see the much better estimated cost of only 204!!
Best Answer
It's a case of relational-division.
We have assembled an arsenal of query techniques in this related answer on SO:
For your case this query will be among the fastest possible solutions:
If you just need the
visit_id
you don't even need to join to tablevisits
.db<>fiddle here
Old sqlfiddle.