I only see you doing a SELECT in both variants. If you want to make sure you don't sell more than you have in stock, you must decrease your stock in the same transaction in which you place the order. Use a data-modifying CTE for the job (Postgres 9.1 or later):
WITH u AS (
   UPDATE product
   SET    quant = quant - <put_order_quant_here>
   WHERE  product_id = <order_prod_id>
   AND    quant >= <put_order_quant_here>
   RETURNING product_id, <put_order_quant_here> AS quant
   )
INSERT INTO order_detail (order_id, product_id, quant)
SELECT <put_order_id_here>, product_id, quant
FROM   u;
The UPDATE in the CTE only returns a row if the product has sufficient stock (and it takes a write lock on the row). In that case, the quantity is reduced within the same transaction, just before the order detail is inserted.

Put all order details into one transaction; if any of them fails to INSERT, then ROLLBACK.
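A minimal sketch of that pattern, with hypothetical literal values (order 123, product 1, quantity 5) substituted for the placeholders above:

BEGIN;

WITH u AS (
   UPDATE product
   SET    quant = quant - 5
   WHERE  product_id = 1
   AND    quant >= 5
   RETURNING product_id, 5 AS quant
   )
INSERT INTO order_detail (order_id, product_id, quant)
SELECT 123, product_id, quant
FROM   u;

-- repeat one such statement per order detail; if any of them reports
-- an affected row count of 0 (insufficient stock), issue ROLLBACK
-- instead of:
COMMIT;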
Possible deadlocks
This scenario can easily lead to deadlocks. Say you have two orders coming in at the same time, both wanting products A and B. The first order starts with A, the second starts with B. The two transactions now block each other: each waits for the other to complete. A deadlock ensues.
In PostgreSQL, a transaction that is stalled by a lock simply waits. Once it has been waiting for deadlock_timeout (default is 1 second), a check for a possible deadlock is performed. If one is detected, one of the transactions is aborted with a deadlock exception so that the other can finish. Which one is aborted is hard to predict.
There is a simple way to avoid this kind of deadlock: always process your order_details in a consistent order, such as products sorted by product_id. That way, the above scenario can never happen.
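One way to implement that consistent ordering is to lock all affected product rows up front, sorted by product_id, before running the updates. A sketch, assuming the same tables as above and hypothetical product ids:

BEGIN;

-- lock the rows of every product in this order in a fixed sequence,
-- so two concurrent orders can never acquire row locks in opposite order
SELECT product_id
FROM   product
WHERE  product_id IN (1, 2)
ORDER  BY product_id
FOR    UPDATE;

-- ... then run the UPDATE / INSERT statements shown above ...

COMMIT;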
You'll first need to create a list of every product_number and date combination. You can do this using a CROSS JOIN of your table:
select distinct p.product_number, d.date
from yourtable p
cross join yourtable d;
See SQL Fiddle with Demo. This will create a list of data similar to:
| PRODUCT_NUMBER | DATE |
|----------------|---------------------------------|
| 100 | January, 01 2010 00:00:00+0000 |
| 200 | January, 01 2010 00:00:00+0000 |
| 100 | February, 01 2010 00:00:00+0000 |
You will then use the above query and LEFT JOIN it to your table to return the final result:
select
pd.product_number,
pd.date,
t.value
from
(
-- list of every product/date
select distinct p.product_number, d.date
from yourtable p
cross join yourtable d
) pd
left join yourtable t
on pd.date = t.date
and pd.product_number = t.product_number
order by pd.product_number, pd.date;
See SQL Fiddle with Demo. Giving a final result of:
| PRODUCT_NUMBER | DATE | VALUE |
|----------------|---------------------------------|--------|
| 100 | January, 01 2010 00:00:00+0000 | 1 |
| 100 | February, 01 2010 00:00:00+0000 | 1 |
| 100 | March, 01 2010 00:00:00+0000 | 1 |
| 200 | January, 01 2010 00:00:00+0000 | 1 |
| 200 | February, 01 2010 00:00:00+0000 | (null) |
| 200 | March, 01 2010 00:00:00+0000 | 1 |
The LEFT JOIN returns all rows from your list of products and dates, regardless of whether a matching row exists in the other table.
This could also be written as:
select
p.product_number,
d.date,
t.value
from
(
-- list of every product
select distinct product_number
from yourtable
) p
cross join
(
-- list of every date
select distinct date
from yourtable
) d
-- then join to the table
left join yourtable t
on d.date = t.date
and p.product_number = t.product_number
order by p.product_number, d.date;
See SQL Fiddle with Demo. This may have better performance depending on your table size.
Now if you wanted to return a list of all dates, regardless of whether or not they appear in the table, then I would suggest creating a table of dates. This table would be used in a similar manner to create a list of all dates/products which you would then join.
The table would be similar to:
CREATE TABLE dates
(`date` datetime)
;
INSERT INTO dates
(`date`)
VALUES
('2010-01-01 00:00:00'),
('2010-02-01 00:00:00'),
('2010-03-01 00:00:00'),
('2010-04-01 00:00:00'),
('2010-05-01 00:00:00')
;
You'd then use the following query to get the list of dates/products:
select distinct p.product_number, d.date
from yourtable p
cross join dates d
And finally, you would join that back to your table:
select
pd.product_number,
pd.date,
t.value
from
(
-- list of every product/date
select distinct p.product_number, d.date
from yourtable p
cross join dates d
) pd
left join yourtable t
on pd.date = t.date
and pd.product_number = t.product_number
order by pd.product_number, pd.date;
See SQL Fiddle with Demo. Or an alternative:
select
p.product_number,
d.date,
t.value
from
(
-- list of every product
select distinct product_number
from yourtable
) p
cross join dates d
-- then join to the table
left join yourtable t
on d.date = t.date
and p.product_number = t.product_number
order by p.product_number, d.date;
See SQL Fiddle with Demo. Again, this may have better performance depending on the table size. Using this type of solution, you'd return all dates, even those not in your table:
| PRODUCT_NUMBER | DATE | VALUE |
|----------------|---------------------------------|--------|
| 100 | January, 01 2010 00:00:00+0000 | 1 |
| 100 | February, 01 2010 00:00:00+0000 | 1 |
| 100 | March, 01 2010 00:00:00+0000 | 1 |
| 100 | April, 01 2010 00:00:00+0000 | (null) |
| 100 | May, 01 2010 00:00:00+0000 | (null) |
| 200 | January, 01 2010 00:00:00+0000 | 1 |
| 200 | February, 01 2010 00:00:00+0000 | (null) |
| 200 | March, 01 2010 00:00:00+0000 | 1 |
| 200 | April, 01 2010 00:00:00+0000 | (null) |
| 200 | May, 01 2010 00:00:00+0000 | (null) |
Best Answer
In general, it's best to leave the specifics of how the data is joined together to the DB engine - usually, there is an optimizer that looks at table/index statistics, and figures out the most efficient way to eliminate rows from consideration. If you're having issues with performance on a specific query, you can sometimes force the engine to use a particular method/option on a given query - but this can lead to good performance in specific cases and bad performance in everything else.
That said, the logic of your query is such that rows from your "right" table should be excluded from consideration if they don't belong to the targeted buyer/order, so they do need to be included as part of the LEFT JOIN clause. I would rework that as follows:
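The reworked query itself is not reproduced here; the idea, sketched with hypothetical table and column names, is to move the buyer/order conditions out of the WHERE clause and into the join condition, so they filter the right table without turning the LEFT JOIN into an inner join:

SELECT p.product_name, o.order_qty
FROM   products p
LEFT   JOIN orders o
       ON  o.product_id = p.product_id
       AND o.buyer_id   = 42      -- hypothetical buyer
       AND o.order_no   = 123     -- hypothetical order
ORDER  BY p.product_name;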
If you simply join the two tables, the DB engine will figure out what columns it needs out of those tables from the rest of the query. I can't think of a good reason to use a sub-query that just selects specific columns from a table in a join, rather than using the table itself directly. I won't say there isn't such a reason, just that I have never encountered one.

Some versions of SQL may materialize a subquery - basically, running the subquery before the main query is evaluated, then evaluating the main query against those results. If you do choose to use a subquery, I'd recommend not using SELECT *, but rather specifying just the actual columns you need in the main query (SELECT product_id, order_qty FROM orders...), to avoid pulling in data you aren't actually going to use.

That same advice holds for the main query; rather than using SELECT *, specify the actual columns you need. As noted before, the DB engine will figure out what columns it needs from each table by looking at the main query's SELECT list, the WHERE clause, the JOIN criteria, and so forth. If you specify more columns in the SELECT list than you really need, the engine will give them to you. If each row of data in your table has two varchar fields that average 250 characters each, but all you really need is a varchar(25) field and three integers, you'll be returning a result set that's around 10 times as large as it needs to be (and one that will take about ten times as long to reach the machine that made the request).

Limiting your columns to those you actually need also allows the DB engine's optimizer to apply certain shortcuts that can improve performance. For example, if your table's rows are (on average) 1000 bytes wide, but you have an index that includes just the columns you need, and those columns average only 100 bytes per row, the optimizer can use that index as if it were the table, and may thus be able to reduce the number of database pages it has to read to get your data by a factor of 10. The term for this is a covering index.
It's also a good idea to assign column names to computed columns - it makes it easier to refer to them elsewhere.
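For example (hypothetical columns), naming a computed column lets you reference it by name in the ORDER BY clause and in the calling application:

SELECT o.product_id,
       o.order_qty * o.unit_price AS line_total  -- named computed column
FROM   orders o
ORDER  BY line_total DESC;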
So, we'd end up with the following queries. (Note that I'm assuming you will only be using a column named product_name in your results; if other columns from products apply, add them to the SELECT list and to the ORDER BY clause.)

For a specific order:
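The original answer's query is not shown here; a minimal sketch of what it might look like, with assumed table and column names (orders, products, buyer_id, order_no, order_qty):

SELECT p.product_name, o.order_qty
FROM   orders o
JOIN   products p ON p.product_id = o.product_id
WHERE  o.buyer_id = 42     -- hypothetical buyer
AND    o.order_no = 123    -- hypothetical order
ORDER  BY p.product_name;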
For a buyer's total activity:
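Again a sketch rather than the original, summing quantities across all of the buyer's orders (same assumed names):

SELECT p.product_name, SUM(o.order_qty) AS total_qty
FROM   orders o
JOIN   products p ON p.product_id = o.product_id
WHERE  o.buyer_id = 42     -- hypothetical buyer
GROUP  BY p.product_name
ORDER  BY p.product_name;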
A few final notes:
In some SQL variants, every column that is not in an aggregate function in the SELECT list must appear in the GROUP BY clause. Other versions allow you to leave columns out - the value from a "random" row will be returned. If you've got a unique value from a table in the GROUP BY, it would be safe to leave other columns from that table out when using one of these other versions, as all possible values from the rows being aggregated would have to be the same.

Also, your query makes an assumption: any order will only have one line item for a given product. Now, that seems like a reasonable assumption on the surface; however, I have seen systems where someone might place an order for 10 widgets on the 1st of the month, and an additional 10 on the 15th. If there's any chance that a product might show up twice in the same order, you can use the second query to pull the order-specific information as well; you just need to put AND o.order_no = 123 back in, like in the first query. This would probably take slightly longer than the query without the SUM and the ORDER BY - but I would expect that, in most cases, the difference would be too small for a human being to tell.