You say: "My best educated guess is that somehow max is being used to avoid multiple grouping columns"
That is correct.
and then: "... but how can this return the correct results?"
It returns correct results because Symbol is the primary key in both the Investments and the Price tables. Therefore, any aggregate function over a P.column or an I.column is aggregating identical values. And MAX(c), when c is 2, 2, 2 and 2, is of course 2.
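To see this for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The tables investments and holdings and their rows are made up for illustration; they are not your real schema. Because symbol is the primary key of investments, every joined row within a group carries the same name, so MAX and MIN agree.

```python
import sqlite3

# Hypothetical miniature schema: symbol is the primary key of investments,
# while holdings may contain several rows per symbol.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments (symbol TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE holdings (symbol TEXT, quantity INTEGER);
    INSERT INTO investments VALUES ('ABC', 'Acme Corp'), ('XYZ', 'Xylo Inc');
    INSERT INTO holdings VALUES ('ABC', 10), ('ABC', 5), ('ABC', 7), ('XYZ', 3);
""")

# Within each symbol group, i.name repeats once per holdings row,
# so MAX(i.name) and MIN(i.name) both return that single value.
rows = conn.execute("""
    SELECT h.symbol,
           MAX(i.name)     AS name_max,
           MIN(i.name)     AS name_min,
           COUNT(*)        AS joined_rows,
           SUM(h.quantity) AS total_quantity
    FROM holdings h
    JOIN investments i ON i.symbol = h.symbol
    GROUP BY h.symbol
    ORDER BY h.symbol
""").fetchall()

for row in rows:
    print(row)
```

The aggregate on the name column only collapses duplicates, while SUM on the holdings column does the real aggregation.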
Could the query be written some other way, possibly without all these aggregations? Yes, see a related question: Why do wildcards in GROUP BY statements not work?
The query would need either a rather long GROUP BY clause, or the aggregations moved into a subquery on only the Holdings_Secure table (where Symbol is not the primary key), which is then joined to the other two tables:
SELECT
    I.Symbol Symbol
  , I.Ticker CUSIP
  , I.Name Name
  , H.TotalQuantity
  , H.TotalMarketValue
  , H.Price
  , I.CategoryCode5 BUY_SELL
  , I.EquivFactor1 PriceTgt
  , P.LastPrice CurrPrice
  , I.AssetClass Target
  , I.Industry Industry
  , I.CategoryCode1 Risk
FROM
    ( SELECT
          SUM(Quantity) TotalQuantity
        , SUM(MarketValue) TotalMarketValue
        , MAX(PriceLC) Price
        , Symbol
      FROM
          HOLDINGS_SECURE
      WHERE
          Quantity > 0
      GROUP BY
          Symbol
    ) H
  JOIN
    INVESTMENTS I
      ON H.Symbol = I.Symbol
  JOIN
    PRICE P
      ON H.Symbol = P.Symbol
WHERE
    I.Product = 'stock'
  AND I.CategoryCode5 NOT IN ('X', '') ;
1. CROSS JOIN, LEFT JOIN LATERAL to subquery
SELECT a.user_id, COALESCE(b.balance, 0) AS balance, d.as_of_date
FROM (
    SELECT d::date AS as_of_date  -- cast to date right away
    FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN LATERAL (
    SELECT balance
    FROM balances
    WHERE user_id = a.user_id
      AND as_of_date <= d.as_of_date
    ORDER BY as_of_date DESC
    LIMIT 1
) b ON true
ORDER BY a.user_id, d.as_of_date;
Returns your desired result, except that as_of_date is an actual date, not a timestamp like in your example. That should be more appropriate.
Users that have already been created but don't have any transactions yet are listed with a balance of 0. You did not define how to deal with this corner case.
Rather use timestamp input for generate_series(), as the query above does.
It's crucial for performance that you back this up with a multicolumn index:
CREATE INDEX balances_multi_idx ON balances (user_id, as_of_date DESC, balance);
We have had a very similar case on SO just this week:
Find more explanation there.
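If you want to try the "latest balance on or before each date" logic of solution 1 without a Postgres server, here is a rough sketch using Python's built-in sqlite3 module. Note the substitutions: SQLite has neither LATERAL nor generate_series() by default, so this uses a correlated scalar subquery and a recursive CTE instead. The sample rows are invented, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data; ISO-8601 text dates sort chronologically.
    CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, create_date TEXT);
    CREATE TABLE balances (user_id INTEGER, as_of_date TEXT, balance INTEGER);
    INSERT INTO accounts VALUES (1, '2016-01-01'), (2, '2016-01-02');
    INSERT INTO balances VALUES (1, '2016-01-01', 100), (1, '2016-01-03', 250);
""")

latest_balances = conn.execute("""
    WITH RECURSIVE d(as_of_date) AS (      -- stand-in for generate_series()
        SELECT '2016-01-01'
        UNION ALL
        SELECT date(as_of_date, '+1 day') FROM d WHERE as_of_date < '2016-01-03'
    )
    SELECT a.user_id,
           COALESCE((SELECT b.balance      -- stand-in for LEFT JOIN LATERAL
                     FROM balances b
                     WHERE b.user_id = a.user_id
                       AND b.as_of_date <= d.as_of_date
                     ORDER BY b.as_of_date DESC
                     LIMIT 1), 0) AS balance,
           d.as_of_date
    FROM d
    JOIN accounts a ON a.create_date <= d.as_of_date
    ORDER BY a.user_id, d.as_of_date
""").fetchall()

for row in latest_balances:
    print(row)
```

User 2 has an account but no balance rows, so it shows up with 0 from its creation date onward, which is the corner case mentioned above.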
2. CROSS JOIN, LEFT JOIN, window functions
SELECT user_id
     , COALESCE(max(balance) OVER (PARTITION BY user_id, grp
                                   ORDER BY as_of_date), 0) AS balance
     , as_of_date
FROM (
    SELECT a.user_id, b.balance, d.as_of_date
         , count(b.user_id) OVER (PARTITION BY user_id ORDER BY as_of_date) AS grp
    FROM (
        SELECT d::date AS as_of_date  -- cast to date right away
        FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
    ) d
    JOIN accounts a ON a.create_date <= d.as_of_date
    LEFT JOIN balances b USING (user_id, as_of_date)
) sub
ORDER BY user_id, as_of_date;
Same result. If you have the multicolumn index mentioned above and can get index-only scans out of it, the first solution is most probably faster.
The main feature is the running count of values to form groups. Since count() does not count NULL values, all dates without a balance fall into the same group (grp) as the most recent balance. Then use a simple max() over the same window, with grp added to the PARTITION BY clause, to copy the last balance for dangling gaps.
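The grouping trick is easier to follow on a tiny example. Below is a sketch using Python's built-in sqlite3 module; it assumes the bundled SQLite is 3.25 or later, since that is when window functions were added. The table daily is hypothetical and stands in for the result of the LEFT JOIN, with NULL on days without a balance row, and it counts balance directly rather than b.user_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined rows: NULL means "no balance that day".
    CREATE TABLE daily (user_id INTEGER, as_of_date TEXT, balance INTEGER);
    INSERT INTO daily VALUES
        (1, '2016-01-01', NULL),
        (1, '2016-01-02', 100),
        (1, '2016-01-03', NULL),
        (1, '2016-01-04', NULL),
        (1, '2016-01-05', 80);
""")

filled = conn.execute("""
    SELECT as_of_date, grp,
           COALESCE(max(balance) OVER (PARTITION BY user_id, grp
                                       ORDER BY as_of_date), 0) AS balance
    FROM (
        SELECT user_id, as_of_date, balance,
               -- running count of non-NULL balances: increases only on days
               -- that have a balance, so gap days share the previous grp
               count(balance) OVER (PARTITION BY user_id
                                    ORDER BY as_of_date) AS grp
        FROM daily
    ) sub
    ORDER BY as_of_date
""").fetchall()

for row in filled:
    print(row)
```

Each group holds at most one non-NULL balance (its first row), so max() simply copies it forward. The leading gap lands in group 0 with no balance at all, which COALESCE turns into 0.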
Best Answer
This is a Gaps and Islands question. See here for more details on problems like this.
This should do what you need:
UPDATE: Andriy noted that my solution was a SQL Server 2012+ solution. The following code should work for versions down to 2005.