PostgreSQL Pivot – Subtracting Sets Within a Grouped Column

pivotpostgresql

I'm not the best at explaining, but I have a table that has the format

CREATE TABLE foo
AS
  SELECT type,date::date,tp,price
  FROM ( VALUES
    ( 'A', '2010-10-01', 1, 0.05 ),
    ( 'A', '2010-10-01', 2, 1.04 ),
    ( 'B', '2010-10-01', 1, 0.53 ),
    ( 'B', '2010-10-01', 2, 1.04 ),
    ( 'C', '2010-10-01', 1, 0.05 ),
    ( 'C', '2010-10-01', 2, 1.02 ),
    ( 'D', '2010-10-01', 1, 0.05 ),
    ( 'D', '2010-10-01', 2, 1.08 )
  ) AS t(type,date,tp,price);

And what I want to do is subtract different types where the date and tp are the same. So that would be A-B, A-C, A-D, B-A, B-C, B-D, C-A, C-B, C-D, D-A, D-B, D-C.

To me this seems like I would want a wide format table, with the columns date, tp, A, B, C, D and then do a column-wise subtraction based on possible combinations. If the subtraction is less than 0, then the value is 0

The desired output looks something like this:

combo | date      | tp | price
---+------------+----+-------
A_B  | 2010-10-01 | 1  | 0
A_B  | 2010-10-01 | 2  | 0
A_C  | 2010-10-01 | 1  | 0
A_C  | 2010-10-01 | 2  | 0.02
A_D  | 2010-10-01 | 1  | 0
A_D  | 2010-10-01 | 2  | 0
and so on for all the combinations

Should I be looking at using crosstab ? Or is there a simpler/more elegant solution? My current solution is a view that uses CTE(s) to create all the possible dates and type combos, and then I have a function that goes through every typeA and typeB, date, tp combo. It is very slow.

The initial table in question is 2857658 rows

Best Answer

Simple self-join

What you want is something like this...

SELECT
  ARRAY[f1.type,f2.type] AS type,
  date,
  tp,
  greatest(f1.price-f2.price, '0.00') AS price
FROM foo AS f1
INNER JOIN foo AS f2
  USING (date, tp)
WHERE f1.type <> f2.type
ORDER BY f1.type, f2.type, tp;

I deviated a bit from what you desired. Generally, you don't want string concatenation like that. It's less useful. You're better off using an array. If you do want string concatenation just use f1.type || '_' || f2.type.

 type  |    date    | tp | price 
-------+------------+----+-------
 {A,B} | 2010-10-01 |  1 |  0.00
 {A,B} | 2010-10-01 |  2 |  0.00
 {A,C} | 2010-10-01 |  1 |  0.00
 {A,C} | 2010-10-01 |  2 |  0.02
 {A,D} | 2010-10-01 |  1 |  0.00
 {A,D} | 2010-10-01 |  2 |  0.00
 ...
 (24 rows)

Crosstab

Cross-tab does something totally different. It only ever changes the display.

SELECT *
FROM crosstab($$
  SELECT
    ARRAY[f1.type,f2.type] AS type,
    date,
    tp,
    greatest(f1.price-f2.price, '0.00') AS price
  FROM foo AS f1
  INNER JOIN foo AS f2
    USING (date, tp)
  WHERE f1.type <> f2.type
  ORDER BY f1.type, f2.type, tp;
$$, $$VALUES (1),(2)$$
) AS t(type text[],"date" date,tp1 numeric,tp2 numeric);

Produces...

 type  |    date    | tp1  | tp2  
-------+------------+------+------
 {A,B} | 2010-10-01 | 0.00 | 0.00
 {A,C} | 2010-10-01 | 0.00 | 0.02
 {A,D} | 2010-10-01 | 0.00 | 0.00
 {B,A} | 2010-10-01 | 0.48 | 0.00
 {B,C} | 2010-10-01 | 0.48 | 0.02
 {B,D} | 2010-10-01 | 0.48 | 0.00
 {C,A} | 2010-10-01 | 0.00 | 0.00
 {C,B} | 2010-10-01 | 0.00 | 0.00
 {C,D} | 2010-10-01 | 0.00 | 0.00
 {D,A} | 2010-10-01 | 0.00 | 0.04
 {D,B} | 2010-10-01 | 0.00 | 0.04
 {D,C} | 2010-10-01 | 0.00 | 0.06
(12 rows)

12 rows rather than 24. We pivoted and put tp1, and tp2 together. Pick your poison -- what kind of display do you want? I always prefer not using cross tab unless someone requires it.

Performance questions

My current solution is a view that uses CTE(s) to create all the possible dates and type combos, and then I have a function that goes through every typeA and typeB, date, tp combo. It is very slow.

We would need to actually see the views, and to an output of EXPLAIN ANALYZE to know why it runs slow. Try the query I suggested at the top and see if it speeds things up.