Postgresql – Include NULL row on join

group byjoin;postgresql

How can I select all rows from table_A and then join on results from other tables, but include a NULL row to show aggregate values for those other tables that don't have a table_A_id?

WITH RECURSIVE transactions_with_children AS (
  SELECT
    table_A_id,
    other_stuff,
    1 AS depth
  FROM transactions
  WHERE transactions.parent_transaction_id IS NULL
  UNION ALL
  SELECT
    table_A_id,
    other_stuff,
    depth + 1 AS depth
  FROM transactions children
    INNER JOIN transactions_with_children parents ON children.parent_transaction_id = parents.id
)

SELECT 
  table_A.id,
  view_B.aggregate_col1, view_B.aggregate_col2,
  view_C.aggregate_col1, view_C.aggregate_col2

FROM table_A
  -- Limit table_A to only records with data from transactions_with_children, but I also need to include
  -- a null row for all transactions_with_children that don't have a table_A_id
  INNER JOIN transactions_with_children on transactions_with_children.table_A_id = table_A.id
  LEFT JOIN (SELECT
               t.table_A_id,
               aggregate_col1, aggregate_col2
             FROM transactions_with_children t
             GROUP BY t.table_A_id
            ) view_B
    ON table_A.id = view_B.table_A_id
  LEFT JOIN (SELECT
               t.table_A_id,
               aggregate_col1, aggregate_col2
             FROM transactions_with_children t
             GROUP BY t.table_A_id
            ) view_C
    ON table_A.id = view_C.table_A_id

The recursive WITH tables are all scoped records from a large table. Each of the LEFT JOINs that i'm doing using this table just aggregate data from it. SUM, COUNT etc. I then join on the common table_A_id that each of the records from the WITH may or may not have.

So the problem with doing a RIGHT JOIN on transactions_with_children (I think) is that I would get tons of duplicates from transactions_with_children table. I guess I could maybe create an RIGHT JOIN sub select where I am pulling records from transactions_with_children and grouping by.

something like:

RIGHT JOIN (
  SELECT t.map_account_affiliate_id
  FROM transactions_with_children t 
  GROUP BY t.map_account_affiliate_id
) ids ON map_account_affiliates.id = ids.map_account_affiliate_id

I assume the join select above would include a null field if transactions_with_children.map_account_affiliate_id had any null values right?

This seems like a hacky way to do it, but i'm not all that great with SQL either.

I tried just simplifying the query for the purposes of the question, but maybe its easier to look at the original.

WITH RECURSIVE transactions_with_children AS (
  SELECT
    transactions.id,
    transactions.transaction_type,
    transactions.successful,
    transactions.test,
    transactions.amount_cents,
    transactions.parent_transaction_id,
    orders.map_account_affiliate_id AS map_account_affiliate_id,
    orders.t1_affiliate_tracking_id AS t1_affiliate_tracking_id,
    orders.t2_affiliate_tracking_id AS t2_affiliate_tracking_id,
    orders.id                       AS order_id,
    orders.front_end                AS front_end,
    1                               AS depth
  FROM transactions
    INNER JOIN line_items ON line_items.id = transactions.line_item_id
    INNER JOIN orders ON orders.id = line_items.order_id AND orders.campaign_id IN (104)
  WHERE transactions.parent_transaction_id IS NULL
        AND transactions.successful = TRUE
        AND transactions.transaction_type IN (3, 1, 22, 5, 55, 7, 8, 22, 30, 31, 32, 33)
  UNION ALL
  SELECT
    children.id,
    children.transaction_type,
    children.successful,
    children.test,
    children.amount_cents,
    children.parent_transaction_id,
    orders.map_account_affiliate_id AS map_account_affiliate_id,
    orders.t1_affiliate_tracking_id AS t1_affiliate_tracking_id,
    orders.t2_affiliate_tracking_id AS t2_affiliate_tracking_id,
    orders.id                       AS order_id,
    orders.front_end                AS front_end,
    depth + 1                       AS depth
  FROM transactions children
    INNER JOIN transactions_with_children parents ON children.parent_transaction_id = parents.id
    INNER JOIN line_items ON line_items.id = children.line_item_id
    INNER JOIN orders ON orders.id = line_items.order_id
  WHERE children.successful = TRUE
        AND children.transaction_type IN (3, 1, 22, 5, 55, 7, 8, 22, 30, 31, 32, 33)

)

SELECT *
FROM transactions_with_children;

SELECT
  map_account_affiliates.id                                                             AS resource_id,
  map_account_affiliates.display_name                                                   AS resource_name,
  'affiliate'                                                                           AS resource_type,
  ''                                                                                    AS parent_resource_type,
  ''                                                                                    AS parent_resource_id,
  COALESCE(sales.num_sales, 0)
                                                                                        AS num_sales,
  COALESCE(sales.num_scheduled_sales, 0)
                                                                                        AS num_scheduled_sales,
  COALESCE(sales.num_fe_sales, 0)
                                                                                        AS num_fe_sales,
  COALESCE(sales.gross_sales, 0)
                                                                                        AS total_gross_sales,
  COALESCE(sales.gross_scheduled_sales, 0)
                                                                                        AS total_gross_scheduled_sales,
  COALESCE(sales.gross_sales :: FLOAT / nullif(sales.num_fe_sales :: FLOAT, 0)
  , 0)
                                                                                        AS avg_gross_order_value,
  (
    COALESCE(sales.num_fe_sales / nullif(((SUM(sales.num_fe_sales)
    OVER ()) :: FLOAT), 0)
    , 0)
    * 100.0
  )                                                                                     AS percent_all_fe_sales,
  COALESCE((sales.gross_fe_sales :: FLOAT / nullif(sales.num_fe_sales :: FLOAT, 0)), 0) AS avg_fe_sale_amt,
  COALESCE(refunds.num_full_refunds, 0)
                                                                                        AS num_full_refunds,
  COALESCE(refunds.num_part_refunds, 0)
                                                                                        AS num_part_refunds,
  COALESCE(refunds.total_refunds, 0)
                                                                                        AS total_refunds,
  COALESCE(refunds.total_refunds :: FLOAT / nullif(sales.gross_sales, 0)
  , 0)
                                                                                        AS percent_refund_amount,
  (COALESCE(refunds.num_part_refunds, 0)
   +
   COALESCE(refunds.num_full_refunds, 0)
  ) :: FLOAT / nullif(sales.num_sales, 0) * 100.0                                       AS percent_refunds,
  COALESCE(chargebacks.num_chargebacks, 0)
                                                                                        AS num_chargebacks,
  COALESCE(chargebacks.num_chargebacks_won, 0)
                                                                                        AS num_chargebacks_won,
  COALESCE(chargebacks.total_chargebacks, 0)
                                                                                        AS total_chargebacks,
  COALESCE(chargebacks.total_chargebacks :: FLOAT / nullif(sales.gross_sales, 0)
  , 0)
  * 100.0                                                                               AS percent_chargeback_amount,
  (chargebacks.num_chargebacks) :: FLOAT / nullif(sales.num_sales, 0) * 100.0           AS percent_chargebacks,
  (
    COALESCE(sales.gross_sales, 0) -
    COALESCE(refunds.total_refunds, 0) -
    COALESCE(chargebacks.total_chargebacks, 0)
  )                                                                                     AS total_net_sales,
  (
    (
      COALESCE(sales.gross_sales, 0) -
      COALESCE(refunds.total_refunds, 0) -
      COALESCE(chargebacks.total_chargebacks, 0)
    ) / nullif(sales.num_fe_sales, 0)

  )                                                                                     AS avg_net_order_value

FROM map_account_affiliates

  LEFT JOIN (SELECT
               t.map_account_affiliate_id               AS map_account_affiliate_id,
               COUNT(DISTINCT t.order_id)               AS num_sales,
               COUNT(t.id)
                 FILTER (WHERE t.transaction_type = 22) AS num_scheduled_sales,
               SUM(t.amount_cents)
                 FILTER (WHERE t.transaction_type = 22) AS gross_scheduled_sales,
               SUM(t.amount_cents)                      AS gross_sales,
               COUNT(DISTINCT t.order_id)
                 FILTER (WHERE t.front_end = TRUE)      AS num_fe_sales,
               SUM(t.amount_cents)
                 FILTER (WHERE t.front_end = TRUE)      AS gross_fe_sales
             FROM transactions_with_children t
             WHERE t.transaction_type IN (3, 1, 22)
             GROUP BY t.map_account_affiliate_id
            ) sales
    ON map_account_affiliates.id = sales.map_account_affiliate_id
  LEFT JOIN (SELECT
               t.map_account_affiliate_id               AS map_account_affiliate_id,
               COUNT(t.id)
                 FILTER (WHERE t.transaction_type = 5)  AS num_full_refunds,
               COUNT(t.id)
                 FILTER (WHERE t.transaction_type = 55) AS num_part_refunds,
               SUM(t.amount_cents)                      AS total_refunds
             FROM transactions_with_children t
             WHERE t.transaction_type IN (5, 55)
             GROUP BY t.map_account_affiliate_id
            ) refunds
    ON map_account_affiliates.id = refunds.map_account_affiliate_id
  LEFT JOIN (SELECT
               t.map_account_affiliate_id              AS map_account_affiliate_id,
               COUNT(t.id)
                 FILTER (WHERE t.transaction_type = 7) AS num_chargebacks,
               COUNT(t.id)
                 FILTER (WHERE t.transaction_type = 8) AS num_chargebacks_won,
               (
                 COALESCE(SUM(t.amount_cents)
                            FILTER (WHERE t.transaction_type = 7), 0)
                 - COALESCE(SUM(t.amount_cents)
                              FILTER (WHERE t.transaction_type = 8), 0)
               )                                       AS total_chargebacks
             FROM transactions_with_children t
             WHERE t.transaction_type IN (7, 8)
             GROUP BY t.map_account_affiliate_id
            ) chargebacks
    ON map_account_affiliates.id = chargebacks.map_account_affiliate_id

WHERE map_account_affiliates.account_id = 1

      AND (num_sales > 0 OR num_fe_sales > 0 OR num_scheduled_sales > 0 OR num_full_refunds > 0
           OR num_part_refunds > 0 OR num_chargebacks > 0)

ORDER BY resource_name

I know this is a long messy query, but hopefully my explanation makes enough sense to get my question across.

Best Answer

Use RIGHT JOIN instead of INNER JOIN right after your comment in the code:

Limit table_A to only records with data from transactions_with_children, but I also need to include a null row for all transactions_with_children that don't have a table_A_id

RIGHT [OUTER] JOIN keeps all rows from tables to the right of the join and adds NULL values for rows from the tables left of the join where none match the join condition. Your text is a bit ambiguous, you might want LEFT JOIN instead, which is the same with reversed sides.

Basics in the manual here.