T-sql – Joining Three Tables Using Composite Key

join;t-sql

I know there are a few posts on this topic, but can't seem to find one that is similar to mine, so hopefully someone can provide some insight!!

I have three tables: acct_rec, income_rec, and advance_billing. These tables house invoice data. acct_rec is the main table that has every invoice and some data around it. income_rec houses only invoices where we received income. advance_billing houses invoices that have been billed to the client.

So the tricky part is that sometimes an invoice will only exist in either income_rec or advance_billing. Sometimes an invoice will be in both tables.

The columns they all have in common are ar_inv_nbr, ar_seq_nbr, and ar_type. income_rec and acct_rec have two additonal fields that need to be joined on: job_number and job_comp_number.

These are the columns of interest I'm looking to retrieve:

acct_rec
inv_amount
inv_advance_amt
inv_date
post_period
ar_inv_nbr
ar_inv_seq
ar_type

income_rec
ar_inv_nbr
ar_inv_seq
ar_type
rec_amt

advance_billing
hours
quantity
line_total
ar_inv_nbr
ar_inv_seq
ar_type

So for example, if an invoice is only in income_rec I'd like to retrieve that row from income_rec along with the data for that invoice from acct_Rec. There should then be NULL's for all the fields that are in advance_billing.

And if an invoice is in advance_billing but not in income_rec, I'd like to accomplish the same: return all data from advance_billing but NULL's for income rec data.

If an invoice is in both tables, then return the data from both tables. if there are multiple rows for the invoice in the tables (which there can be) group all rows to create one row.

I'm including the DDL for the tables below:

CREATE TABLE [dbo].[ADVANCE_BILLING](
[AB_ID] [int] NOT NULL,
[SEQ_NBR] [smallint] NOT NULL,
[AB_FLAG] [smallint] NOT NULL,
[JOB_NUMBER] [int] NOT NULL,
[JOB_COMPONENT_NBR] [smallint] NOT NULL,
[FNC_CODE] [varchar](6) NOT NULL,
[FNC_TYPE] [varchar](1) NULL,
[ADV_BILL_NET_AMT] [decimal](14, 2) NULL,
[COMMISSION_PCT] [decimal](9, 3) NULL,
[AR_INV_NBR] [int] NULL,
[AR_INV_SEQ] [smallint] NULL,
[AR_TYPE] [varchar](3) NULL,
[AR_INV_VOID] [smallint] NULL,
[CREATE_DATE] [datetime] NULL DEFAULT (getdate()),
[LINE_TOTAL [decimal](14,2)

CREATE TABLE [dbo].[ACCT_REC](
[AR_INV_NBR] [int] NOT NULL,
[AR_INV_SEQ] [smallint] NOT NULL,
[AR_TYPE] [varchar](3) NOT NULL,
[CL_CODE] [varchar](6) NOT NULL,
[DIV_CODE] [varchar](6) NULL,
[PRD_CODE] [varchar](6) NULL,
[OFFICE_CODE] [varchar](4) NULL,
[AR_INV_DATE] [datetime] NULL,
[AR_DESCRIPTION] [varchar](40) NULL,
[AR_INV_AMOUNT] [decimal](15, 2) NULL,
[AR_POST_PERIOD] [varchar](6) NULL,
[AR_COS_AMT] [decimal](14, 2) NULL,
[AR_STATE_AMT] [decimal](14, 2) NULL,
[AR_COUNTY_AMT] [decimal](14, 2) NULL,
[AR_CITY_AMT] [decimal](14, 2) NULL,
[AR_ADVANCE_AMT] [decimal](15, 2) NULL

CREATE TABLE [dbo].[INCOME_REC](
[AB_ID] [int] NOT NULL,
[SEQ_NBR] [smallint] NOT NULL,
[JOB_NUMBER] [int] NOT NULL,
[JOB_COMPONENT_NBR] [smallint] NOT NULL,
[AR_INV_NBR] [int] NULL,
[AR_INV_SEQ] [smallint] NULL,
[AR_TYPE] [varchar](3) NULL,
[AB_FLAG] [smallint] NULL,
[CREATE_DATE] [datetime] NULL
[INCOME_REC] ADD [REC_AMT] [decimal](14, 2) NULL

I hope this wasn't too long winded, I just wanted to provide as much detail as possible with regards to my question!

If anyone has any insight, that would be awesome! I've tried a billion different join combinations and cannot get this to provide the right results!!

Thanks!!

Best Answer

If we were dealing with 0-1 income_rec rows and 0-1 advance_billing rows for any given acct_rec row, and if AR_INV_NBR, AR_INV_SEQ, and AR_TYPE uniquely identify a single acct_rec row, then this is trivial:

SELECT ...
  FROM acct_rec ar
         LEFT OUTER JOIN income_rec inc ON (    ar.AR_INV_NBR        = inc.AR_INV_NBR
                                            AND ar.AR_INV_SEQ        = inc.AR_INV_SEQ
                                            AND ar.AR_TYPE           = inc.AR_TYPE
                                            AND ar.JOB_NUMBER        = inc.JOB_NUMBER
                                            AND ar.JOB_COMPONENT_NBR = inc.JOB_COMPONENT_NBR
                                           )
         LEFT OUTER JOIN advance_billing bill ON (    ar.AR_INV_NBR = bill.AR_INV_NBR
                                                  AND ar.AR_INV_SEQ = bill.AR_INV_SEQ
                                                  AND ar.AR_TYPE    = bill.AR_TYPE
                                                 )

However, we apparently have to deal with the possibility that we have multiple rows in each of the three tables with the same three key values (AR_INV_NBR, 'AR_INV_SEQ, andAR_TYPE`). This would allow cross-products amongst the various tables to come in, which is probably the issue you've had.

Taking a relatively simple example, where there is 1 row in acct_recwith our three key values, 3 rows in income_rec, and 2 rows in advance_billing. The LEFT JOIN approach above would give us six rows:

  • ar + inc(1) + bill(1)
  • ar + inc(1) + bill(2)
  • ar + inc(2) + bill(1)
  • ar + inc(2) + bill(2)
  • ar + inc(3) + bill(1)
  • ar + inc(3) + bill(2)

Aggregates would result in twice the expected values for the income_rec columns, and 3 times the expected values for the advance_billing columns.

So, we need a solution where only one of the JOINed tables is allowed to have multiple rows.

The script below:

  • creates a temp table with one unique row for each of the AR_INV_NBR-AR_INV_SEQ-AR_TYPE combinations seen in acct_rec;
  • generates the actual results by:
    • creating a sub-query to match our unique account records with all matching records from advance_billing, aggregating the results so we have one row for each acct_rec row;
    • take our unique account temp table, and LEFT JOIN it with income_rec, and with our aggregated advance_billing sub-query;
    • and aggregate the income_rec values, to handle multiple rows from that table (by this point, we've combined any multiple rows from acct_rec and advance_billing into a single row).

Here's the query:

CREATE TABLE #ar 
     (
       [AR_INV_NBR] [int] NOT NULL
                ,[AR_INV_SEQ] [smallint] NOT NULL
                ,[AR_TYPE] [varchar](3) NOT NULL
                ,[AR_INV_DATE] [datetime] NULL
                ,[AR_INV_AMOUNT] [decimal](15, 2) NULL
                ,[AR_POST_PERIOD] [varchar](6) NULL
                ,[AR_ADVANCE_AMT] [decimal](15, 2) NULL
     )
;

-- populate a temp table with 1 unique row
--   for each AR_INV_NBR, AR_INV_SEQ, AR_TYPE in acct_rec
INSERT INTO #ar (AR_INV_NBR, AR_INV_SEQ, AR_TYPE, AR_INV_DATE, AR_INV_AMOUNT, AR_ADVANCE_AMT)
SELECT AR_INV_NBR, AR_INV_SEQ, AR_TYPE
      ,MAX(AR_INV_DATE)
      ,SUM(AR_INV_AMOUNT)
      ,SUM(AR_ADVANCE_AMT)
  FROM acct_rec
 GROUP BY AR_INV_NBR, AR_INV_SEQ, AR_TYPE
;

-- Update temp table with the maximum post period from the acct_rec row(s)
--   with the maximum date for each AR_INV_NBR, AR_INV_SEQ, AR_TYPE
-- Don't want to jsut take MAX of all post periods, in case we've got rows
--   that fall before and after the restart of the post period count
--   (something that could possibly happen at year-end, quarter-end, daily, or never).
UPDATE ar
   SET AR_POST_PERIOD = pp.AR_POST_PERIOD
  FROM #ar ar
         INNER JOIN (SELECT t.AR_INV_NBR, t.AR_INV_SEQ, t.AR_TYPE, t.AR_INV_DATE
                           ,MAX(act.AR_POST_PERIOD) as AR_POST_PERIOD
                       FROM #ar t
                              INNER JOIN acct_rec act ON (    t.AR_INV_NBR  = act.AR_INV_NBR
                                                          AND t.AR_INV_SEQ  = act.AR_INV_SEQ
                                                          AND t.AR_TYPE     = act.AR_TYPE
                                                          AND t.AR_INV_DATE = act.AR_INV_DATE
                                                         )
                    ) pp ON (    ar.AR_INV_NBR  = pp.AR_INV_NBR
                             AND ar.AR_INV_SEQ  = pp.AR_INV_SEQ
                             AND ar.AR_TYPE     = pp.AR_TYPE
                            )
;

-- Now that we know we have one unique row for each set of values in acct_rec,
--   we can bring in the values from the other tables.

SELECT ar.AR_INV_NBR, ar.AR_INV_SEQ, ar.AR_TYPE
      ,ar.AR_INV_AMOUNT, ar.AR_ADVANCE_AMT, ar.AR_INV_DATE, ar.AR_POST_PERIOD
      ,SUM(inc.REC_AMT) as REC_AMT
      ,bill.HOURS, bill.QUANTITY, bill.LINE_TOTAL
  FROM #ar ar
         LEFT  JOIN INCOME_REC ir ON (    ar.AR_INV_NBR = inc.AR_INV_NBR
                                      AND ar.AR_INV_SEQ = inc.AR_INV_SEQ
                                      AND ar.AR_TYPE    = inc.AR_TYPE
                                      AND EXISTS (SELECT 1 FROM acct_rec 
                                                   WHERE AR_INV_NBR = inc.AR_INV_NBR
                                                     AND AR_INV_SEQ = inc.AR_INV_SEQ
                                                     AND AR_TYPE = inc.AR_TYPE
                                                     -- Drop the following two lines if they won't actually eliminate anything
                                                     AND JOB_NUMBER = inc.JOB_NUMBER
                                                     AND JOB_COMPONENT_NBR = inc.JOB_COMPONENT_NBR
                                                 )
                                     )
         LEFT  JOIN (SELECT ar1.AR_INV_NBR, ar1.AR_INV_SEQ, ar1.AR_TYPE
                           ,SUM(bill_1.HOURS) as HOURS
                           ,SUM(bill_1.QUANTITY) as QUANTITY
                           ,SUM(bill_1.LINE_TOTAL) as LINE_TOTAL
                       FROM #ar ar1
                              INNER JOIN ADVANCE_BILLING bill_1 ON (    ar1.AR_INV_NBR = bill_1.AR_INV_NBR
                                                                    AND ar1.AR_INV_NBR = bill_1.AR_INV_NBR
                                                                    AND ar1.AR_TYPE    = bill_1.AR_TYPE
                                                                   )
                      GROUP BY ar1.AR_INV_NBR, ar1.AR_INV_SEQ, ar1.AR_TYPE
                    ) bill ON (    ar.AR_INV_NBR = bill.AR_INV_NBR
                               AND ar.AR_INV_NBR = bill.AR_INV_NBR
                               AND ar.AR_TYPE    = bill1.AR_TYPE
                              )
 GROUP BY
       ar.AR_INV_NBR, ar.AR_INV_SEQ, ar.AR_TYPE
      ,ar.AR_INV_AMOUNT, ar.AR_ADVANCE_AMT, ar.AR_INV_DATE, ar.AR_POST_PERIOD
      ,bill.HOURS, bill.QUANTITY, bill.LINE_TOTAL
;

NOTE: This query may be overly complicated, because I've made assumptions to try to make it cover the worst-case scenarios. If AR_INV_NBR, AR_INV_SEQ, AR_TYPE act as a unique key on the acct_rec table, then the #ar temp table can be dropped. If the stated JOIN using JOB_NUMBER and JOB_COMPONENT_NBR between acct_rec and income_rec is not needed (for example, if there's no chance that we'd find an income_rec row where the two values don't match up to any rows in acct_rec), then the EXISTS clause in the income_rec JOIN can be dropped.

ALSO NOTE: by the same token, the code does assume that JOB_NUMBER and JOB_COMPONENT_NBR are actually in acct_rec (the DDL posted doesn't include them), and those two columns in advance_billing should be ignored. If either assumption is incorrect, the query could change significantly.