Postgresql – How to LEFT JOIN a date table onto each customer (GROUP BY) in a customer purchase table? Can a JOIN be done for dates after initial purchase

group byjoin;postgresql

I am stuck on implementing a left join on dates for each individual customer.

I have a table which stores customer purchases:

SELECT purchase_date, customer
FROM purchases

I would like to create a similar table however one that includes ALL dates and not just purchase dates, for example:

SELECT date, customer, purchase_made

Where purchase_made is a BOOLEAN column. I understand that I need to use a LEFT JOIN however I am stuck at making sure that EACH customer has a full set of dates. For example if no customers made a purchase on 2016-01-01, and there are 2000 customers in the database, then 2000 rows of 2016-01-01 should be added (with each row attributed to a customer).

So far I've tried:

WITH dates AS (
    SELECT GENERATE_SERIES('2016-01-01'::TIMESTAMP, '2016-01-15'::TIMESTAMP, '1 day'::INTERVAL) AS date
    ),

SELECT date, customer 
FROM purchases 
LEFT JOIN dates ON purchases.date=dates.date;

How to implement the BOOLEAN column?

Also is it possible to add the date rows only AFTER a customers first purchase? So for example only insert 2016-01-01 for customers who made a purchase prior to 2016-01-01.

Best Answer

To follow on from my comment. I'd recommend a date table.

Sample Customer Purchase Data

IF OBJECT_ID('tempdb..#CustomerPurchases') IS NOT NULL DROP TABLE #CustomerPurchases
GO
CREATE TABLE #CustomerPurchases (PurchaseDate date, CustomerName varchar(20))
INSERT INTO #CustomerPurchases (PurchaseDate, CustomerName)
VALUES
 ('2016-07-04','Jon Snow')
,('2016-07-06','Jon Snow')
,('2016-07-07','Jon Snow')
,('2016-07-07','Jon Snow')
,('2016-07-07','Jon Snow')
,('2016-07-05','Daenerys Targaryen')
,('2016-07-06','Daenerys Targaryen')
,('2016-07-09','Daenerys Targaryen')
,('2016-07-09','Daenerys Targaryen')
,('2016-07-10','Daenerys Targaryen')

Sample Date Table

IF OBJECT_ID('tempdb..#DateTable') IS NOT NULL DROP TABLE #DateTable
GO
CREATE TABLE #DateTable (DateList date)
INSERT INTO #DateTable (DateList)
VALUES
 ('2016-07-04')
,('2016-07-05')
,('2016-07-06')
,('2016-07-07')
,('2016-07-08')
,('2016-07-09')
,('2016-07-10')

You could cross join the list of dates to then get a full list of all customers and all dates. Then join to the actual sales data to return the Boolean value you're after.

SELECT
dt.DateList
,cu.CustomerName
,(CASE WHEN cp.PurchaseDate IS NULL THEN 0 ELSE 1 END) PurchaseMade
FROM #DateTable dt
CROSS JOIN  (
                SELECT DISTINCT 
                CustomerName 
                FROM #CustomerPurchases
            ) cu
LEFT JOIN   (
                SELECT DISTINCT 
                CustomerName
                ,PurchaseDate 
                FROM #CustomerPurchases
            ) cp
    ON dt.DateList = cp.PurchaseDate
    AND cu.CustomerName = cp.CustomerName

Results would look like this

DateList    CustomerName        PurchaseMade
2016-07-04  Daenerys Targaryen  0
2016-07-05  Daenerys Targaryen  1
2016-07-06  Daenerys Targaryen  1
2016-07-07  Daenerys Targaryen  0
2016-07-08  Daenerys Targaryen  0
2016-07-09  Daenerys Targaryen  1
2016-07-10  Daenerys Targaryen  1
2016-07-04  Jon Snow            1
2016-07-05  Jon Snow            0
2016-07-06  Jon Snow            1
2016-07-07  Jon Snow            1
2016-07-08  Jon Snow            0
2016-07-09  Jon Snow            0
2016-07-10  Jon Snow            0

If you were to do this then your date table would obviously be much wider than the week example I've done. You could then just use date parameters to restrict to the date range you're after. Something like this;

DECLARE @StartDate date; SET @StartDate = '2016-07-05'
DECLARE @EndDate date; SET @EndDate = '2016-07-08'
SELECT
dt.DateList
,cu.CustomerName
,(CASE WHEN cp.PurchaseDate IS NULL THEN 0 ELSE 1 END) PurchaseMade
FROM #DateTable dt
CROSS JOIN  (
                SELECT DISTINCT 
                CustomerName 
                FROM #CustomerPurchases
            ) cu
LEFT JOIN   (
                SELECT DISTINCT 
                CustomerName
                ,PurchaseDate 
                FROM #CustomerPurchases
            ) cp
    ON dt.DateList = cp.PurchaseDate
    AND cu.CustomerName = cp.CustomerName
WHERE dt.DateList BETWEEN @StartDate AND @EndDate
ORDER BY CustomerName, DateList

Which would give these results

DateList    CustomerName        PurchaseMade
2016-07-05  Daenerys Targaryen  1
2016-07-06  Daenerys Targaryen  1
2016-07-07  Daenerys Targaryen  0
2016-07-08  Daenerys Targaryen  0
2016-07-05  Jon Snow            0
2016-07-06  Jon Snow            1
2016-07-07  Jon Snow            1
2016-07-08  Jon Snow            0