MySQL – How to simulate a table join on itself

Let's assume this is my current data:

product_number  date         value
100             2010-01-01   1
100             2010-02-01   1
100             2010-03-01   1
200             2010-01-01   1
200             2010-03-01   1

How can I join the table to itself to fill in the dates that a product is missing, compared with the full set of dates in the results (which changes all the time)?

As you can see, product #200 has no row for 2010-02-01, and I need a NULL value there.

Expected result:

product_number  date         value
100             2010-01-01   1
100             2010-02-01   1
100             2010-03-01   1
200             2010-01-01   1
200             2010-02-01   NULL # add NULL values where it doesn't have a date
200             2010-03-01   1

In short: get a list of all distinct dates, find the products that have no row for a given date, and simulate an entry with a NULL or empty value to fill the gap.
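The steps in that last sentence can be sketched in plain Python, outside the database, to make the expected output concrete. This only illustrates the logic, not a MySQL solution; the function name `fill_gaps` is hypothetical and the rows are the sample data above.

```python
# Sample rows from the question: (product_number, date, value).
rows = [
    (100, "2010-01-01", 1),
    (100, "2010-02-01", 1),
    (100, "2010-03-01", 1),
    (200, "2010-01-01", 1),
    (200, "2010-03-01", 1),
]


def fill_gaps(rows):
    """Return one (product, date, value) tuple per product/date pair.

    Pairs with no matching row get None, the Python equivalent of NULL.
    """
    # Index existing values by (product_number, date) for direct lookup.
    lookup = {(p, d): v for p, d, v in rows}
    products = sorted({p for p, _, _ in rows})
    # All distinct dates seen anywhere in the data; ISO strings sort by date.
    dates = sorted({d for _, d, _ in rows})
    return [(p, d, lookup.get((p, d))) for p in products for d in dates]


filled = fill_gaps(rows)
for row in filled:
    print(row)
```

The output has six rows, one of which is `(200, '2010-02-01', None)`: the gap the question asks about.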

Best Answer

You'll first need to create a list of every product_number and date combination. You can do this with a CROSS JOIN of your table to itself:

select distinct p.product_number, d.date
from yourtable p
cross join yourtable d;

This will create a list similar to the following (only the first rows are shown):

| PRODUCT_NUMBER |                            DATE |
|----------------|---------------------------------|
|            100 |  January, 01 2010 00:00:00+0000 |
|            200 |  January, 01 2010 00:00:00+0000 |
|            100 | February, 01 2010 00:00:00+0000 |
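To check what the cross join produces, the sketch below runs it against the sample data. It assumes an in-memory SQLite database (via Python's built-in `sqlite3` module) as a stand-in for MySQL, with dates stored as `'YYYY-MM-DD'` text; the `order by` is added only so the output is deterministic. It also counts the rows before `distinct`, which shows that the self cross join pairs every row with every row.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's data.
ROWS = [(100, "2010-01-01", 1), (100, "2010-02-01", 1), (100, "2010-03-01", 1),
        (200, "2010-01-01", 1), (200, "2010-03-01", 1)]
conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (product_number integer, date text, value integer)")
conn.executemany("insert into yourtable values (?, ?, ?)", ROWS)

# Before DISTINCT, the self cross join pairs every row with every row: 5 x 5.
raw_count = conn.execute(
    "select count(*) from yourtable p cross join yourtable d"
).fetchone()[0]

# DISTINCT collapses that to one row per (product_number, date) combination.
combos = conn.execute(
    """
    select distinct p.product_number, d.date
    from yourtable p
    cross join yourtable d
    order by p.product_number, d.date
    """
).fetchall()

print(raw_count)  # 25
for row in combos:
    print(row)  # six rows, including (200, '2010-02-01')
```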

You then use the above query as a derived table and LEFT JOIN it to your table to return the final result:

select 
  pd.product_number,
  pd.date,
  t.value
from
(
  -- list of every product/date
  select distinct p.product_number, d.date
  from yourtable p
  cross join yourtable d
) pd
left join yourtable t
  on pd.date = t.date
  and pd.product_number = t.product_number
order by pd.product_number, pd.date;

This gives a final result of:

| PRODUCT_NUMBER |                            DATE |  VALUE |
|----------------|---------------------------------|--------|
|            100 |  January, 01 2010 00:00:00+0000 |      1 |
|            100 | February, 01 2010 00:00:00+0000 |      1 |
|            100 |    March, 01 2010 00:00:00+0000 |      1 |
|            200 |  January, 01 2010 00:00:00+0000 |      1 |
|            200 | February, 01 2010 00:00:00+0000 | (null) |
|            200 |    March, 01 2010 00:00:00+0000 |      1 |

The LEFT JOIN returns every row from your list of products and dates, whether or not a matching row exists in yourtable; where there is no match, t.value is NULL.
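A minimal sketch of the full query, again assuming SQLite as a stand-in for MySQL and the sample rows from the question. Running the same query with an INNER JOIN in place of the LEFT JOIN shows why the outer join is needed: the inner join drops the missing product/date combination instead of returning it with a NULL value. The `join_type` placeholder is only a device for running both variants.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's data.
ROWS = [(100, "2010-01-01", 1), (100, "2010-02-01", 1), (100, "2010-03-01", 1),
        (200, "2010-01-01", 1), (200, "2010-03-01", 1)]
conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (product_number integer, date text, value integer)")
conn.executemany("insert into yourtable values (?, ?, ?)", ROWS)

# The answer's query, with the join type left as a placeholder.
QUERY = """
select pd.product_number, pd.date, t.value
from (
  -- list of every product/date
  select distinct p.product_number, d.date
  from yourtable p
  cross join yourtable d
) pd
{join_type} yourtable t
  on pd.date = t.date
  and pd.product_number = t.product_number
order by pd.product_number, pd.date
"""

left_rows = conn.execute(QUERY.format(join_type="left join")).fetchall()
inner_rows = conn.execute(QUERY.format(join_type="inner join")).fetchall()

print(len(left_rows), left_rows[4])  # 6 (200, '2010-02-01', None)
print(len(inner_rows))               # 5: the missing combination is dropped
```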

This could also be written as:

select 
  p.product_number,
  d.date,
  t.value
from
(
  -- list of every product
  select distinct product_number
  from yourtable 
) p  
cross join
( 
  -- list of every date
  select distinct date
  from yourtable 
) d
  -- then join to the table
left join yourtable t
  on  d.date = t.date
  and p.product_number = t.product_number
order by p.product_number, d.date ;

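This version may perform better on larger tables: it cross joins only the distinct product numbers with the distinct dates, rather than cross joining every row of the table with every row and then removing the duplicates with DISTINCT.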
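As a rough check, this sketch (SQLite standing in for MySQL, sample data from the question) runs both versions, confirms they return the same rows, and compares the size of the intermediate cross join each one builds. The counts only describe this 5-row sample; real performance also depends on the optimizer and on indexes.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's data.
ROWS = [(100, "2010-01-01", 1), (100, "2010-02-01", 1), (100, "2010-03-01", 1),
        (200, "2010-01-01", 1), (200, "2010-03-01", 1)]
conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (product_number integer, date text, value integer)")
conn.executemany("insert into yourtable values (?, ?, ?)", ROWS)

# Version 1: cross join the whole table with itself, then DISTINCT.
V1 = """
select pd.product_number, pd.date, t.value
from (
  select distinct p.product_number, d.date
  from yourtable p
  cross join yourtable d
) pd
left join yourtable t
  on pd.date = t.date
  and pd.product_number = t.product_number
order by pd.product_number, pd.date
"""

# Version 2: reduce to distinct products and distinct dates first.
V2 = """
select p.product_number, d.date, t.value
from (select distinct product_number from yourtable) p
cross join (select distinct date from yourtable) d
left join yourtable t
  on d.date = t.date
  and p.product_number = t.product_number
order by p.product_number, d.date
"""

rows_v1 = conn.execute(V1).fetchall()
rows_v2 = conn.execute(V2).fetchall()
print(rows_v1 == rows_v2)  # True: both return the same six rows

# Size of the cross join each version builds before the LEFT JOIN.
n_rows = conn.execute("select count(*) from yourtable").fetchone()[0]
n_products = conn.execute("select count(distinct product_number) from yourtable").fetchone()[0]
n_dates = conn.execute("select count(distinct date) from yourtable").fetchone()[0]
print(n_rows * n_rows, n_products * n_dates)  # 25 6
```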

Now, if you want to return a list of all dates, regardless of whether they appear in the table, I would suggest creating a table of dates. You would use it in the same way: build a list of every product/date combination, then join that back to your table.

The table would be similar to:

CREATE TABLE dates
    (`date` datetime)
;

INSERT INTO dates
    (`date`)
VALUES
    ('2010-01-01 00:00:00'),
    ('2010-02-01 00:00:00'),
    ('2010-03-01 00:00:00'),
    ('2010-04-01 00:00:00'),
    ('2010-05-01 00:00:00')
;
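If the range of dates is long, typing every value gets tedious. One option, not part of the original answer, is to generate the months with a recursive common table expression. The sketch below assumes SQLite, which writes the step as `date(d, '+1 month')`; MySQL 8.0 and later also support `WITH RECURSIVE`, where the step could be written with `date_add(d, interval 1 month)` instead. The start and end dates are the ones from the `INSERT` above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; dates stored as 'YYYY-MM-DD' text.
conn = sqlite3.connect(":memory:")
conn.execute("create table dates (date text)")

# Recursive CTE: start at the first month and add one month per step
# until the last month (2010-05-01) has been produced.
months = conn.execute(
    """
    with recursive months(d) as (
      select '2010-01-01'
      union all
      select date(d, '+1 month') from months where d < '2010-05-01'
    )
    select d from months
    """
).fetchall()

# Each fetched row is a 1-tuple, which is exactly what executemany expects.
conn.executemany("insert into dates (date) values (?)", months)

all_dates = [row[0] for row in conn.execute("select date from dates order by date")]
print(all_dates)
```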

You'd then use the following query to get the list of dates/products:

select distinct p.product_number, d.date
from yourtable p
cross join dates d;

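A quick sketch of this step, again with SQLite standing in for MySQL. Both `date` columns are stored as `'YYYY-MM-DD'` text here (rather than the `datetime` column above) so that SQLite's string comparison will match them in the later join; with two products and five dates, the cross join yields ten combinations.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's data.
ROWS = [(100, "2010-01-01", 1), (100, "2010-02-01", 1), (100, "2010-03-01", 1),
        (200, "2010-01-01", 1), (200, "2010-03-01", 1)]
MONTHS = ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01", "2010-05-01"]

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (product_number integer, date text, value integer)")
conn.executemany("insert into yourtable values (?, ?, ?)", ROWS)
# Same five months as the answer's dates table, stored as text for SQLite.
conn.execute("create table dates (date text)")
conn.executemany("insert into dates (date) values (?)", [(m,) for m in MONTHS])

# Every product paired with every date in the dates table.
combos = conn.execute(
    """
    select distinct p.product_number, d.date
    from yourtable p
    cross join dates d
    order by p.product_number, d.date
    """
).fetchall()

print(len(combos))  # 10: 2 products x 5 dates
```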
And finally, you would join that back to your table:

select 
  pd.product_number,
  pd.date,
  t.value
from
(
  -- list of every product/date
  select distinct p.product_number, d.date
  from yourtable p
  cross join dates d
) pd
left join yourtable t
  on pd.date = t.date
  and pd.product_number = t.product_number
order by pd.product_number, pd.date;

Alternatively:

select 
  p.product_number,
  d.date,
  t.value
from
(
  -- list of every product
  select distinct product_number
  from yourtable 
) p  
cross join dates d
  -- then join to the table
left join yourtable t
  on  d.date = t.date
  and p.product_number = t.product_number
order by p.product_number, d.date ;

Again, this version may perform better on larger tables, because it reduces yourtable to its distinct product numbers before the cross join. With either query, you return every date in the dates table, even dates that do not appear in yourtable:

| PRODUCT_NUMBER |                            DATE |  VALUE |
|----------------|---------------------------------|--------|
|            100 |  January, 01 2010 00:00:00+0000 |      1 |
|            100 | February, 01 2010 00:00:00+0000 |      1 |
|            100 |    March, 01 2010 00:00:00+0000 |      1 |
|            100 |    April, 01 2010 00:00:00+0000 | (null) |
|            100 |      May, 01 2010 00:00:00+0000 | (null) |
|            200 |  January, 01 2010 00:00:00+0000 |      1 |
|            200 | February, 01 2010 00:00:00+0000 | (null) |
|            200 |    March, 01 2010 00:00:00+0000 |      1 |
|            200 |    April, 01 2010 00:00:00+0000 | (null) |
|            200 |      May, 01 2010 00:00:00+0000 | (null) |
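To tie it together, this sketch runs both dates-table queries against the sample data (SQLite standing in for MySQL, dates stored as `'YYYY-MM-DD'` text in both tables), checks that they agree, and lists the product/date combinations that come back with a NULL value.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are the question's data.
ROWS = [(100, "2010-01-01", 1), (100, "2010-02-01", 1), (100, "2010-03-01", 1),
        (200, "2010-01-01", 1), (200, "2010-03-01", 1)]
MONTHS = ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01", "2010-05-01"]

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (product_number integer, date text, value integer)")
conn.executemany("insert into yourtable values (?, ?, ?)", ROWS)
conn.execute("create table dates (date text)")
conn.executemany("insert into dates (date) values (?)", [(m,) for m in MONTHS])

# First form: build the product/date list in a derived table, then LEFT JOIN.
V1 = """
select pd.product_number, pd.date, t.value
from (
  select distinct p.product_number, d.date
  from yourtable p
  cross join dates d
) pd
left join yourtable t
  on pd.date = t.date
  and pd.product_number = t.product_number
order by pd.product_number, pd.date
"""

# Alternative form: distinct products cross joined with the dates table.
V2 = """
select p.product_number, d.date, t.value
from (select distinct product_number from yourtable) p
cross join dates d
left join yourtable t
  on d.date = t.date
  and p.product_number = t.product_number
order by p.product_number, d.date
"""

results = conn.execute(V1).fetchall()
alt_results = conn.execute(V2).fetchall()
print(results == alt_results)  # True

# Combinations filled in with NULL (None in Python).
missing = [(p, d) for p, d, v in results if v is None]
print(missing)
```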