You'll need to first create a list of every product_number and date combination. You can do this using a CROSS JOIN of your table:
select distinct p.product_number, d.date
from yourtable p
cross join yourtable d;
See SQL Fiddle with Demo. This will create a list of data similar to:
| PRODUCT_NUMBER | DATE |
|----------------|---------------------------------|
| 100 | January, 01 2010 00:00:00+0000 |
| 200 | January, 01 2010 00:00:00+0000 |
| 100 | February, 01 2010 00:00:00+0000 |
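If you want to try the CROSS JOIN step without a MySQL server, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the TEXT date format are assumptions reconstructed from the result tables in this answer, not your real data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (product_number INTEGER, date TEXT, value INTEGER)"
)
# Sample rows (assumed): product 200 has no row for February.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (100, "2010-01-01", 1),
        (100, "2010-02-01", 1),
        (100, "2010-03-01", 1),
        (200, "2010-01-01", 1),
        (200, "2010-03-01", 1),
    ],
)

# Every product paired with every date that appears anywhere in the table.
combos = conn.execute(
    """
    SELECT DISTINCT p.product_number, d.date
    FROM yourtable p
    CROSS JOIN yourtable d
    ORDER BY p.product_number, d.date
    """
).fetchall()

for row in combos:
    print(row)
```

With two products and three distinct dates, the list has six rows, including the (200, February) pair that has no row in the table itself.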
You will then use the above query and LEFT JOIN to your table to return the final result:
select
pd.product_number,
pd.date,
t.value
from
(
-- list of every product/date
select distinct p.product_number, d.date
from yourtable p
cross join yourtable d
) pd
left join yourtable t
on pd.date = t.date
and pd.product_number = t.product_number
order by pd.product_number, pd.date;
See SQL Fiddle with Demo. Giving a final result of:
| PRODUCT_NUMBER | DATE | VALUE |
|----------------|---------------------------------|--------|
| 100 | January, 01 2010 00:00:00+0000 | 1 |
| 100 | February, 01 2010 00:00:00+0000 | 1 |
| 100 | March, 01 2010 00:00:00+0000 | 1 |
| 200 | January, 01 2010 00:00:00+0000 | 1 |
| 200 | February, 01 2010 00:00:00+0000 | (null) |
| 200 | March, 01 2010 00:00:00+0000 | 1 |
The LEFT JOIN returns all rows from your list of products and dates regardless of whether a matching row exists in the other table.
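To see the NULL-filling behaviour of the LEFT JOIN for yourself, the sketch below runs the full query against an in-memory SQLite database from Python. As before, the sample rows are an assumption based on the result table above, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (product_number INTEGER, date TEXT, value INTEGER)"
)
# Assumed sample data: no February row for product 200.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (100, "2010-01-01", 1),
        (100, "2010-02-01", 1),
        (100, "2010-03-01", 1),
        (200, "2010-01-01", 1),
        (200, "2010-03-01", 1),
    ],
)

filled = conn.execute(
    """
    SELECT pd.product_number, pd.date, t.value
    FROM (
        -- list of every product/date
        SELECT DISTINCT p.product_number, d.date
        FROM yourtable p
        CROSS JOIN yourtable d
    ) pd
    LEFT JOIN yourtable t
      ON pd.date = t.date
     AND pd.product_number = t.product_number
    ORDER BY pd.product_number, pd.date
    """
).fetchall()

for row in filled:
    print(row)  # a missing value comes back as None (SQL NULL)
```

The row for product 200 in February is kept because it exists in the product/date list, and its value is NULL because nothing in yourtable matches it.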
This could also be written as:
select
p.product_number,
d.date,
t.value
from
(
-- list of every product
select distinct product_number
from yourtable
) p
cross join
(
-- list of every date
select distinct date
from yourtable
) d
-- then join to the table
left join yourtable t
on d.date = t.date
and p.product_number = t.product_number
order by p.product_number, d.date ;
See SQL Fiddle with Demo. This may have better performance depending on your table size.
Now if you wanted to return a list of all dates, regardless of whether or not they appear in the table, then I would suggest creating a table of dates. This table would be used in a similar manner to create a list of all dates/products which you would then join.
The table would be similar to:
CREATE TABLE dates
(`date` datetime)
;
INSERT INTO dates
(`date`)
VALUES
('2010-01-01 00:00:00'),
('2010-02-01 00:00:00'),
('2010-03-01 00:00:00'),
('2010-04-01 00:00:00'),
('2010-05-01 00:00:00')
;
You'd then use the following query to get the list of dates/products:
select distinct p.product_number, d.date
from yourtable p
cross join dates d;
And finally, you would join that back to your table:
select
pd.product_number,
pd.date,
t.value
from
(
-- list of every product/date
select distinct p.product_number, d.date
from yourtable p
cross join dates d
) pd
left join yourtable t
on pd.date = t.date
and pd.product_number = t.product_number
order by pd.product_number, pd.date;
See SQL Fiddle with Demo. Or an alternative:
select
p.product_number,
d.date,
t.value
from
(
-- list of every product
select distinct product_number
from yourtable
) p
cross join dates d
-- then join to the table
left join yourtable t
on d.date = t.date
and p.product_number = t.product_number
order by p.product_number, d.date ;
See SQL Fiddle with Demo. Again, this may perform better depending on the table size. With this type of solution, you'd return all dates, even those not in your table:
| PRODUCT_NUMBER | DATE | VALUE |
|----------------|---------------------------------|--------|
| 100 | January, 01 2010 00:00:00+0000 | 1 |
| 100 | February, 01 2010 00:00:00+0000 | 1 |
| 100 | March, 01 2010 00:00:00+0000 | 1 |
| 100 | April, 01 2010 00:00:00+0000 | (null) |
| 100 | May, 01 2010 00:00:00+0000 | (null) |
| 200 | January, 01 2010 00:00:00+0000 | 1 |
| 200 | February, 01 2010 00:00:00+0000 | (null) |
| 200 | March, 01 2010 00:00:00+0000 | 1 |
| 200 | April, 01 2010 00:00:00+0000 | (null) |
| 200 | May, 01 2010 00:00:00+0000 | (null) |
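The dates-table variant can be tried the same way. This is a rough sketch on an in-memory SQLite database from Python, with the five dates from the INSERT above and the same assumed sample rows in yourtable. Dates are stored as TEXT here, which is a simplification of the datetime column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (product_number INTEGER, date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (100, "2010-01-01", 1),
        (100, "2010-02-01", 1),
        (100, "2010-03-01", 1),
        (200, "2010-01-01", 1),
        (200, "2010-03-01", 1),
    ],
)

# Calendar table: April and May never appear in yourtable.
conn.execute("CREATE TABLE dates (date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?)",
    [("2010-01-01",), ("2010-02-01",), ("2010-03-01",),
     ("2010-04-01",), ("2010-05-01",)],
)

all_dates = conn.execute(
    """
    SELECT p.product_number, d.date, t.value
    FROM (SELECT DISTINCT product_number FROM yourtable) p
    CROSS JOIN dates d
    LEFT JOIN yourtable t
      ON d.date = t.date
     AND p.product_number = t.product_number
    ORDER BY p.product_number, d.date
    """
).fetchall()

for row in all_dates:
    print(row)
```

Each product now gets one row per calendar date, so April and May show up with a NULL value even though no product has data for them.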
Best Answer
If you run the statement without the where clause you'll see why:
The join on the "id" column works like this:
Take the first hello from the table and look for all rows that contain hello; that yields two rows for the first hello. The same happens with the second hello, so you wind up with 2x2 rows for the join on hello. The same applies to world.
The outer join does not play any role, because there is a match for each id (actually: two matches).
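The row multiplication is easy to reproduce. The sketch below uses a hypothetical translations table (the table name, the lang values and the rows are all assumptions, since the original table is not shown here) in an in-memory SQLite database, and self-joins it on id without any where clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each id appears once per language.
conn.execute("CREATE TABLE translations (id TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO translations VALUES (?, ?)",
    [("hello", "en"), ("hello", "de"), ("world", "en"), ("world", "de")],
)

# Self-join on id only: every hello row matches every hello row.
joined = conn.execute(
    """
    SELECT a.id, a.lang, b.lang
    FROM translations a
    LEFT JOIN translations b ON a.id = b.id
    ORDER BY a.id, a.lang, b.lang
    """
).fetchall()

for row in joined:
    print(row)

hello_rows = [r for r in joined if r[0] == "hello"]
print(len(hello_rows), "rows for hello,", len(joined), "rows in total")
```

Two hello rows on each side give 2x2 = 4 joined rows, and the same again for world. No row comes back with a NULL on the right-hand side, which is why the outer join makes no difference here.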
You can never get your first (intended) result because that implies that all rows in the "a" table have lang='en' (which is of course not true).
To get the missing translations you need to first create the combination of all languages and ids:
Now you need to find all rows that are not in that result:
You can achieve this with an outer join as well. I simply prefer the not exists because it documents the intention more clearly (and because I hardly ever work with MySQL, which is known to perform poorly with sub-queries like that).
Here is an SQLFiddle: http://sqlfiddle.com/#!2/9804d/6
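As a rough illustration of both approaches, here is a sketch on a hypothetical translations table in an in-memory SQLite database. The table name, columns and rows are assumptions, and the queries are not necessarily the ones from the fiddle. It builds every id/lang combination with a cross join, then keeps only the combinations that have no row, once with not exists and once with an outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the German translation of "world" is missing.
conn.execute("CREATE TABLE translations (id TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO translations VALUES (?, ?)",
    [("hello", "en"), ("hello", "de"), ("world", "en")],
)

# Variant 1: all id/lang combinations, filtered with NOT EXISTS.
missing_not_exists = conn.execute(
    """
    SELECT i.id, l.lang
    FROM (SELECT DISTINCT id FROM translations) i
    CROSS JOIN (SELECT DISTINCT lang FROM translations) l
    WHERE NOT EXISTS (
        SELECT 1
        FROM translations t
        WHERE t.id = i.id AND t.lang = l.lang
    )
    ORDER BY i.id, l.lang
    """
).fetchall()

# Variant 2: the same combinations, filtered with LEFT JOIN ... IS NULL.
missing_outer_join = conn.execute(
    """
    SELECT i.id, l.lang
    FROM (SELECT DISTINCT id FROM translations) i
    CROSS JOIN (SELECT DISTINCT lang FROM translations) l
    LEFT JOIN translations t
      ON t.id = i.id AND t.lang = l.lang
    WHERE t.id IS NULL
    ORDER BY i.id, l.lang
    """
).fetchall()

print(missing_not_exists)
print(missing_outer_join)
```

Both variants return the single missing combination, ('world', 'de'), for this sample data. Which one is faster on a real MySQL table is something you would need to measure.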
Edit
After testing the performance with larger tables, it seems that Sean's version of the cross join is much more efficient than mine.
So this statement should be faster than the ones above:
Edit 2
And another version to be tested (SQL-Fiddle):