If you run the statement without the where
clause you'll see why:
lang | id | lang | id
-----+-------+------+------
en | hello | en | hello
en | hello | fr | hello
fr | hello | en | hello
fr | hello | fr | hello
en | world | en | world
en | world | de | world
de | world | en | world
de | world | de | world
The join on the "id" column works like this:
Take the first hello
from the table and look for all rows that contain hello
- that yield two rows for the first hello. The same happens with the second hello, so you wind up with 2x2 rows for the join on hello. And the same for world
The outer join does not play any role, because there is a match for each id (actually: two matches).
You can never get your first (intended) result because that implies that all rows in the "a" table have lang='en' (which is of course not true).
To get the missing translations you need to first create the combination of all languages and ids:
select distinct a.id, b.lang
from trans a
cross join trans b;
Now you need to find all rows that are not in that result:
select *
from (
select distinct a.id, b.lang
from trans a
cross join trans b
) ac
where not exists (select 1
from trans mt
where mt.id = ac.id
and mt.lang = ac.lang);
You can achieve this with an outer join as well. I simply prefer the not exists
because it documents more clearly the intention (and because I hardly ever work with MySQL which is known to perform poorly with sub-queries like that)
select ac.*
from (
select distinct a.id, b.lang
from trans a
cross join trans b
) ac
left join trans mt on mt.id = ac.id and mt.lang = ac.lang
where mt.id is null;
Here is an SQLFiddle: http://sqlfiddle.com/#!2/9804d/6
Edit
after testing the peformance with larger tables, it seems that Sean's version of the cross join is much more efficient than mine.
So this statement should be faster than the ones above:
select at.*
from (
select lang_code, label_code
from (
SELECT distinct lang_code
FROM translations
) as translang
cross join (
SELECT distinct label_code
FROM translations
) as transid
) at
left join translations mt
on mt.lang_code = at.lang_code
and mt.label_code = at.label_code
where mt.lang_code is null;
Edit 2
And another version to be tested (SQL-Fiddle):
SELECT a.lang, a.id,
l.lang AS blang
FROM trans a
CROSS JOIN
( SELECT DISTINCT lang
FROM trans
) l
LEFT JOIN trans b
ON b.id = a.id
AND b.lang = l.lang
WHERE a.lang = 'en'
AND b.id IS NULL ;
How to download only the newer and updated entries, and avoid
downloading same entry once again?
Use a database synchronization tool like SymmetricDS or Daffodil Replicator.
You'd have a central database server, like PostgreSQL, with SymmetricDS running on it.
Each client, such as an Android smartphone, runs SQLite and SymmetricDS.
The SymmetricDS clients and server communicate with each other over HTTP(S), sending changesets back and forth.
How to provide edit entry possibility to db? Is it better to create
new table with proposal entries and after verification, move them to
main table? Or maybe should I create additional columns, which would
be proposals for new values?
I would add an is_approved
or similar column that needs to be true for items to show up to regular users. Editor and admins can see unapproved items.
What if I would get lots of changes proposals? Is it better to store
only the last one? Or maybe block edit possibility if a proposal is
already set?
Use History Tables (copy-on-write to a table similar to your main tables) to capture each change. That way you can rollback to earlier or better versions if you need to. Use Optimistic Concurrency Control to prevent updates if data changed.
How to avoid cheaters and scammers? Some bored people may want to try
make a mess, editing multiple entries with fake data. How to avoid
such kind of events? Grant edit access only to verified people?
Like this site, add "user abilities" after users have proven themselves. Block bad users. Use machine learning to filter spam.
If I would like to store images of items, is it better to store it in
db or use some kind of flicker or instagram instead?
Usually database is better, if you can cache heavily on web server, and can do incremental backup.
Is it better to save new entry in SQLite instant after edit? Or just
post it to MySQL and add it to SQLite only after verification and only
via SQLite and MySQL sync?
Saving to SQLite locally first allows for offline usage, and is probably more responsive. http://offlinefirst.org/
Is it good idea to set key value to EAN code? At least it is constant
and connected to only one product. There will be no items containing
no EAN.
No, I would use a meaningless key for product_id and a unique nullable column for EAN.
Best Answer
I ran your query without aliases
I ran your query as you gave it, with aliases
I ran your query with aliases in the
SELECT
clause as well as theFROM
clauseI just ran these in MySQL 5.6.15 on my Windows 8 Laptop. It works fine with and without aliases. It might be SQL Fiddle that has the problem in this instance with a MySQL Cartesian Product.