I have 4 MySQL tables
1. companies - {id*, company}
2. employees - {id*, fname, lname, id_company*}
3. technologies - {id*, technology}
4. company_technologies - {id_technology*, id_company*}
Field is indexed if it has '*'.
Companies can use multiple technologies and have multiple employees.
What I'm trying to do is write a query that would select all employees and their companies that use ANY of given technologies. The problem is that I have duplicates when I try to do something like this:
SELECT employees.*, companies.* FROM companies LEFT JOIN employees on companies.id = employees.id_company LEFT JOIN company_technologies ON companies.id = company_technologies.id_company WHERE company_technologies.id_technology IN(1,2,3)
If, for example, a company is using technology 1 and 2 – it will appear twice along with all people.
If I add 'GROUP BY companies.id' condition at the end – I only have one employee for each company, whereas I need all of them.
Tables 1, 2 and 4 have several million records, all tables are InnoDB.
Best Answer
You can convert the
join
to a semi-join, using eitherIN
orEXISTS
:Another way would be to use
GROUP BY
orDISTINCT
in a subquery and then join:Once you have one or more queries working correctly, you can start worrying about efficiency. I would add 2 composite indexes on the junction table, one on
(id_technology,id_company)
and one on(id_company,id_technology)
(and remove the single indexes.)Then test all queries with various parameters and table sizes and check which indexes are used and what are the execution plans.