Mysql – Which query is optimized

mysql-5.7optimization

I've written these two queries for same operation, on the basis of time complexity, i want to know which one is efficient.

select Fname, Lname, Address 
from (select * from department d, employee e where d.Dnumber = e.Dno) as a 
where a.Dname = 'Research';

Edit: on the basis of my first comment i assume that the query in from clause will work as temporary table/view which i name as "a", and contain all the columns of both the tables and i use it.(and i don't know it's a efficient way to do it.)

or

select Fname, Lname, Address from employee 
where Dno = (select Dnumber from department where Dname = 'Research');

Or there could be more efficient way to do it.
Thanks.

Edit2:

SELECT Pnumber, Dnum, Lname, Address, Bdate 
FROM employee 
JOIN department d ON d.Mgr_ssn = employee.Ssn 
JOIN project p ON p.Dnum = d.Dnumber 
WHERE p.Plocation = 'Stafford'; 

i've this argument, can you tell me what's the flaw in this.,

let's there be 1000 rows in each PROJECT, DEPARTMENT, EMPLOYEE table,

then in the above query, compiler will choose in what order the join has to be applied (ABC, ACB,BCA, BAC,…) choosing the efficient one. But to select the best of all for the first, then second,… it has to perform all and compare it with each other, plus after that the order of the tuples involved would be same(not sure, maybe if it goes for EMPLOYEE, DEPARTMENT first it only have 100-150 rows, that entirely depends upon tables…right???). I never worked on live project situation, maybe there are advantages of JOINS(which there are, that why it's in advance release) which i'm not able to see. But i'm not convinced.

then in the below query let 250 comes out to be success, which then CARTESIAN PRODUCT with DEPARTMENT table and produce 2,50,000 tuples from which let's say 5,000 be RESULT, which then again CARTESIAN PRODUCT with 1000 tuples to produce 500,000.

SELECT Pnumber, Dnum, Lname, Address, Bdate 
FROM (SELECT Pnumber, Dnum, Mgr_ssn 
      FROM department d, (SELECT Pnumber, Dnum 
                          FROM project where Plocation = 'Stafford') p
      WHERE d.Dnumber = p.Dnum) q, employee e  
WHERE q.Mgr_ssn = e.Ssn;

So, my question is JOIN is to write query simply, and let compiler decides what's the efficient order. Whereas in the below one we've done the almost work for compiler.

and, i've one more question, WHERE clause applies only on one relation?, or let me rephrase, first FROM(along with all the JOINS) clause execute then where clause execute coz in that case the query with JOINS will be very expensive.

Thanks.

Best Answer

You could also write

SELECT Fname, Lname, Address 
FROM   department
JOIN   employee 
       ON employee.dno = department.dnumber
WHERE  department.Dname = 'Research';

which is equivalent to query two and can be a clearer form (see https://stackoverflow.com/questions/1599050/ansi-vs-non-ansi-sql-join-syntax for an answer that describes why the newer syntax is preferred). Most query optimisers will see this equivalence and perform the same operations for such a simple query, though for more complex queries this may not be the case (if there is a difference the JOIN variant is likely to be the better of the two).

Not that this is almost equivalent to the inner part of your first query, join syntax you have used is the older but equivalent one. As you are doing that in the inner query then just adding an extra filter in the outer one, a good query planner will see these queries as identical and run them exactly the same way. A bad query planner will have the engine run the inner query first then apply the extra filtering clause making it much less efficient depending on what indexes are present for it to use.

But this all depends on a few things that you should edit your question and tags to provide:

  1. What database you are using as different query planners may not optimise the same way for a given query (add the database and version to the tags).
    Also most database engines provide a way to read what the query planner is likely to do for a given query which case be useful in judging which option is most optimal - I'll not list how to do that in every database I know though, be specific about the database(s) you are working with and we can be more specific & relevant about the answers we give.
  2. The indexes present on the relevant tables (add the index and key definitions to the question text)

Note that if you run the second query and there are two rows in department where name='Research' then you are likely to get an error as the = operator can only work on one value each side. To list people in all departments with that name use IN instead. This may change the answer to which is most efficient.