Mysql – Which query is optimized

mysql-5.7optimization

I've written these two queries for same operation, on the basis of time complexity, i want to know which one is efficient.

select Fname, Lname, Address 
from (select * from department d, employee e where d.Dnumber = e.Dno) as a 
where a.Dname = 'Research';

Edit: on the basis of my first comment i assume that the query in from clause will work as temporary table/view which i name as "a", and contain all the columns of both the tables and i use it.(and i don't know it's a efficient way to do it.)

select Fname, Lname, Address from employee 
where Dno = (select Dnumber from department where Dname = 'Research');

Or there could be more efficient way to do it.
Thanks.

Edit2:

SELECT Pnumber, Dnum, Lname, Address, Bdate 
FROM employee 
JOIN department d ON d.Mgr_ssn = employee.Ssn 
JOIN project p ON p.Dnum = d.Dnumber 
WHERE p.Plocation = 'Stafford';

i've this argument, can you tell me what's the flaw in this.,

let's there be 1000 rows in each PROJECT, DEPARTMENT, EMPLOYEE table,

then in the above query, compiler will choose in what order the join has to be applied (ABC, ACB,BCA, BAC,…) choosing the efficient one. But to select the best of all for the first, then second,… it has to perform all and compare it with each other, plus after that the order of the tuples involved would be same(not sure, maybe if it goes for EMPLOYEE, DEPARTMENT first it only have 100-150 rows, that entirely depends upon tables…right???). I never worked on live project situation, maybe there are advantages of JOINS(which there are, that why it's in advance release) which i'm not able to see. But i'm not convinced.

then in the below query let 250 comes out to be success, which then CARTESIAN PRODUCT with DEPARTMENT table and produce 2,50,000 tuples from which let's say 5,000 be RESULT, which then again CARTESIAN PRODUCT with 1000 tuples to produce 500,000.

SELECT Pnumber, Dnum, Lname, Address, Bdate 
FROM (SELECT Pnumber, Dnum, Mgr_ssn 
      FROM department d, (SELECT Pnumber, Dnum 
                          FROM project where Plocation = 'Stafford') p
      WHERE d.Dnumber = p.Dnum) q, employee e  
WHERE q.Mgr_ssn = e.Ssn;

So, my question is JOIN is to write query simply, and let compiler decides what's the efficient order. Whereas in the below one we've done the almost work for compiler.

and, i've one more question, WHERE clause applies only on one relation?, or let me rephrase, first FROM(along with all the JOINS) clause execute then where clause execute coz in that case the query with JOINS will be very expensive.

Thanks.

Best Answer

You could also write

SELECT Fname, Lname, Address 
FROM   department
JOIN   employee 
       ON employee.dno = department.dnumber
WHERE  department.Dname = 'Research';

which is equivalent to query two and can be a clearer form (see https://stackoverflow.com/questions/1599050/ansi-vs-non-ansi-sql-join-syntax for an answer that describes why the newer syntax is preferred). Most query optimisers will see this equivalence and perform the same operations for such a simple query, though for more complex queries this may not be the case (if there is a difference the JOIN variant is likely to be the better of the two).

Not that this is almost equivalent to the inner part of your first query, join syntax you have used is the older but equivalent one. As you are doing that in the inner query then just adding an extra filter in the outer one, a good query planner will see these queries as identical and run them exactly the same way. A bad query planner will have the engine run the inner query first then apply the extra filtering clause making it much less efficient depending on what indexes are present for it to use.

But this all depends on a few things that you should edit your question and tags to provide:

What database you are using as different query planners may not optimise the same way for a given query (add the database and version to the tags).
Also most database engines provide a way to read what the query planner is likely to do for a given query which case be useful in judging which option is most optimal - I'll not list how to do that in every database I know though, be specific about the database(s) you are working with and we can be more specific & relevant about the answers we give.
The indexes present on the relevant tables (add the index and key definitions to the question text)

Note that if you run the second query and there are two rows in department where name='Research' then you are likely to get an error as the = operator can only work on one value each side. To list people in all departments with that name use IN instead. This may change the answer to which is most efficient.

Related Solutions

Do cross-joins get optimised by the client library

I'm reasonably confident that there is no optimization in the Oracle client libraries for repeating data. If it is, it's below the SQL*Net trace layer which would surprise me.

There are definitely cases where it could make sense to do the join in the app server rather than in the database. But it would be exceedingly rare that you'd have a table structure like this where you would also want to display the result of joining A to B to the user. If you don't want to display the data to the user, you'd just make sure that the results were being processed in the database so that neither A nor B had to be sent over the network.

I have had one or two cases where I needed to get highly denormalized results to a client where it made sense to use the CURSOR function rather than doing a straight join solely in an effort to minimize the amount of data being sent over the network. Of course, that requires that you're selecting a substantial amount of data from table A (DEPT in this case) so that the duplication of data actually becomes problematic.

SQL> ed
Wrote file afiedt.buf

  1  select dname,
  2         cursor( select ename, empno
  3                   from emp e
  4                  where e.deptno=d.deptno) emps
  5*   from dept d
SQL> /

DNAME          EMPS
-------------- --------------------
ACCOUNTING     CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

ENAME           EMPNO
---------- ----------
CLARK            7782
KING             7839
MILLER           7934

RESEARCH       CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

ENAME           EMPNO
---------- ----------
smith            7369
JONES            7566
SCOTT            7788
ADAMS            7876
FORD             7902

SALES          CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

ENAME           EMPNO
---------- ----------
ALLEN            7499
WARD             7521
MARTIN           7654
BLAKE            7698
TURNER           7844
SM0              7900

6 rows selected.

OPERATIONS     CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

no rows selected

Thesql query builder inefficiently builds a query

Short answer: You need to rewrite the query. Or maybe upgrade.

Long answer:

MySQL has long been dumb when it comes to IN ( SELECT ... ).

By using a derived table, the optimizer will do the subquery first, thereby being a lot more efficient:

SELECT ...
    FROM ( SELECT DISTINCT track_id
               FROM ...
               WHERE ... ) x
    JOIN ...
        ON x.track_id = ...

Or... Upgrade to 5.7 (or 8.0) with its "query rewrite" so that you can rewrite the query after it is sent from whatever is building it.

Best Answer

Related Solutions

Do cross-joins get optimised by the client library

Thesql query builder inefficiently builds a query

Related Question