Do cross-joins get optimised by the client library

Note This is deliberately database agnostic, I'm interested in how different implementations, er, differ.

I came across Joins are for lazy people on SO, and it's made me question an assumption I'd made.

Joining in the app server can be more
efficient if joining on the database
causes severe redundancy in the result
set sent over the network. Consider
tables A and B, where each row in A is
associated with 20 rows in B, B has
only 100 rows, and we want to fetch
the first 1000 rows from A with
associated rows from B. Joining in the
database will result in 20 * 1000
tuples sent across the network. If the
join is done in the app server (first
fetching the entire B table into
memory), a mere 100 + 1000 rows are
sent across the network.

I kinda assumed that a cross-join would be optimised in the client library, so that less data was sent over the network, and then merged, so that the app didn't need to glue the data together. Am I plain wrong, or does it vary depending on the database?

SQL> ed Wrote file afiedt.buf 1 select dname, 2 cursor( select ename, empno 3 from emp e 4 where e.deptno=d.deptno) emps 5* from dept d SQL> / DNAME EMPS -------------- -------------------- ACCOUNTING CURSOR STATEMENT : 2 CURSOR STATEMENT : 2 ENAME EMPNO ---------- ---------- CLARK 7782 KING 7839 MILLER 7934 RESEARCH CURSOR STATEMENT : 2 CURSOR STATEMENT : 2 ENAME EMPNO ---------- ---------- smith 7369 JONES 7566 SCOTT 7788 ADAMS 7876 FORD 7902 SALES CURSOR STATEMENT : 2 CURSOR STATEMENT : 2 ENAME EMPNO ---------- ---------- ALLEN 7499 WARD 7521 MARTIN 7654 BLAKE 7698 TURNER 7844 SM0 7900 6 rows selected. OPERATIONS CURSOR STATEMENT : 2 CURSOR STATEMENT : 2 no rows selected

Best Answer

I'm reasonably confident that there is no optimization in the Oracle client libraries for repeating data. If it is, it's below the SQL*Net trace layer which would surprise me.

There are definitely cases where it could make sense to do the join in the app server rather than in the database. But it would be exceedingly rare that you'd have a table structure like this where you would also want to display the result of joining A to B to the user. If you don't want to display the data to the user, you'd just make sure that the results were being processed in the database so that neither A nor B had to be sent over the network.

I have had one or two cases where I needed to get highly denormalized results to a client where it made sense to use the CURSOR function rather than doing a straight join solely in an effort to minimize the amount of data being sent over the network. Of course, that requires that you're selecting a substantial amount of data from table A (DEPT in this case) so that the duplication of data actually becomes problematic.

Best Answer

Related Solutions

Mysql – Get the rows where id_user_rq exists in id_user_ap and viceversa

Sql-server – How to get the information of the optimizer of sql server

Related Question