Mysql – Removing the uses of MySQL temporary tables

MySQLPHPreplicationtemporary-tables

We have a very huge PHP/MySQL application with a lot of legacy code, this application utilized MySQL for most of it's work. It does most of the processing on the database side using temporary tables, or tables that are used temporarily but are not strictly declared as TEMPORARY.

Also this application uses MySQL replication with 3 replicas for each instance, and we use write on those replicas.

In our current setup we use replication filtering to ignore those tables ,We are moving all of our application instances, which is about 15 deployments, to Amazon's RDS, and Amazon RDS does not allow setting "replicat-ignore-tables" param

So now we want to remove the usage of temporary tables, and we are checking the alternatives.

We mainly use them to process joined data from huge tables (billions of rows) and some matrix processing utilizing the use of indexes to process matrix rows.

What would be the best way to replace temporary tables usages using PHP and MySQL?

EDIT
The answer below would solve our first usage, which is direct selects.

Another usage goes as this:

  1. We SELECT a huge amount of data from a huge table into a file using locally located.
  2. We create a TEMPORARY table and load the data in the file into it.
  3. Then we do some calculations and update some rows on that table in every calculation, for example we calculate the height of a building, then we update all rows with that height in the temporary table.
  4. We then run some aggregation calculations on that table, and put the results in a final table.
  5. We drop that table.

Best Answer

Sometimes i use temporary tables to create complex reports. A relatively simple (and meaningless) example could be something like

drop temporary table if exists some_temporary_table1;
create temporary table some_temporary_table1
    select t1.someColumn, max(t2.someOtherColumn) as someMax
    from some_table1 t1
    join some_table2 t2 using(someJoinColumn)
    where t1.someConditionColumn = 'someConditionValue'
    group by t1.someColumn
;

drop temporary table if exists some_temporary_table2;
create temporary table some_temporary_table2
    select tmp1.someColumn, tmp1.someMax, min(t3.someCompletelyOtherColumn) as someMin
    from some_temporary_table1 tmp1
    join some_table3 t3 using(someJoinColumn)
    where t3.someOtherConditionColumn = 'someOtherConditionValue'
    group by tmp1.someColumn, tmp1.someMax
;

select *
from some_temporary_table2 tmp2
join some_table3 t3 using(someDifferentJoinColumn)
where t3.someFancyConditionColumn = 'someFancyConditionValue'

Same result can be achieved without temporary tables, eliminating them step by step.

Step 1 - eliminate some_temporary_table1, replacing its occurrence with its creating code:

drop temporary table if exists some_temporary_table2;
create temporary table some_temporary_table2
    select tmp1.someColumn, tmp1.someMax, min(t3.someCompletelyOtherColumn) as someMin
    from (
        select t1.someColumn, max(t2.someOtherColumn) as someMax
        from some_table1 t1
        join some_table2 t2 using(someJoinColumn)
        where t1.someConditionColumn = 'someConditionValue'
        group by t1.someColumn
    ) tmp1
    join some_table3 t3 using(someJoinColumn)
    where t3.someOtherConditionColumn = 'someOtherConditionValue'
    group by tmp1.someColumn, tmp1.someMax

Step 2 - eliminate some_temporary_table2

select *
from (
    select tmp1.someColumn, tmp1.someMax, min(t3.someCompletelyOtherColumn) as someMin
    from (
        select t1.someColumn, max(t2.someOtherColumn) as someMax
        from some_table1 t1
        join some_table2 t2 using(someJoinColumn)
        where t1.someConditionColumn = 'someConditionValue'
        group by t1.someColumn
    ) tmp1
    join some_table3 t3 using(someJoinColumn)
    where t3.someOtherConditionColumn = 'someOtherConditionValue'
    group by tmp1.someColumn, tmp1.someMax
) tmp2
join some_table3 t3 using(someDifferentJoinColumn)
where t3.someFancyConditionColumn = 'someFancyConditionValue'

If more temporary tables are involved, this can end up in a highly unreadable (and unmaintainable) code. To keep your code readable you can build the query in PHP using string variables for subqueries:

$subQuery1 = "
    select t1.someColumn, max(t2.someOtherColumn) as someMax
    from some_table1 t1
    join some_table2 t2 using(someJoinColumn)
    where t1.someConditionColumn = 'someConditionValue'
    group by t1.someColumn
";

$subQuery2 = "
    select tmp1.someColumn, tmp1.someMax, min(t3.someCompletelyOtherColumn) as someMin
    from ( {$subQuery1} ) tmp1
    join some_table3 t3 using(someJoinColumn)
    where t3.someOtherConditionColumn = 'someOtherConditionValue'
    group by tmp1.someColumn, tmp1.someMax
";

$mainQuery = "
    select *
    from ( {$subQuery2} ) tmp2
    join some_table3 t3 using(someDifferentJoinColumn)
    where t3.someFancyConditionColumn = 'someFancyConditionValue'
";

which is similar to the original temporary table solution.

In terms of performance - it usually doesn't make much difference. Sometimes the temporary-tables-solution is faster. Sometimes it's the one-statement-solution. But mostly it's not more than 10%. In an old example i just tested, the one-statement-solution was a bit was faster, even though i gave some hints like to use smaller data types (smallint instead of bigint), creating indexes on temp tables to use for the next join and using a temp table twice. In another example i remember from work, the temporary-tables-solution has been faster, even though i didn't use any hints.