We have a very large PHP/MySQL application with a lot of legacy code. The application uses MySQL for most of its work: it does most of the processing on the database side using temporary tables, or tables that are used temporarily but are not strictly declared as TEMPORARY.
The application also uses MySQL replication with 3 replicas for each instance, and we write to those replicas.
In our current setup we use replication filtering to ignore those tables. We are moving all of our application instances, about 15 deployments, to Amazon RDS, and Amazon RDS does not allow setting the "replicate-ignore-table" parameter.
So now we want to remove the usage of temporary tables, and we are checking the alternatives.
We mainly use them to process joined data from huge tables (billions of rows), and for some matrix processing that relies on indexes to process matrix rows.
What would be the best way to replace our temporary table usage using PHP and MySQL?
EDIT
The answer below would solve our first usage, which is direct selects.
Another usage goes like this:
- We SELECT a huge amount of data from a huge table into a locally located file.
- We create a TEMPORARY table and load the data in the file into it.
- Then we do some calculations and update some rows on that table in every calculation. For example, we calculate the height of a building, then we update all rows with that height in the temporary table.
- We then run some aggregation calculations on that table, and put the results in a final table.
- We drop that table.
Best Answer
Sometimes I use temporary tables to create complex reports. A relatively simple (and meaningless) example could involve two of them, some_temporary_table1 and some_temporary_table2.
The same result can be achieved without temporary tables by eliminating them step by step.
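A minimal sketch of what such a report could look like, assuming a hypothetical schema `customers(id, country)` and `orders(customer_id, amount)`. The table contents, the column names, and the helper `runTempTableReport()` are illustrative assumptions, not the original example.

```php
<?php
// Hypothetical schema: customers(id, country), orders(customer_id, amount).
// Setup statements that build the two temporary tables, in order.
$tempTableSetup = [
    // Per-customer order totals.
    "CREATE TEMPORARY TABLE some_temporary_table1 AS
     SELECT o.customer_id, SUM(o.amount) AS total
     FROM orders o
     GROUP BY o.customer_id",
    // Per-country totals, built from the first temporary table.
    "CREATE TEMPORARY TABLE some_temporary_table2 AS
     SELECT c.country, SUM(t1.total) AS country_total
     FROM some_temporary_table1 t1
     JOIN customers c ON c.id = t1.customer_id
     GROUP BY c.country",
];

// Final report query that reads the second temporary table.
$finalSelect =
    "SELECT t2.country, t2.country_total
     FROM some_temporary_table2 t2
     ORDER BY t2.country_total DESC";

// Runs everything on ONE connection, since temporary tables are per-session.
function runTempTableReport(PDO $pdo, array $setup, string $select): array
{
    foreach ($setup as $sql) {
        $pdo->exec($sql);
    }
    $rows = $pdo->query($select)->fetchAll(PDO::FETCH_ASSOC);
    $pdo->exec("DROP TEMPORARY TABLE IF EXISTS some_temporary_table2, some_temporary_table1");
    return $rows;
}
```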
Step 1 - eliminate some_temporary_table1, replacing its occurrence with its creating code.
Step 2 - eliminate some_temporary_table2 in the same way, leaving a single statement.
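The two steps can be sketched as follows. This assumes a hypothetical report in which some_temporary_table1 held per-customer order totals and some_temporary_table2 held per-country totals derived from it. The schema `customers(id, country)` / `orders(customer_id, amount)` and the variable names are illustrative assumptions.

```php
<?php
// Step 1: some_temporary_table1 is gone. The SELECT that used to create it
// now appears as a derived table with the alias t1.
$afterStep1 = [
    "CREATE TEMPORARY TABLE some_temporary_table2 AS
     SELECT c.country, SUM(t1.total) AS country_total
     FROM (
         SELECT o.customer_id, SUM(o.amount) AS total
         FROM orders o
         GROUP BY o.customer_id
     ) t1
     JOIN customers c ON c.id = t1.customer_id
     GROUP BY c.country",
    "SELECT t2.country, t2.country_total
     FROM some_temporary_table2 t2
     ORDER BY t2.country_total DESC",
];

// Step 2: some_temporary_table2 is inlined the same way, as derived table t2.
// What remains is a single SELECT with no temporary tables at all.
$afterStep2 =
    "SELECT t2.country, t2.country_total
     FROM (
         SELECT c.country, SUM(t1.total) AS country_total
         FROM (
             SELECT o.customer_id, SUM(o.amount) AS total
             FROM orders o
             GROUP BY o.customer_id
         ) t1
         JOIN customers c ON c.id = t1.customer_id
         GROUP BY c.country
     ) t2
     ORDER BY t2.country_total DESC";
```

Keeping the old table names' short aliases (t1, t2) on the derived tables means the outer parts of each query do not have to change when a temporary table is inlined.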
If more temporary tables are involved, this can end up as highly unreadable (and unmaintainable) code. To keep your code readable, you can build the query in PHP using string variables for the subqueries, which is similar to the original temporary table solution.
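A minimal sketch of the string-variable approach, assuming the same kind of hypothetical report (per-customer totals feeding per-country totals over `customers(id, country)` and `orders(customer_id, amount)`). The variable names mirror the temporary table names for readability, and all of them are illustrative.

```php
<?php
// Each variable holds the SELECT that used to populate one temporary table.
$someTemporaryTable1 = "
    SELECT o.customer_id, SUM(o.amount) AS total
    FROM orders o
    GROUP BY o.customer_id
";

// The second "table" embeds the first one as a derived table.
$someTemporaryTable2 = "
    SELECT c.country, SUM(t1.total) AS country_total
    FROM ({$someTemporaryTable1}) t1
    JOIN customers c ON c.id = t1.customer_id
    GROUP BY c.country
";

// Final statement: one query, assembled from readable pieces.
$sql = "
    SELECT t2.country, t2.country_total
    FROM ({$someTemporaryTable2}) t2
    ORDER BY t2.country_total DESC
";

// Usage, assuming $pdo is an already connected PDO instance:
// $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
```

Only fixed SQL fragments should be interpolated like this. Values that come from user input are better passed through prepared-statement placeholders.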
In terms of performance, it usually doesn't make much difference. Sometimes the temporary-tables solution is faster, sometimes the one-statement solution is, but the difference is mostly no more than 10%. In an old example I just tested, the one-statement solution was a bit faster, even though I had given the temporary-tables version some hints: using smaller data types (smallint instead of bigint), creating indexes on the temp tables for use in the next join, and using a temp table twice. In another example I remember from work, the temporary-tables solution was faster, even though I didn't use any hints.
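For illustration, the kind of hints mentioned above might look like the sketch below. It relies on MySQL's `CREATE TABLE ... SELECT` accepting column and index definitions before the SELECT. The schema `orders(customer_id)` and the column names are hypothetical.

```php
<?php
// Hypothetical example: declare a smaller type for the COUNT(*) result
// (which would otherwise be BIGINT) and add an index for the next join.
$createWithHints = "
    CREATE TEMPORARY TABLE some_temporary_table1 (
        customer_id INT UNSIGNED NOT NULL,
        order_count SMALLINT UNSIGNED NOT NULL,
        INDEX (customer_id)
    )
    SELECT o.customer_id, COUNT(*) AS order_count
    FROM orders o
    GROUP BY o.customer_id
";

// Usage, assuming $pdo is an already connected PDO instance:
// $pdo->exec($createWithHints);
```

The SMALLINT column only works if no customer has more than 65535 orders, so the smaller type is an assumption about the data rather than a free win.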