I have two tables
table_a_mod (2 million rows)
table_a (2 million rows + new live data)
table_a_mod
is identical to table table_a
except for two extra columns
table_a
is a "live" table with data being written to it semi-regularly.
To keep table_a_mod
up to date I would like to copy data from table_a
to table_a_mod
By using a regular join, I am able to select the data:
SELECT *
FROM table_a t1
LEFT JOIN table_a_mod t2
USING(id)
WHERE t2.id IS NULL
However, this query takes about 1 minute to run.
Is there a more efficient query along the lines of:
SELECT *
FROM table_a t1
LEFT JOIN
(
SELECT id
FROM table_a_mod
ORDER BY id DESC
LIMIT 1
) t2
USING (id)
WHERE t1.id > t2.id
ie. Get the largest id from table t2. Select all rows from table t1 with id greater than that number.
I've tried different permutations of the above query but I'm falling short.
Edit 1
In response to @danblack
EXPLAIN {slow_query}
"1" "SIMPLE" "table_a" "ALL" \N \N \N \N "2353123" ""
"1" "SIMPLE" "table_a_mod" "eq_ref" "PRIMARY" "PRIMARY" "4" "db.table_a.id" "1" "Using where; Not exists"
I'm assuming id is a primary key in both
Yes. id is a primary key in both.
Are the tables innodb and is there sufficient innodb_buffer_pool_size to hold both tables?
Both tables are innodb. And innodb_buffer_pool_size is 8388608
Is "new live data" always of an id above the previous table?
Yes, because it's auto incrementing
Why are you keeping two nearly identical tables
It is a temporary situation. We will drop the older one and use the newer one.
Best Answer
How much RAM do you have? If you have more than 4GB, set innodb_buffer_pool_size to 70% of available RAM. This is the most important thing to tune. With it set to only 8M, you are definitely I/O-bound. Increasing the size will probably shrink the time to under 10 seconds.
Your second idea should be good (assuming the only new rows have higher ids), but let's get rid of the
LEFT
and useMAX
: