I have this query
INSERT INTO interactions(
user_id, external_object_id, origin, interaction_type,
interaction_sub_type, published_at, origin_id, created_at, updated_at)
SELECT find_user(id), find_external_object(external_object_id), 'facebook' as origin,
interaction_type, interaction_sub_type, published_at, origin_id,
current_date, current_date
from legacy_table
The legacy_table
have around 9 million records, and it is running for around 20 hours now and find_user
and find_external_object
are functions that look for the return the id of an user and an external object (of course) based on id from legacy table, like this
SELECT legacy_user.uid into uid
from legacy_user
where legacy_user.id = $1;
SELECT users.id into user_id
from users
where users.uid = uid;
RETURN user_id;
What can I do to improve the query?
Best Answer
Try creating a couple of temporary tables, indexing them, analyzing them, and then join to them..
And then
If the creation of the temporary tables takes a long time, then your find_user() and your find_external_object() methods need to be tuned (as opposed to the actual insert query).
One additional note .. if the
id
orexternal_object_id
are NOT pkeys/unique of their tables, you will want to make sure that you're only getting unique values.. (If you don't, then you'll get duplicate rows when you join) In that case, make sure you do a subselect and get the distinct result (example below).