PostgreSQL – How to Improve Long Running Query Performance

migrationperformancepostgresqlquery-performance

I have this query

INSERT INTO interactions(
    user_id, external_object_id, origin, interaction_type, 
    interaction_sub_type, published_at, origin_id, created_at, updated_at)

SELECT find_user(id), find_external_object(external_object_id), 'facebook' as origin,
    interaction_type, interaction_sub_type, published_at, origin_id, 
    current_date, current_date
from legacy_table

The legacy_table have around 9 million records, and it is running for around 20 hours now and find_user and find_external_object are functions that look for the return the id of an user and an external object (of course) based on id from legacy table, like this

SELECT legacy_user.uid into uid 
    from legacy_user 
    where legacy_user.id = $1;

SELECT users.id into user_id 
    from users 
    where users.uid = uid;

RETURN user_id;

What can I do to improve the query?

Best Answer

Try creating a couple of temporary tables, indexing them, analyzing them, and then join to them..

CREATE TEMPORARY TABLE luid_lookup AS
  SELECT id, find_user(id) AS fuid FROM legacy_table;

CREATE TEMPORARY TABLE eoid_lookup AS
  SELECT external_object_id, find_external_object(external_object_id) AS feid FROM legacy_table;

CREATE INDEX luid_lookup_tmp_ix1 ON luid_lookup (id, fuid);
CREATE INDEX eoid_lookup_tmp_ix1 ON eoid_lookup (external_object_id, feid);

ANALYZE luid_lookup;
ANALYZE eoid_lookup;

And then

INSERT INTO interactions(
    user_id, external_object_id, origin, interaction_type, 
    interaction_sub_type, published_at, origin_id, created_at, updated_at)

SELECT b.fuid, c.feid, 'facebook' as origin,
    a.interaction_type, a.interaction_sub_type, a.published_at, a.origin_id, 
    a.current_date, a.current_date
from legacy_table a
join luid_lookup b on a.id = b.id
join eoid_lookup c on a.external_object_id = c.external_object_id

If the creation of the temporary tables takes a long time, then your find_user() and your find_external_object() methods need to be tuned (as opposed to the actual insert query).

One additional note .. if the id or external_object_id are NOT pkeys/unique of their tables, you will want to make sure that you're only getting unique values.. (If you don't, then you'll get duplicate rows when you join) In that case, make sure you do a subselect and get the distinct result (example below).

CREATE TEMPORARY TABLE luid_lookup AS
  SELECT id, find_user(id) AS fuid FROM (SELECT DISTINCT id FROM legacy_table) l_tab;