Postgresql – Most efficient way of updating table from another – Postgres

postgresqlpostgresql-9.6

I need to populate a foreign key column based on the contents of another table, and both methods I've tried are prohibitively slow.

I have the below tables (some columns omitted):

              Table "public.interface_events"
           Column           |  Type   |     Modifiers
----------------------------+---------+--------------------
 device_id                  | integer | not null
 interface_id               | integer |

Foreign-key constraints:
    "interface_events_device_id_fkey" FOREIGN KEY (device_id) REFERENCES device(id) ON UPDATE CASCADE ON DELETE RESTRICT
    "interface_events_interface_id_fkey" FOREIGN KEY (interface_id) REFERENCES interface(id) ON UPDATE CASCADE ON DELETE RESTRICT

                                       Table "public.device"
        Column        |         Type          |                      Modifiers
----------------------+-----------------------+-----------------------------------------------------
 id                   | integer               | not null default nextval('device_id_seq'::regclass)
 interface_id         | integer               | not null

Indexes:
    "device_pkey" PRIMARY KEY, btree (id)
    "device_interface_id_idx" btree (interface_id)

Foreign-key constraints:
    "device_interface_id_fkey" FOREIGN KEY (interface_id) REFERENCES interface(id) ON UPDATE CASCADE ON DELETE RESTRICT

Column interface_id in table interface_events needs updating from table device, so currently I have:

UPDATE interface_events SET interface_id = device.interface_id 
FROM device 
WHERE device.id = interface_events.device_id

I should point out that the relationship between device and interface_events is one to many, so there may be thousands of events per device. Is the UPDATE FROM method suitable in this situation? I'd earlier tried:

UPDATE interface_events SET interface_id = (SELECT device.interface_id FROM device WHERE device.id = interface_events.device_id);

But that took forever also.

There are 30 million rows in table interface_events, 32,000 in table device.

The update running currently (using the FROM syntax) has been active for 21 hours, although this is on a low spec test machine (2 x 3G cores, 3G RAM, spindle disk – it's disk I/O that's killing it, constant 99%).

It occures to me as I write I should have applied the foreign key constaint after the update, since I know it's integrity will be ok thanks to the column I'm copying from already being constrained, so I guess this question is more about the best query to use?

Best Answer

The fastest way will probably be to create a new table based on a join of the existing tables, create indexes and foreign keys constraints, and then drop the old "interface_events" and rename the new table into place. Views and foreign keys into this table can complicate this.

create interface_events_temp as 
    select interface_events.*, device.interface_id from interface_events 
        join device on (device.id = interface_events.device_id)

But the interface_events.* notation relies on interface_id not already existing in interface_events. Since it already exists, you would have to list out all columns in interface_events except for interface_id.