I'm trying to batch-copy data in a specific order from one table to another in PostgreSQL 12-beta2. The table does not use sequences, but has a composite Primary Key (user_id, object_id).
In order to determine where to start the next batch, I'd like to continue from the last inserted row (WHERE user_id >= last_user_id AND object_id > last_object_id).
Starting off with this:
INSERT INTO dest_table
SELECT user_id, object_id, object_type, colN, ...
FROM source_table
ORDER BY user_id, colN, object_id -- this is indexed
LIMIT 1000 -- batch size
RETURNING user_id, object_id;
… returns a table of 1000 tuples. I'd like to obtain the last inserted tuple from it.
I've tried to do a SELECT around it, like this:
SELECT user_id, object_id FROM (
INSERT INTO dest_table
SELECT user_id, object_id, object_type, colN, ...
FROM source_table
ORDER BY user_id, colN, object_id -- this is indexed
LIMIT 1000 -- batch size
RETURNING user_id, object_id
)
ORDER BY user_id DESC, colN DESC, object_id DESC
LIMIT 1
RETURNING user_id, object_id;
But that returns a syntax error:
ERROR: syntax error at or near "INTO"
LINE 2: INSERT INTO dest_table
^
I've also attempted RETURNING ... INTO variable as described here, but that fails too:
ERROR: syntax error at or near "INTO"
LINE 23: RETURNING user_id, object_id INTO my_variable;
^
Do I need to create a function for this (e.g. in plpgsql), or am I missing something obvious in plain SQL that lets me do this? The latter would be highly preferable.
Best Answer
It seems that you need only the maximum of the inserted (user_id, object_id) couples. The first INSERT could be:
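A sketch consistent with that description (wrapping the INSERT in a data-modifying CTE and using a row constructor to pick out the maximum couple; table and column names follow the question):

```sql
WITH ins AS (
  INSERT INTO dest_table
  SELECT user_id, object_id, object_type, colN  -- plus the remaining columns
  FROM source_table
  ORDER BY user_id, object_id
  LIMIT 1000                        -- batch size
  RETURNING user_id, object_id
)
SELECT user_id, object_id
FROM ins
ORDER BY (user_id, object_id) DESC  -- sort by the couple as a row
LIMIT 1;
```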
This form uses a row to refer to the (user_id, object_id) couple for convenience. Row elements are compared left-to-right, which is exactly what is needed to use this couple of values as a boundary; see Row Constructor Comparison in the documentation.
With this solution, each subsequent INSERT would inject the couple returned by the immediately preceding INSERT. This avoids looking up the last inserted row in dest_table.
In plain SQL, you would do a single INSERT in one step, without any batching. Since you want to break the copy into batches, there has to be some loop driving these INSERTs with a stop condition, and that procedural logic goes beyond pure SQL.
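A minimal sketch of such a loop as an anonymous PL/pgSQL DO block (names and batch size taken from the question; the empty-batch stop condition and the ordering by the couple itself are assumptions):

```sql
DO $$
DECLARE
  last_user_id   integer;  -- NULL until the first batch has run
  last_object_id integer;
BEGIN
  LOOP
    WITH ins AS (
      INSERT INTO dest_table
      SELECT user_id, object_id, object_type, colN  -- plus remaining columns
      FROM source_table
      WHERE last_user_id IS NULL   -- first batch: no lower boundary yet
         OR (user_id, object_id) > (last_user_id, last_object_id)
      ORDER BY user_id, object_id  -- must match the boundary couple
      LIMIT 1000                   -- batch size
      RETURNING user_id, object_id
    )
    SELECT user_id, object_id
    INTO last_user_id, last_object_id
    FROM ins
    ORDER BY (user_id, object_id) DESC  -- keep the maximum couple
    LIMIT 1;

    EXIT WHEN NOT FOUND;  -- empty batch: everything is copied
  END LOOP;
END $$;
```

Note that the boundary couple only carves the table into clean batches when the batches are ordered by (user_id, object_id) itself, not by the (user_id, colN, object_id) order shown in the question.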
If you don't feel like reinjecting the boundary through a variable, it could be persisted in a dedicated, single-row table.
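For instance (illustrative names, not from the original answer):

```sql
-- A dedicated single-row table persisting the boundary between runs.
CREATE TABLE copy_progress (
  user_id   integer,
  object_id integer
);
INSERT INTO copy_progress VALUES (NULL, NULL);  -- nothing copied yet

-- After each batch, store the couple returned by that batch's INSERT:
UPDATE copy_progress SET (user_id, object_id) = (42, 1001);

-- The next run reads it back as its lower boundary:
SELECT user_id, object_id FROM copy_progress;
```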