Postgresql – How to obtain the last inserted row with INSERT … SELECT in PostgreSQL

insertpostgresqlselect-into

I'm trying to batch-copy data in a specific order from one table to another in PostgreSQL 12-beta2. The table is not using sequences, but contains an composite unique Primary Key (user_id, object_id).

In order to determine where to start for the next batch, I'd like to start off from the last inserted row (WHERE user_id >= last_user_id AND object_id > last_object_id).

Starting off with this:

INSERT INTO dest_table
SELECT (user_id, object_id, object_type, colN, ...) 
FROM source_table 
ORDER BY user_id, colN, object_id  -- this is indexed
LIMIT 1000  -- batch size
RETURNING user_id, object_id;

… returns a table of 1000 tuples. I'd like to obtain the last inserted tuple from it.

I've tried to do a SELECT around it, like this:

SELECT user_id, object_id FROM (
    INSERT INTO dest_table
    SELECT (user_id, object_id, object_type, colN, ...) 
    FROM source_table 
    ORDER BY user_id, colN, object_id  -- this is indexed
    LIMIT 1000  -- batch size
    RETURNING user_id, object_id
)
ORDER BY user_id DESC, colN DESC, object_id DESC
LIMIT 1
RETURNING user_id, object_id;

But that returns a syntax error:

ERROR:  syntax error at or near "INTO"
LINE 2: INSERT INTO dest_table
               ^

I've also attempted RETURNING ... INTO variable as described here, but that fails too:

ERROR:  syntax error at or near "INTO"
LINE 23: RETURNING user_id, object_id INTO my_variable;
                                      ^

Do I need to create a function for this (e.g. plpgsql) or am I missing something obvious in plain SQL that let me do this? That would be highly favorable.

Best Answer

It seems that you need only the maximum of the last inserted (user_id,object_id) as a couple. The first INSERT could be:

WITH inserted_rows as (
INSERT INTO dest_table
SELECT user_id, object_id, object_type, colN, ...
FROM source_table 
ORDER BY user_id, colN, object_id  -- this is indexed
LIMIT 1000  -- batch size
RETURNING user_id, object_id
)
SELECT row(user_id,object_id) FROM inserted_rows
 ORDER BY 1 DESC LIMIT 1;

This form uses a row to refer to the (user_id,object_id) couple for convenience. Row elements are compared left-to-right as needed in your case to use this couple of values as a boundary. See Row Constructor Comparison in the documentation.

With this solution, the subsequent INSERTs would inject the value returned by the last preceding insert. This avoids looking up the last inserted row in dest_table.

WITH inserted_rows as (
INSERT INTO dest_table
SELECT user_id, object_id, object_type, colN, ...
FROM source_table
  WHERE row(user_id,object_id) > row(...inject the result of the previous query here...)
ORDER BY user_id, colN, object_id
LIMIT 1000
RETURNING user_id, object_id
)
SELECT row(user_id,object_id) FROM inserted_rows
ORDER BY 1 DESC LIMIT 1;

Do I need to create a function for this (e.g. plpgsql) or am I missing something obvious in plain SQL that let me do this? That would be highly favorable.

In plain SQL, you would do a single INSERT in one step without any batching. Since you want to break it into batches, there has to be some loop driving these INSERTs with a stop condition, and this procedural logic goes beyond pure SQL.

If you don't feel like reinjecting the boundary through a variable, it could be persisted in a dedicated, single-row table.

Related Solutions

Postgresql – How to insert values into a table from a select query in PostgreSQL

Column order does matter so if (and only if) the column orders match you can for example:

insert into items_ver
select * from items where item_id=2;

Or if they don't match you could for example:

insert into items_ver(item_id, item_group, name)
select * from items where item_id=2;

but relying on column order is a bug waiting to happen (it can change, as can the number of columns) - it also makes your SQL harder to read

There is no good 'shortcut' - you should explicitly list columns for both the table you are inserting into and the query you are using for the source data, eg:

insert into items_ver (item_id, name, item_group)
select item_id, name, item_group from items where item_id=2;

dbfiddle here

Sql-server – Combine – Select Into, with Insert Into Select

When the automation fires off the second time, I get an error because the table already exists and, consequently, the second round of data isn't inserted.

I would suggest you to put a condition to check the existence of the table

If exists (select 1 from sys.tables where name = '' and type = 'U')
begin -- table exists, so insert values in it
insert into table_name values ()
end
else
begin -- table does not exist, so create it
create table table_name (column datatype)
insert into table_name values ()
end

You can even use below sql to create new blank table as the where 1=0 will always be false.

Note that it wont create any indexes from old_table into new_table.

SELECT *
INTO new_table
FROM old_table where 1=0

To capture errors, you can use TRY/CATCH

Best Answer

Related Solutions

Postgresql – How to insert values into a table from a select query in PostgreSQL

Sql-server – Combine – Select Into, with Insert Into Select

Related Question