Assume I have a table named people, where id is a primary key:
+-----------+---------+---------+
| id        | fname   | lname   |
| (integer) | (text)  | (text)  |
+===========+=========+=========+
| 1         | Daniel  | Edwards |
| 2         | Fred    | Holt    |
| 3         | Henry   | Smith   |
+-----------+---------+---------+
I'm trying to write a row duplication query which is robust enough to account for schema changes to the table. Any time I add a column to the table, I don't want to have to go back and modify the duplication query.
I know I can do this, which will duplicate record id 2 and give the duplicated record a new id:
INSERT INTO people (fname, lname) SELECT fname, lname FROM people WHERE id = 2;
However, if I add an age column, I'll need to modify the query to also account for the age column.
Obviously I can't do the following, because it would also duplicate the primary key, resulting in a "duplicate key value violates unique constraint" error. And I don't want them to share the same id anyway:
INSERT INTO people SELECT * FROM people WHERE id = 2
With that said, what would be a reasonable approach to solving this challenge? I would prefer to stay away from stored procedures, but I'm not 100% against them, I suppose …
Best Answer
Simple with hstore
If you have the additional module hstore installed (instructions in link below), there is a surprisingly simple way to replace the value(s) of individual field(s) without knowing anything about other columns:
Basic example: duplicate the row with id = 2 but replace 2 with 3:
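The code for this example did not survive in the text, so here is a minimal sketch of the idea. It assumes the hstore extension is available and uses its `#=` operator, which replaces fields of a record with matching values from an hstore. It is shown as a plain SELECT because, with the sample data from the question, inserting a row with id 3 would collide with the existing row for Henry Smith.

```sql
-- Assumption: you are allowed to install the extension in this database.
CREATE EXTENSION IF NOT EXISTS hstore;

-- Take the whole row p, overwrite only the field "id" with 3,
-- then expand the modified row back into columns.
SELECT (p #= hstore('id', '3')).*
FROM   people p
WHERE  id = 2;

-- To actually duplicate the row, prefix the SELECT with:
--   INSERT INTO people
-- and pick an id that is not taken yet.
```

No column other than id is named, so the statement keeps working after columns are added to people.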
Assuming (since it's not defined in the question) that people.id is a serial column with an attached sequence, you'll want the next value from the sequence. We can determine the sequence name with pg_get_serial_sequence(). Or you can just hard-code the sequence name if it's not going to change.
We would have this query:
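The query itself is missing from the text. A sketch of what it could look like, assuming people.id is a serial column so that pg_get_serial_sequence() finds its sequence:

```sql
-- Replace id with the next value from the column's own sequence.
-- hstore values are text, hence the cast of the nextval() result.
INSERT INTO people
SELECT (p #= hstore('id', nextval(pg_get_serial_sequence('people', 'id'))::text)).*
FROM   people p
WHERE  id = 2;
```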
Which works, but suffers from a weakness in the Postgres query planner: the expression is evaluated separately for every single column in the row, wasting sequence numbers and performance. To avoid this, move the expression into a subquery and decompose the row only once:
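A sketch of the subquery form, under the same assumptions as before (hstore installed, people.id is a serial column). The aliases p1 and sub are only illustrative names:

```sql
INSERT INTO people
SELECT (p1).*                       -- decompose the already-modified row once
FROM  (
   SELECT p #= hstore('id', nextval(pg_get_serial_sequence('people', 'id'))::text) AS p1
   FROM   people p
   WHERE  id = 2
   ) sub;
```

The row is built once in the subquery, so nextval() should be called once per duplicated row rather than once per column.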
Probably fastest for a single (or few) row(s) at once.
json / jsonb
If you don't have hstore installed and can't install additional modules, you can do a similar trick with json_populate_record() or jsonb_populate_record(), but that capability is undocumented and may be unreliable.
Transient temporary table
Another simple solution would be to use a transient temporary table like this:
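The original code block is missing here. One way this could look, as a sketch: the table name people_tmp is a made-up name, and the approach assumes people.id has a sequence-backed column default (as a serial column does) that LIKE ... INCLUDING DEFAULTS can copy.

```sql
BEGIN;

-- Same columns and defaults as people, but no primary key constraint.
CREATE TEMP TABLE people_tmp (LIKE people INCLUDING DEFAULTS) ON COMMIT DROP;

-- Copy the row(s) to duplicate.
INSERT INTO people_tmp
SELECT * FROM people WHERE id = 2;

-- Overwrite id with its column default, i.e. the next sequence value.
UPDATE people_tmp SET id = DEFAULT;

-- Any other changes to the copied rows could go here.

INSERT INTO people
TABLE people_tmp;

COMMIT;
```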
I added ON COMMIT DROP to drop the table automatically at the end of the transaction. Consequently, I also wrapped the operation into a transaction of its own. Neither is strictly necessary.
This offers a wide range of additional options: you can do anything with the row before inserting, but it's going to be a bit slower due to the overhead of creating and dropping a temp table.
This solution works for a single row or for any number of rows at once. Each row gets a new default value from the sequence automatically.
Using the short (SQL standard) notation TABLE people.
Dynamic SQL
For many rows at once, dynamic SQL is going to be fastest. Concatenate the columns from the system table pg_attribute or from the information schema and execute it dynamically in a DO statement, or write a function for repeated use and call it:
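The function and its call are missing from the text. A sketch of such a function follows. The name f_row_copy and its parameters are chosen here for illustration, and it assumes the key column is an integer column named id whose default supplies the new value:

```sql
CREATE OR REPLACE FUNCTION f_row_copy(_tbl regclass, _id int, OUT row_ct int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Build "INSERT INTO tbl(cols) SELECT cols FROM tbl WHERE id = $1"
   -- from the catalog, leaving out the id column so its default applies.
   EXECUTE (
      SELECT format('INSERT INTO %1$s(%2$s) SELECT %2$s FROM %1$s WHERE id = $1',
                    _tbl, string_agg(quote_ident(attname), ', '))
      FROM   pg_attribute
      WHERE  attrelid = _tbl
      AND    NOT attisdropped   -- skip dropped columns
      AND    attnum > 0         -- skip system columns
      AND    attname <> 'id'    -- skip the key column
      )
   USING _id;

   GET DIAGNOSTICS row_ct = ROW_COUNT;  -- number of rows inserted
END
$func$;

-- Example call: duplicate the row with id = 2.
SELECT f_row_copy('people', 2);
```

The column list is read from the catalog on every call, so columns added later are picked up without changing the function.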
Works for any table with an integer column named id. You could easily make the column name dynamic, too ...
Maybe not your first choice since you wanted to stay away from stored procedures, but then again, it's not a "stored procedure" anyway ...
Advanced solution
A serial column is a special case. If you want to fill more or all columns with their respective default values, it gets more sophisticated. Consider this related answer: