Postgresql – Merge two rows letting the second one override the first in PostgreSQL

postgresql

Say that you want to build an append only table in Postgres, where each modification is actually just a new row added to the table. Say that you have the below simple, and disregard all questions about performance, table of users.

create table user_revisions (id uuid, name text, password text, version bigserial);
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'nthusaeo');
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'åäö');

Getting the latest version of a user would look something like:

select * from user_revisions e1
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9'
order by version desc limit 1;

But then you come to the part where you want to start modifying the users. Each request to the database always has the uuid of the user being modified and some subset of all the fields that are otherwise present in the user_revisions table.
You want to make an insert with the last revision of the user as a set of default values? That is, do the equivalent of the Clojure code (merge a b) or Python {**a, **b}, where the two associative data structures a and b are merged and if there are any keys in a that also exist in b the values held in b will override those of a.

Below are some failed attempts at writing a select that produces the output that we in turn wants to insert as the latest revision.

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural left join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural inner join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural right join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

The desired output would be:

 password  |                  id                  |  name  
-----------+--------------------------------------+--------                  
 snthsnth  | 743ccdf0-e9d8-4268-b6d7-0645eb70feb9 | Bengan

This output would then be inserted into the table as the latest version of the user with the id 743ccdf0-e9d8-4268-b6d7-0645eb70feb9.

Best Answer

It depends if you need something generic or if it can work with a "small" number of columns.

In your specific example, this query:

WITH new_data (id, name, password) AS (values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9'::uuid, NULL, 'snthsnth'))
SELECT 
id,
COALESCE(new_data.name, current_data.name) AS name,
COALESCE(new_data.password, current_data.password) AS password, 
current_data.version
FROM user_revisions current_data
INNER JOIN new_data USING (id)
WHERE id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9'
ORDER BY current_data.version DESC LIMIT 1;

would produce the expected:

id                                      name    password    version
743ccdf0-e9d8-4268-b6d7-0645eb70feb9    Bengan  snthsnth    2

You can test things in this SQLFiddle: http://sqlfiddle.com/#!17/c4700/5/0

I am quite confident it could be made generic if needed, but it may be enough already for your needs?

However it will not work correctly if you need to store NULL values, and if you need to handle transition such as "some data value => NULL"

In such cases there would be a specific need to realy test the existence of each field and just take its value if present, including if it is NULL

In passing I would also say like @Lennart that it is better to separate archive data from live data for many reasons, one would be performance, to compute the "next" version you will only have to read one row (the live one) and then doing something as above, instead of having to do things like order by version desc limit 1 each time, which will be more costly, even with an index on version.