PostgreSQL – Merge two rows, letting the second one override the first

postgresql

Say that you want to build an append-only table in Postgres, where each modification is just a new row added to the table. Setting aside all questions about performance, say that you have the simple table of users below.

create table user_revisions (id uuid, name text, password text, version bigserial);
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'nthusaeo');
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'åäö');

Getting the latest version of a user would look something like:

select * from user_revisions e1
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9'
order by version desc limit 1;

But then you come to the part where you want to start modifying the users. Each request to the database always has the uuid of the user being modified and some subset of the other fields present in the user_revisions table.
How do you make an insert that uses the last revision of the user as a set of default values? That is, how do you do the equivalent of the Clojure code (merge a b) or Python {**a, **b}, where the two associative data structures a and b are merged and, for any key present in both, the value held in b overrides that of a?

Below are some failed attempts at writing a select that produces the output that we, in turn, want to insert as the latest revision.

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural left join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural inner join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural right join temp
where e1.id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9';

The desired output would be:

 password  |                  id                  |  name
-----------+--------------------------------------+--------
 snthsnth  | 743ccdf0-e9d8-4268-b6d7-0645eb70feb9 | Bengan

This output would then be inserted into the table as the latest version of the user with the id 743ccdf0-e9d8-4268-b6d7-0645eb70feb9.

Best Answer

It depends on whether you need something generic or whether it only has to work with a "small" number of columns.

In your specific example, this query:

WITH new_data (id, name, password) AS (values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9'::uuid, NULL, 'snthsnth'))
SELECT 
id,
COALESCE(new_data.name, current_data.name) AS name,
COALESCE(new_data.password, current_data.password) AS password, 
current_data.version
FROM user_revisions current_data
INNER JOIN new_data USING (id)
WHERE id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9'
ORDER BY current_data.version DESC LIMIT 1;

would produce the expected:

id                                      name    password    version
743ccdf0-e9d8-4268-b6d7-0645eb70feb9    Bengan  snthsnth    2

You can test things in this SQLFiddle: http://sqlfiddle.com/#!17/c4700/5/0
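To finish the round trip, the same SELECT can feed an INSERT directly. The following is a minimal sketch, assuming the user_revisions table from the question. The version column is left out of the column list so that its bigserial default supplies the next value, and the NULL::text cast is only there to make the column type explicit.

```sql
-- Sketch: merge the supplied fields over the latest revision and
-- insert the result as a new revision.
WITH new_data (id, name, password) AS (
    VALUES ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9'::uuid,
            NULL::text,          -- NULL here means "field not supplied"
            'snthsnth'::text)
)
INSERT INTO user_revisions (id, name, password)
SELECT
    id,
    COALESCE(new_data.name, current_data.name),
    COALESCE(new_data.password, current_data.password)
FROM user_revisions current_data
INNER JOIN new_data USING (id)
ORDER BY current_data.version DESC
LIMIT 1
RETURNING *;                     -- shows the row that was just added
```

Note that this sketch does not guard against two concurrent writers reading the same latest row; if that matters, some form of locking or a retry would be needed.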

I am quite confident it could be made generic if needed, but it may be enough already for your needs?

However, the COALESCE approach will not work correctly if you need to store NULL values, or if you need to handle a transition such as "some data value => NULL", because a NULL in new_data is indistinguishable from a field that was not supplied.

In such cases you would really need to test for the existence of each field and take its value if present, even if that value is NULL.
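One way to test for the existence of each field is to pass the changes as a jsonb object, where a missing key means "keep the old value" and a key holding JSON null means "set to NULL". This is only a sketch under assumptions: the patch column and its shape are hypothetical, and jsonb requires PostgreSQL 9.4 or later.

```sql
-- Sketch: the change set arrives as jsonb (hypothetical shape).
-- "patch ? 'key'" tests whether the top-level key exists at all;
-- "patch ->> 'key'" returns SQL NULL when the stored value is JSON null.
WITH new_data (id, patch) AS (
    VALUES ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9'::uuid,
            '{"password": null}'::jsonb)
)
SELECT
    id,
    CASE WHEN new_data.patch ? 'name'
         THEN new_data.patch ->> 'name'
         ELSE current_data.name
    END AS name,
    CASE WHEN new_data.patch ? 'password'
         THEN new_data.patch ->> 'password'
         ELSE current_data.password
    END AS password
FROM user_revisions current_data
INNER JOIN new_data USING (id)
ORDER BY current_data.version DESC
LIMIT 1;
```

With the sample data from the question this should keep name as Bengan and return a NULL password. Some client drivers treat ? as a parameter placeholder, so the operator may need escaping there. The same idea can probably be generalised with to_jsonb(current_data) || patch (PostgreSQL 9.5 or later) followed by jsonb_populate_record, which mirrors {**a, **b} quite directly, though that is untested here.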

In passing, I would also say, like @Lennart, that it is better to separate archive data from live data, for many reasons. One is performance: to compute the "next" version you only have to read one row (the live one) and then do something like the above, instead of having to run order by version desc limit 1 each time, which will be more costly, even with an index on version.
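A possible shape for that separation is sketched below. The table, function, and trigger names (users, user_archive, archive_user, users_archive) are hypothetical and not from the question; the idea is that the live table holds one row per user and a trigger copies the previous state into the archive on every update.

```sql
-- Live table: exactly one row per user.
CREATE TABLE users (
    id       uuid PRIMARY KEY,
    name     text,
    password text,
    version  bigint NOT NULL DEFAULT 1
);

-- Archive table: same columns, no primary key, many rows per user.
CREATE TABLE user_archive (LIKE users);

-- Before each update, save the old row and bump the version.
CREATE FUNCTION archive_user() RETURNS trigger AS $$
BEGIN
    INSERT INTO user_archive SELECT OLD.*;
    NEW.version := OLD.version + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_archive
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE archive_user();

-- An ordinary UPDATE then names only the fields that changed;
-- every other column keeps its current value.
UPDATE users
SET password = 'snthsnth'
WHERE id = '743ccdf0-e9d8-4268-b6d7-0645eb70feb9';
```

With this layout the "merge" is simply what UPDATE already does, and setting a column to NULL needs no special handling.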

Related Solutions

Postgresql – How to determine if a column is defined as a serial data type instead of an integer based off the catalog

SERIAL and BIGSERIAL are kind of pseudo-types. As you noticed, they are really just INT and BIGINT internally.

What happens behind the scenes is that PostgreSQL creates a sequence, uses it for the column's default value, and records a dependency between the sequence and the table column. You can search pg_class for the sequence name and see how it relates to the table.

pg_class: http://www.postgresql.org/docs/9.2/static/catalog-pg-class.html

SQL Fiddle: http://sqlfiddle.com/#!12/dfcbd/6

Sequence Functions: http://www.postgresql.org/docs/9.2/static/functions-sequence.html

This StackOverflow post might be helpful: https://stackoverflow.com/questions/1493262/list-all-sequences-in-a-postgres-db-8-1-with-sql

UPDATE: You can also use pg_depend to figure out which sequences relate to the table/column you are interested in: http://www.postgresql.org/docs/9.2/static/catalog-pg-depend.html
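As a rough sketch of both routes, the queries below look up the sequence attached to a column. The user_revisions table and its version column are used purely as an example; substitute your own names.

```sql
-- Quick check: returns the name of the sequence owned by the column,
-- or NULL if the column has no owned sequence.
SELECT pg_get_serial_sequence('user_revisions', 'version');

-- Catalog route: list every column of the table that owns a sequence.
-- deptype 'a' is the automatic dependency created for serial columns.
SELECT a.attname AS column_name,
       s.relname AS sequence_name
FROM pg_depend d
JOIN pg_class s
  ON s.oid = d.objid AND s.relkind = 'S'
JOIN pg_attribute a
  ON a.attrelid = d.refobjid AND a.attnum = d.refobjsubid
WHERE d.classid = 'pg_class'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'user_revisions'::regclass
  AND d.deptype = 'a';
```

Keep in mind that a sequence attached later with ALTER SEQUENCE ... OWNED BY shows up the same way, so a match suggests, but does not strictly prove, that the column was declared as serial.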

Postgresql ignoring a substring in the middle of a string

One solution is to use regexps to remove that part of the filename, assuming the separators are known:

regexp_replace(filename, '^(.*?)[-_][rR][0-9]+(\.[^.]+)$', E'\\1\\2')

This assumes a separator (- or _), then r or R followed by at least one digit, then a dot and an extension that contains no further dots. The r/R-plus-digits part is removed.
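A quick way to see the effect is to run the expression on literal strings. The first filename is the one used as an example in this answer; the second is a made-up name without a revision part.

```sql
-- The revision suffix (-R00) is dropped: LLE-MET-AP-0000-PLA-COB.pdf
SELECT regexp_replace('LLE-MET-AP-0000-PLA-COB-R00.pdf',
                      '^(.*?)[-_][rR][0-9]+(\.[^.]+)$', E'\\1\\2');

-- No revision suffix, so the pattern does not match and the
-- input comes back unchanged.
SELECT regexp_replace('plain-document.pdf',
                      '^(.*?)[-_][rR][0-9]+(\.[^.]+)$', E'\\1\\2');
```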

The efficiency of this depends on the amount of data and how the searches are done. It could be used in an expression index to speed up searching. For example:

CREATE INDEX filename_name_idx ON documents (regexp_replace(filename, '^(.*?)[-_][rR][0-9]+(\.[^.]+)$', E'\\1\\2'));

Then doing a search

SELECT filename FROM documents
  WHERE regexp_replace(filename,
    '^(.*?)[-_][rR][0-9]+(\.[^.]+)$', E'\\1\\2') = 'LLE-MET-AP-0000-PLA-COB.pdf';

would use the index (if the query optimizer deems it faster).

Do note that the regexp part has to be exactly the same in the index and the search, otherwise the index cannot be used. You can also combine it with LOWER() if case insensitive searching is required.

The same regexp can also be used to remove the revision identifier from the search string, if there is a need to search for files matching, e.g., LLE-MET-AP-0000-PLA-COB-R00.pdf.

Making a view with this would make it a lot nicer to use and the actual regexp would be in the database and not in the application layer.
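Such a view might look like the sketch below. The view name documents_normalized and the column name base_filename are hypothetical, and the documents table is assumed to have a filename column as in the queries above.

```sql
-- Sketch: expose the filename with the revision part stripped.
CREATE VIEW documents_normalized AS
SELECT d.*,
       regexp_replace(d.filename,
                      '^(.*?)[-_][rR][0-9]+(\.[^.]+)$',
                      E'\\1\\2') AS base_filename
FROM documents d;

-- Find all revisions of a file, normalising the search string too.
SELECT filename
FROM documents_normalized
WHERE base_filename = regexp_replace('LLE-MET-AP-0000-PLA-COB-R00.pdf',
                                     '^(.*?)[-_][rR][0-9]+(\.[^.]+)$',
                                     E'\\1\\2');
```

If an expression index exists on exactly the same expression as the one in the view, the planner should still be able to use it for queries through the view.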
