Postgresql – How to denote separate records as “one and the same”

database-designpostgresql

Let's say there are two separate Person records

  • V. Baker
  • Vincent Baker

each with their own various associations. These could be things like photos, contact information, etc.

And let's say that it turns out these are the same person. What's the best way to figuratively "combine" them so that it's possible to access either record and also pull in the direct and associated content from the other record/s as well? I say "figuratively" because they'll need to remain separate so that each record's unique data will be preserved.

  • Create a one-to-many self-referencing relationship where one Person can have one or more People?

  • Create a separate column with a unique identifier? And when two or more records are deemed to be the same, simply give them all the same key?

  • Or some other technique I don't know about yet?

My concern, though, is that whatever solution I implement, there's a good chance it will need to be transitive. So if A is noted as being the same as B, if a new C comes along and is noted as being the same as B, this should also result in C being noted the same as the original A.

Thank you in advance for any help and insights!

Best Answer

Here is a suggestion users_equal_user

create table  user_equal_user (
ueu_user_id int primary key ,
ueu_link_user_id int);


Select * from users where user_id in
(select ueu_link_user_id from user_equal_user where ueu_user_id = pass_in_user_id);

This will provide a many to many relation your looking for and be unlimited

EDITED OP asked how to deal with A=B=C=D to get all the other cross linked ids so here is some untested code that should get you close

create or replace function get_associated_ids(puser_id int,  
                                       puser_id_matched int[] )
 RETURNS table(user_id int, user_id_linked int)
    LANGUAGE 'plpgsql'

    COST 100
    VOLATILE 
AS $BODY$
declare
    _r record;
    _found int;
    _matched int[]; --use to block already processed ids,  
                    --this is here encase  of a
                    --bad entry that creates a circular 
                    --reference that would create an endless 
                    --recursive loop

begin
        _matched = array_append(puser_id_matched, puser_id);
        for _r in Select ueu_user_id, ueu_link_user_id 
                    from user_equal_user 
                    where ueu_user_id = puser_id 
                        and ueu_user_id <> any(puser_id_matched)
                        and ueu_link_user_id <> any(puser_id_matched) ; 

            select ueu_user_id into _found  from user_equal_user 
                     where ueu_user_id = _r.ueu_user_id;
            if found then
                    _matched = array_append(_matched, _found) ;
                    return query ( select * from 
                            get_associated_ids(ueu_user_id, _matched ) );
            end if;
            user_id = _r.ueu_user_id;
            user_id_linked = _r.ueu_link_user_id;
            return next;
        end loop;
end;
$BODY$