PostgreSQL – How to Recursively Query Across Two Columns

Sorry my DBA skills are a bit limited, but here's an example schema (simplified from my real table)

create table test
(
    id          serial primary key,
    person      TEXT,
    start_point TEXT,
    end_point   TEXT
);

and here's some sample data

insert into test(person, start_point, end_point)
values ('Bob', 'a', 'b'),
       ('Bob', 'b', 'c'),
       ('Bob', 'c', 'd'),
       ('Alice', 'a', 'b'),
       ('Alice', 'b', 'c'),
       ('Fred', 'a', 'b'),
       ('Fred', 'c', 'd')
;

Notice how a particular person can go from a->b, then b->c etc.

The Question

How can I SELECT that traverses the start/end and gives me something like this as the output?

| person | place | group_id |
|--------|-------|----------|
| Bob    | a     | 1        |
| Bob    | b     | 1        |
| Bob    | c     | 1        |
| Bob    | d     | 1        |
| Alice  | a     | 2        |
| Alice  | b     | 2        |
| Alice  | c     | 2        |
| Fred   | a     | 3        |
| Fred   | b     | 3        |
| Fred   | c     | 4        |
| Fred   | d     | 4        |

Notice how Bob went thru a-b-c-d, Alice went thru a-b-c, but Fred didn't make it all the way thru so he gets split out into an a-b then a c-d, because he was missing the b-c. group_id can just be from a sequence.

What I have tried so far
What I'm currently doing, is selecting the entire dataset into a CURSOR, then doing a LOOP over it, checking if the start_point matches the end_point of the previous row, and the person matches, whilst this condition is true group them up and update a table.

When we don't find a match, assume it's for a new person and start counting again.

This is incredibly inefficient when the table grows, and I'm in a multithreaded application so I'd much rather be able to do an atomic UPDATE...SELECT rather than opening a cursor and looping through it.

Thanks

WITH RECURSIVE movement(id, person, place, end_point, level) as ( select id,person, start_point, end_point, 1 from test union select p.id, c.person, c.end_point, c.end_point , case when p.id=c.id then p.level else p.level + 1 end from test c inner join movement p on (c.start_point=p.end_point or c.start_point=p.place) and c.person=p.person ), movementSum as ( select id,person,place,level from movement m where m.id = (select min(id) from movement t where t.person=m.person and m.place = t.place) ) select * from movementSum ms order by id, place, level

Best Answer

It isn't exactly clear what the group_id represents but for the traversing of the points this can be handled with a recursive query like so:

Once you share more about group_id perhaps the exact sql can be produced. Use this sql fiddle to decide what group_id should represent: http://sqlfiddle.com/#!17/cc042/2

Best Answer

Related Solutions

Postgresql – Behavior of PostgreSQL COPY FROM

Postgresql – How to optimize an ordered date range query with several OR filters (for streaming)

Related Question