How to Set a Value if Columns Are the Same Across Multiple Rows in PostgreSQL

postgresql, update, window-functions

I am doing a data migration and have ended up with a temporary table like this:

curid  cuid  rtid  cd          dd          rm
10     4     4     2016-01-02  2016-07-02
16     4     4     2016-06-12  2016-12-12  Remarks Jun 12
18     5     3     2016-07-18  2017-07-31
8      5     3     2015-06-21  2016-06-30  Add some test
11     6     4     2017-01-01  2017-07-01
9      7     3     2017-01-01  2018-01-31

I need to split the data into two tables.
Based on that same data, it should look like this:

Table A
    id  curid cuid  rtid
    1    10    4    4   
    2    18    5    3   
    3    11    6    4   
    4     9    7    3

That's one row per distinct (cuid, rtid) plus a curid value picked from each set of duplicates. id is just a sequential number.

Table B
    id curid    cd           dd         rm
    1   10     2016-01-02   2016-07-02  
    2   10     2016-06-12   2016-12-12  Remarks Jun 12
    3   18     2016-07-18   2017-07-31  
    4   18     2015-06-21   2016-06-30  Add some test
    5   11     2017-01-01   2017-07-01  
    6    9     2017-01-01   2018-01-31

The actual curid is irrelevant as long as the records in Table B match the associated record in Table A (so we could even use a temp sequence or something to set the curid).

Best Answer

Your test setup

(Best provided this way in your question - hint!)

CREATE TEMP TABLE tmp (
  curid int
, cuid  int
, rtid  int
, cd    date
, dd    date
, rm    text);

INSERT INTO tmp VALUES
  (10, 4, 4, '2016-01-02', '2016-07-02', NULL)  
 ,(16, 4, 4, '2016-06-12', '2016-12-12', 'Remarks Jun 12')
 ,(18, 5, 3, '2016-07-18', '2017-07-31', NULL)  
 ,(8 , 5, 3, '2015-06-21', '2016-06-30', 'Add some test')
 ,(11, 6, 4, '2017-01-01', '2017-07-01', NULL)
 ,(9 , 7, 3, '2017-01-01', '2018-01-31', NULL);

Solution

Create target tables if they don't exist:

CREATE TEMP TABLE a (
   id    serial
 , curid int  -- UNIQUE?
 , cuid  int
 , rtid  int
);

CREATE TEMP TABLE b (
   id    serial
 , curid int
 , cd    date
 , dd    date
 , rm    text
);

Use DISTINCT ON for table A:

INSERT INTO a (curid, cuid, rtid)
SELECT DISTINCT ON (cuid, rtid)
       curid, cuid, rtid
FROM   tmp
ORDER  BY  cuid, rtid, curid  -- pick smallest curid per group
RETURNING *;

id | curid | cuid | rtid
-: | ----: | ---: | ---:
 1 |    10 |    4 |    4
 2 |     8 |    5 |    3
 3 |    11 |    6 |    4
 4 |     9 |    7 |    3

Detailed explanation here:

Select first row in each GROUP BY group?
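DISTINCT ON is a PostgreSQL extension. As a rough sketch of the same pick in more portable SQL, the query below uses row_number() against the tmp table from the test setup. It is an equivalent formulation, not the query this answer relies on; the names sub and rn are illustrative only.

```sql
-- Keep the row with the smallest curid per (cuid, rtid) group.
SELECT curid, cuid, rtid
FROM  (
   SELECT curid, cuid, rtid
        , row_number() OVER (PARTITION BY cuid, rtid
                             ORDER BY curid) AS rn  -- 1 = smallest curid in the group
   FROM   tmp
   ) sub
WHERE  rn = 1
ORDER  BY cuid, rtid;
```

Since only curid is needed from the picked row, a plain GROUP BY cuid, rtid with min(curid) would also do; row_number() becomes useful once more columns of the picked row are required.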

Use a simple window function for table B:

INSERT INTO b (curid, cd, dd, rm)
SELECT min(curid) OVER (PARTITION BY cuid, rtid), cd, dd, rm
FROM   tmp
ORDER  BY cuid, rtid  -- optional
RETURNING *;

id | curid | cd         | dd         | rm            
-: | ----: | :--------- | :--------- | :-------------
 1 |    10 | 2016-01-02 | 2016-07-02 | null          
 2 |    10 | 2016-06-12 | 2016-12-12 | Remarks Jun 12
 3 |     8 | 2016-07-18 | 2017-07-31 | null          
 4 |     8 | 2015-06-21 | 2016-06-30 | Add some test 
 5 |    11 | 2017-01-01 | 2017-07-01 | null          
 6 |     9 | 2017-01-01 | 2018-01-31 | null

curid is guaranteed to match, since both queries pick the smallest curid per (cuid, rtid) group.
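A quick sanity check, assuming both INSERT statements above have been run against the temporary tables a and b: the sketch below lists any row in b whose curid has no counterpart in a. An empty result means the two tables line up.

```sql
-- Orphan check: rows in b that point at a curid missing from a.
SELECT b.id, b.curid
FROM   b
WHERE  NOT EXISTS (
   SELECT 1
   FROM   a
   WHERE  a.curid = b.curid
   );
```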

dbfiddle here

Related Solutions

MySQL – How to set a row’s value from a certain row’s value

Assuming that tstamp has a UNIQUE constraint:

UPDATE activities AS a
  JOIN
  ( SELECT cur.tstamp,
           SUM(prev.amount) AS balance 
    FROM activities AS cur
      JOIN activities AS prev
        ON prev.tstamp <= cur.tstamp
    GROUP BY cur.tstamp
  ) AS p
  ON p.tstamp = a.tstamp
SET a.balance = p.balance ;

Tested: SQL-Fiddle

MySQL also allows ORDER BY in an UPDATE, which you can combine with user-defined variables:

SET @b := 0 ;
UPDATE activities
SET balance = (@b := amount + @b)
ORDER BY tstamp ;

Tested: SQL-Fiddle
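On MySQL 8.0 or later (an assumption; the answers above predate it), the running balance can also be sketched with a window function instead of a self-join or a user variable. This again assumes tstamp is unique, and it reuses the activities table and its tstamp, amount and balance columns from the statements above.

```sql
-- Running total via SUM() OVER, joined back to the table being updated.
UPDATE activities AS a
  JOIN
  ( SELECT tstamp,
           SUM(amount) OVER (ORDER BY tstamp) AS balance  -- sum of all rows up to this tstamp
    FROM activities
  ) AS p
  ON p.tstamp = a.tstamp
SET a.balance = p.balance ;
```

With ORDER BY and no explicit frame, the window covers all rows up to and including the current tstamp, which matches the prev.tstamp <= cur.tstamp condition of the self-join version.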

PostgreSQL – Calculate value differences of rows before and after bounded by multiple referenced rows

SELECT *
      ,CASE WHEN location = 'Loc_B' OR right_tree = 0 AND left_tree = 0 THEN NULL::int
            ELSE CASE WHEN @right_diff < @left_diff THEN right_diff ELSE left_diff END
       END AS min_diff
FROM  (
   SELECT *
         ,CASE WHEN right_tree > 0 THEN start_time - right_end ELSE 1000 END AS right_diff
         ,CASE WHEN left_tree  > 0 THEN end_time  - left_start ELSE 1000 END AS left_diff
   FROM  (
      SELECT *
            ,first_value(end_time)   OVER (PARTITION BY right_tree ORDER BY start_time) AS right_end
            ,first_value(start_time) OVER (PARTITION BY left_tree  ORDER BY start_time DESC) AS left_start
      FROM  (
         SELECT *
               ,count(location = 'Loc_B' OR NULL) OVER (PARTITION BY id ORDER BY start_time) AS right_tree
               ,count(location = 'Loc_B' OR NULL) OVER (PARTITION BY id ORDER BY start_time DESC) AS left_tree
         FROM   travel
         ) a
      ) b
   ) c
ORDER  BY id, start_time;

Produces your result exactly.

1000 is just a placeholder for "a value higher than any other". Since your actual problem seems to operate with times and intervals, infinity would be the perfect choice.

Related answer with a detailed explanation of how the groups (right_tree and left_tree here) are formed:
Select longest continuous sequence
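To make the grouping trick concrete, here is a small self-contained sketch. The sample rows are made up for illustration (integer start_time values stand in for real times). The expression location = 'Loc_B' OR NULL is true for matching rows and NULL otherwise, and count() skips NULLs, so the running count goes up by one at every Loc_B row. The FILTER clause shown next to it is an equivalent spelling available in PostgreSQL 9.4 or later.

```sql
SELECT id, start_time, location
     , count(location = 'Loc_B' OR NULL) OVER w         AS right_tree
     , count(*) FILTER (WHERE location = 'Loc_B') OVER w AS right_tree_filter
FROM  (
   VALUES
     (1, 1, 'Loc_A')  -- right_tree = 0
   , (1, 2, 'Loc_B')  -- right_tree = 1
   , (1, 3, 'Loc_C')  -- right_tree = 1
   , (1, 4, 'Loc_B')  -- right_tree = 2
   , (1, 5, 'Loc_D')  -- right_tree = 2
   ) t(id, start_time, location)
WINDOW w AS (PARTITION BY id ORDER BY start_time)
ORDER  BY id, start_time;
```

Rows sharing the same right_tree value form one group, starting at a Loc_B row and running until the next one. left_tree works the same way with the sort order reversed.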

SQL Fiddle.