How to Set a Value if Columns Are the Same Across Multiple Rows in PostgreSQL

postgresqlupdatewindow functions

I am doing a data migration and I end up with a temporary table as follows:

curid cuid  rtid    cd          dd      rm
10     4    4   2016-01-02  2016-07-02  
16     4    4   2016-06-12  2016-12-12  Remarks Jun 12
18     5    3   2016-07-18  2017-07-31  
8      5    3   2015-06-21  2016-06-30  Add some test
11     6    4   2017-01-01  2017-07-01  
9      7    3   2017-01-01  2018-01-31  

I need to split the data into two tables.
Based on that same data, it should look like this:

Table A
    id  curid cuid  rtid
    1    10    4    4   
    2    18    5    3   
    3    11    6    4   
    4     9    7    3

That's one row per distinct (cuid, rtid) plus a curid value picked from each set of duplicates. id is just a sequential number.

Table B
    id curid    cd           dd         rm
    1   10     2016-01-02   2016-07-02  
    2   10     2016-06-12   2016-12-12  Remarks Jun 12
    3   18     2016-07-18   2017-07-31  
    4   18     2015-06-21   2016-06-30  Add some test
    5   11     2017-01-01   2017-07-01  
    6    9     2017-01-01   2018-01-31

The actual curid is irrelevant as long as the records in Table B match the associated record in Table A (so we could even use a temp sequence or something to set the curid).

Best Answer

Your test setup

(Best provided this way in your question - hint!)

CREATE TEMP TABLE tmp (
  curid int
, cuid  int
, rtid  int
, cd    date
, dd    date
, rm    text);

INSERT INTO tmp VALUES
  (10, 4, 4, '2016-01-02', '2016-07-02', NULL)  
 ,(16, 4, 4, '2016-06-12', '2016-12-12', 'Remarks Jun 12')
 ,(18, 5, 3, '2016-07-18', '2017-07-31', NULL)  
 ,(8 , 5, 3, '2015-06-21', '2016-06-30', 'Add some test')
 ,(11, 6, 4, '2017-01-01', '2017-07-01', NULL)
 ,(9 , 7, 3, '2017-01-01', '2018-01-31', NULL);

Solution

Create target tables if they don't exist:

CREATE TEMP TABLE a (
   id    serial
 , curid int  -- UNIQUE?
 , cuid  int
 , rtid  int
);

CREATE TEMP TABLE b (
   id    serial
 , curid int
 , cd    date
 , dd    date
 , rm    text
);

Use DISTINCT ON for table A:

INSERT INTO a (curid, cuid, rtid)
SELECT DISTINCT ON (cuid, rtid)
       curid, cuid, rtid
FROM   tmp
ORDER  BY  cuid, rtid, curid  -- pick smallest curid per group
RETURNING *; 
id | curid | cuid | rtid
-: | ----: | ---: | ---:
 1 |    10 |    4 |    4
 2 |     8 |    5 |    3
 3 |    11 |    6 |    4
 4 |     9 |    7 |    3

Detailed explanation here:

Use a simple window function for table B:

INSERT INTO b (curid, cd, dd, rm)
SELECT min(curid) OVER (PARTITION BY cuid, rtid), cd, dd, rm
FROM   tmp
ORDER  BY cuid, rtid  -- optional
RETURNING *; 
id | curid | cd         | dd         | rm            
-: | ----: | :--------- | :--------- | :-------------
 1 |    10 | 2016-01-02 | 2016-07-02 | null          
 2 |    10 | 2016-06-12 | 2016-12-12 | Remarks Jun 12
 3 |     8 | 2016-07-18 | 2017-07-31 | null          
 4 |     8 | 2015-06-21 | 2016-06-30 | Add some test 
 5 |    11 | 2017-01-01 | 2017-07-01 | null          
 6 |     9 | 2017-01-01 | 2018-01-31 | null          

curid is guaranteed to match since we picked the smallest per group in both queries.

dbfiddle here