Postgresql – Select Distinct on subset of columns, name different set of columns to return

postgresql

I am using PostgreSQL and have two tables:

Table_A
colA | colB | colC | colD | colE | colF | colG | colH | colI | colJ

Table_B
colA | colB | colC | columnD | colE | columnF | colG | colH | colI | colJ

I am trying to insert a number of rows from table_B into table_A. My problem is that Table_A has a primary key based on colA, colB, colC, colD, and colE. Table_B does not have this restriction, which means a simple insert won't work:

INSERT INTO Table_A (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ FROM Table_B;

I am trying to work around this by using DISTINCT in my selection from Table_B. However, I cannot determine the correct syntax to both select distinct on the 5 primary key columns used in Table_A, and select all ten columns to be inserted. I have tried

INSERT INTO Table_A (colA, colB, colC, colD, colE) 
SELECT DISTINCT colA, colB, colC, columnD, colE FROM Table_B;

Which correctly pulls unique entries but does not populate columns F-J, and I have tried

INSERT INTO Table_A (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT(colA, colB, colC, columnD, colE) colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ FROM Table_B;

But this fails, as the first column entry is a wrapped version of the 5 unique columns, and fails the insert due to column length restrictions – the SELECT DISTINCT in parens returns '' which obviously doesn't fit:

ERROR: Value is too long for type character varying(12)

My goal would be form a query which gets all the unique combinations of colA, colB, colC, columnD, colE from Table_B, and inserts those full rows, including columnF, colG, colH, colI, colJ into Table_A.

Best Answer

You're almost there. SELECT DISTINCT returns that group of columns, as you've found out. If you use DISTINCT ON instead though, you'll get the rows you're looking for.

INSERT INTO Table_A 
    (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT ON (colA, colB, colC, columnD, colE) 
    colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ 
FROM
    Table_B

As pointed out by ypercube and in the Postgres docs, you can improve this query by adding ORDER BY. Without it, it seems that the choice between two conflicting rows is unpredictable.

INSERT INTO Table_A 
    (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT ON (colA, colB, colC, columnD, colE) 
    colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ 
FROM
    Table_B
ORDER BY
    colA, colB, colC, columnD, colE    -- needed as it is
--  , colX, colY                       -- to choose which row to pick
  ;