PostgreSQL insert into table (not origin) based on a condition on fields on different tables

postgresqlpostgresql-9.1

Lets say you have 4 different tables:

table_1 is a table holding the original data:

table_1

id1  |  id2  |  score1  |  score2  |  name_1         |  name_2
-----+-------+----------+----------+-----------------+---------
123  | 89898 | 0.5      | 0.8      |Jack             |Joe
129  | 89999 | 0.55     | 0.3      |Chris            |Don
150  | 90098 | 0.8      | 0.1      |Susan            |Anna
170  | 99898 | 0.7      | 0.14     |Ben              |Lisa

table_2 is a table where we would like to insert fields from table_1 (id1, id2, score1,score2) plus a comment string based on a condition on table_3 and table_4 (where both fields name_1 and name_2 are empty)

table_2 (the result table) should look like:

id1  |  id2  |  score1  |  score2  |  comment_string
-----+-------+----------+----------+-----------------
129  | 89999 | 0.55     | 0.3      | 'removed_because_no_value_in_fields_name_1_and_name_2'

table_3 and table_4 are tables which are holding data of two colomns from table_1 and some rows that have been modified previously. The modified fields are empty. The amount of rows is the same like in the origin table table_1.

table_3:

name_1  |
--------+
Jack
[empty]
Susan
Ben

table_4:

name_2  |
--------+
Joe
[empty]
Anna
Lisa

Note:
[empty] means that the field is empty. ids are not representing primary keys. The result table (table_2) should only contain the fields id1, id2, score1,score2 plus a comment string if the fields with the same row number in table_3 and table_4 are empty.

How to achieve this using PostgresSQL 9.1?

Best Answer

You seem to be under the impression that some kind of automatic "row number" would exist. That is not the case. Unlike rows in a spreadsheed, tables in a relational database have no natural order.

This query should do the job, but it relies on the the contents of name_1 and name_2 to make the connection. If you rely on a row number, you have to add an actual column for that.

INSERT INTO table2 (id1, id2, score1, score2, comment_string)
SELECT t1.id1, t1.id2, t1.score1, t1.score2
      ,CASE WHEN t3.name_1 IS NULL
            AND  t4.name_2 IS NULL THEN 'removed_because ...'
       ELSE END AS comment_string
FROM   table_1 t1
LEFT   JOIN table_3 t3 USING (name_1)
LEFT   JOIN table_4 t4 USING (name_2)
ORDER  BY id1; -- undeclared in Q

Based on the assumption that table_3.name_1 and table_4.name_2 are unique. Else, the query could create a "proxy cross join", possibly multiplying rows, if there are several matches.
More about this caveat in this related answer on SO.