Postgresql – Join multiple PostgreSQL tables that have both common and individual columns

postgresql

Processing speed matters not for my needs. What I want to do is meant to be done once and then stay that way. It should be an automated process though, which can be performed by sending SQL commands to the database with psql.exe.

General problem

I have several tables that I want to merge into a single table. Assume that I can't know the names of all the columns (there are A LOT of them). What I do know is that some columns are the same for every table while other columns are individual for each table. If necessary I can specify the columns that are common to all tables, but I can't specify the columns which are common only to a few tables.

You may assume that I know the names of all the tables that I want to merge. However, a solution that doesn't assume that would be superior.

If you can give me an answer based on this then I may be able to proceed to solving the specifics of my problem on my own. However, I'm adding the specifics in case you have knowledge that may help based on that.

Specifics

The tables that I want to merge into one were generated by ogr2ogr (GDAL 2.1.0) from data in the S-57 format. I've already imported the base .000 file and updated with the available .00x files. I'm currently testing with only one .000 file, but I intend to append the import with data from several files. This shouldn't really affect the solution unless I've missed some ogr2ogr option that automatically imports everything into a single table.

I asked a different but related question on the GIS stackexchange.

Problems I've faced

I can't use UNION because the tables don't have the same columns. I also tried FULL OUTER JOIN, but it doesn't work because some columns are defined in multiple tables.


Clarifications

  • I want to merge several tables into one.
  • These tables have some columns in common. I can name these if needed.
  • These tables have some columns that are not in common. There's too many of these for me to name.
  • The new table should have all the columns, but no duplicate columns.
  • I want all rows from all the original tables inserted into the new table.

A solution where all the tables to be merged are known is good. A solution for any number of tables with unknown names is better.

The specifics of my problem is that I'm working with the S-57 format and importing to the database using ogr2ogr. These specifics are not necessarily important for the solution, but I included it in the question because someone might know something that I've missed that could have solved this already in the import.

I've tried using UNION and FULL OUTER JOIN, but it did not work.

Best Answer

Community wiki answer:

SELECT * FROM a 
NATURAL JOIN b 
NATURAL JOIN c
...
NATURAL JOIN z;

...will give you a result, without any prior knowledge of column names.

I got it to work as intended by writing

SELECT * INTO new_table 
FROM a NATURAL FULL OUTER JOIN b;

I needed the FULL OUTER part for the rows to actually be inserted into the new table.

This can be done for X number of tables in the following way, as described in this answer:

CREATE TABLE new_table ();
DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT table_name FROM information_schema.tables
         WHERE table_schema = 'public' AND table_type = 'BASE TABLE' AND table_name != 'new_table' AND table_name != 'spatial_ref_sys'
    LOOP
        EXECUTE 'CREATE TEMP TABLE temp_table AS SELECT * FROM ' || quote_ident(r.table_name) || ' NATURAL FULL OUTER JOIN new_table';
        EXECUTE 'DROP TABLE new_table CASCADE';
        EXECUTE 'CREATE TABLE new_table AS SELECT * FROM temp_table';
        EXECUTE 'DROP TABLE temp_table CASCADE';
        EXECUTE 'DROP TABLE ' || quote_ident(r.table_name) || ' CASCADE';
    END LOOP;
END$$;