PostgreSQL – Specify INSERT columns with query

postgresql

I'd like to INSERT data from one table to another, where the destination table has all the source columns with a few others in between. The tables have a large number of columns (plus this can get repeated now and then), so I'd rather not explicitly write the column names.

Along the lines of (doesn't work):

INSERT INTO dest_table
(SELECT string_agg(column_name, ', ')
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'source_table')
SELECT * from source_table

The context is that I'm trying to alter a column's position.

Best Answer

Altering a column's position requires a full table rewrite. My suggestion is not to alter column positions. However, plenty of people do this, and there are numerous ways to go about it. In my experience, one way is far better and easier, provided you have the privileges, and most DBAs do.

My suggestion is to use pg_dump.

1. Dump with --column-inserts. From the pg_dump documentation:

   Dump data as INSERT commands with explicit column names (INSERT INTO table (column, ...) VALUES ...). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL databases. However, since this option generates a separate command for each row, an error in reloading a row causes only that row to be lost rather than the entire table contents.

2. Remove from the dump what you don't need.
3. Modify the table definition or column ordering.
4. In a transaction:
   1. Drop the old tables.
   2. Execute the script creating the new tables.
   3. Commit.
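
If you would rather stay inside the database, the idea from the question can be approximated with dynamic SQL. The following is only a minimal sketch, not part of the pg_dump approach above. It assumes that both tables live in the public schema under the names used in the question, that every column of source_table exists under the same name in dest_table, and that the extra columns of dest_table are nullable or have defaults:

```sql
DO
$do$
DECLARE
   _cols text;  -- comma-separated, quoted column list of source_table
BEGIN
   -- Build the column list once, in the source table's column order.
   SELECT string_agg(quote_ident(column_name::text), ', ' ORDER BY ordinal_position)
   INTO   _cols
   FROM   information_schema.columns
   WHERE  table_schema = 'public'
   AND    table_name   = 'source_table';

   -- Use the same list as INSERT target list and as SELECT list,
   -- so columns are matched by name rather than by position.
   EXECUTE format(
      'INSERT INTO public.dest_table (%1$s) SELECT %1$s FROM public.source_table'
    , _cols);
END
$do$;
```

A plain INSERT cannot take its column list from a subquery, which is why the attempt in the question fails; the statement text has to be assembled first and then run with EXECUTE.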

Related Solutions

PostgreSQL – Return unique combinations of columns based on where clause

The execution plan shown does not seem to match the big SELECT DISTINCT query, because the Sort and Unique steps are missing. Anyway, you are correct that when retrieving ~50% of a table, indexes don't help. The best strategy is a big sequential scan of the main table, and only fast hardware helps with that.

For the 2nd part of the question:

How would I go about selecting only the unique combinations of adjacent columns? Is this too complicated a task to perform through a database query? Would it speed up the query?

To remove duplicate combinations of adjacent columns, the structure of the result set should be changed so that each output row has only one pair of adjacent columns along with their corresponding dimensions in the parallel coordinates graph. The dimension for the 2nd column is not actually necessary, though, since it's always the dimension of the first column plus one.

In one single query, this could be written like this:

WITH logs as (
  SELECT log_time_mapped, syslog_priority_mapped, 
     operation_mapped, message_code_mapped, protocol_mapped, 
     source_ip_mapped, destination_ip_mapped, 
     source_port_mapped, destination_port_mapped, 
     destination_service_mapped, direction_mapped, 
     connections_built_mapped, connections_torn_down_mapped, 
     hourofday_mapped, meridiem_mapped 
  FROM firewall_logs_mapped 
  WHERE operation = 'Built')
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped FROM logs
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped FROM logs
UNION ALL
...etc...
SELECT DISTINCT 14,  hourofday_mapped, meridiem_mapped FROM logs
;

The first SELECT DISTINCT subquery extracts the lines to draw between dimensions 1 and 2, the next subquery between dimensions 2 and 3, and so on. DISTINCT eliminates duplicates, so the client side doesn't have to do it. The UNION ALL concatenates the results without any further processing.

However, it's a heavy query, and it's doubtful that it would be any faster than what you're already doing.

The WITH subquery is likely to get materialized on disk, which is slow, so it might be interesting to compare the execution time with this other form, which repeats the same condition in every branch:

SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
...etc...
;
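
One way to make that comparison is to prefix each form with EXPLAIN (ANALYZE, BUFFERS) and look at the reported execution time and plan nodes. The following is a sketch showing only the first two branches of the repeated-condition form; the remaining branches would be appended the same way. Note that EXPLAIN ANALYZE actually executes the query:

```sql
-- Runs the query and reports the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
;
```

Running the same command on the WITH form should show whether a CTE Scan node appears and how much time is spent in it.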

PostgreSQL use NEW in query for INSTEAD OF trigger

NEW is a record, not a table. Basics:

Use NEW in FROM clause in Postgres trigger?

Slightly modified setup

CREATE TABLE product (
  product_id serial PRIMARY KEY,
  product_name text UNIQUE NOT NULL  -- must be UNIQUE
);

CREATE TABLE purchase (
  purchase_id serial PRIMARY KEY,
  product_id  int REFERENCES product,
  when_bought date
);

CREATE VIEW purchaseview AS
SELECT pu.purchase_id, pr.product_name, pu.when_bought
FROM   purchase     pu
LEFT   JOIN product pr USING (product_id);

INSERT INTO product(product_name) VALUES ('foo');

product_name has to be UNIQUE, or the lookup on this column could find multiple rows, which would lead to all kinds of confusion.

1. Simple solution

For your simple example, only looking up the single column product_id, a lowly correlated subquery is simplest and fastest:

CREATE OR REPLACE FUNCTION insert_purchaseview_func()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO purchase(product_id, when_bought)
   SELECT (SELECT product_id FROM product WHERE product_name = NEW.product_name), NEW.when_bought
   RETURNING purchase_id
   INTO   NEW.purchase_id;  -- generated serial ID for RETURNING - if needed

   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

CREATE TRIGGER insert_productview_trig
INSTEAD OF INSERT ON purchaseview
FOR EACH ROW EXECUTE PROCEDURE insert_purchaseview_func();

No additional variables. No CTE (would only add cost and noise). Columns from NEW are spelled out once only (your point 1).

The appended RETURNING purchase_id INTO NEW.purchase_id takes care of your point 2: Now, the returned row includes the newly generated purchase_id.

If the product is not found (NEW.product_name does not exist in table product), the purchase is still inserted and product_id is NULL. This may or may not be desirable.
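
To try this out, a quick test against the setup above could look like the following sketch. The dates and the product name 'bar' are made-up values for illustration; 'bar' is assumed not to exist in product:

```sql
-- 'foo' exists in product (inserted in the setup above).
-- RETURNING shows the row as returned by the trigger function,
-- including the generated purchase_id.
INSERT INTO purchaseview (product_name, when_bought)
VALUES ('foo', '2020-01-01')
RETURNING *;

-- 'bar' is a hypothetical name with no matching row in product:
-- the purchase is still inserted, with product_id NULL.
INSERT INTO purchaseview (product_name, when_bought)
VALUES ('bar', '2020-01-02')
RETURNING *;

-- Inspect what actually landed in the underlying table.
SELECT * FROM purchase;
```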

2. Skip the row if the product is not found

To skip the row instead (and possibly raise a WARNING / EXCEPTION):

CREATE OR REPLACE FUNCTION insert_purchaseview_func()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO purchase AS pu
            (product_id,     when_bought)
   SELECT pr.product_id, NEW.when_bought
   FROM   product pr
   WHERE  pr.product_name = NEW.product_name
   RETURNING pu.purchase_id
   INTO   NEW.purchase_id;  -- generated serial ID for RETURNING - if needed

   IF NOT FOUND THEN  -- insert was canceled for missing product
      RAISE WARNING 'product_name % not found! Skipping INSERT.', quote_literal(NEW.product_name);
   END IF;

   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

This piggybacks NEW columns to SELECT .. FROM product. If the product is found, everything proceeds normally. If not, no row is returned from the SELECT and no INSERT happens. The special PL/pgSQL variable FOUND is only true if the last SQL query processed at least one row.

This could be EXCEPTION instead of WARNING, to raise an error and roll back the transaction. But I'd rather declare purchase.product_id NOT NULL and insert unconditionally (query 1 or similar), to the same effect: an exception is raised if product_id is NULL. Simpler, cheaper.
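
The NOT NULL alternative is a one-line change to the setup. This sketch assumes the purchase table contains no rows with a NULL product_id yet; otherwise the statement fails:

```sql
-- With this constraint in place, the unconditional INSERT from query 1
-- raises a not-null violation when the product lookup finds nothing.
ALTER TABLE purchase ALTER COLUMN product_id SET NOT NULL;
```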

3. For multiple lookups

CREATE OR REPLACE FUNCTION insert_purchaseview_func()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO purchase AS pu
            (product_id,   when_bought)     -- more columns?
   SELECT pr.product_id, i.when_bought      -- more columns?
   FROM  (SELECT NEW.*) i                   -- see below
   LEFT   JOIN product  pr USING (product_name)
-- LEFT   JOIN tbl2     t2 USING (t2_name)  -- more lookups?
   RETURNING pu.purchase_id                 -- more columns?
   INTO   NEW.purchase_id;                  -- more columns?

   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

The LEFT JOINs make the INSERT unconditional again. Use JOIN instead to skip if one is not found.

FROM (SELECT NEW.*) i transforms the record NEW into a derived table with a single row, which can be used like any table in the FROM clause - what you were looking for, initially.
