PostgreSQL – How to Choose Between VALUES and SELECT for INSERT

Tags: performance, postgresql, postgresql-performance, select, trigger

This answer raised the question for me of how to choose between VALUES and SELECT in such a function. Using PostgreSQL 9.4.3 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit:

CREATE OR REPLACE FUNCTION insaft_function()
   RETURNS TRIGGER AS
$func$
BEGIN     
   INSERT INTO file_headers (measurement_id, file_header_index_start
                                           , file_header_index_end)
   VALUES (NEW.measurement_id, TG_ARGV[0]::int, TG_ARGV[1]::int);

   RETURN NULL;  -- result ignored since this is an AFTER trigger
END
$func$ LANGUAGE plpgsql;

VALUES works with multiple rows, but SELECT can do much more.
The only requirement here is to perform the above INSERT into the table.
You can assume that 100k such INSERTs are done per cycle in the continuous quality assurance of a system.
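A minimal sketch of the forms being compared, assuming a simplified file_headers table with integer columns (the real column types are not shown in the question). The first two statements insert equivalent rows, one with VALUES and one with a SELECT without FROM; the last one shows what only SELECT can do.

```sql
-- Hypothetical, simplified stand-in for the question's table (types assumed).
CREATE TEMP TABLE file_headers (
   measurement_id          integer
 , file_header_index_start integer
 , file_header_index_end   integer
);

-- 1. INSERT ... VALUES: a literal row constructor.
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
VALUES (1, 1, 666);

-- 2. INSERT ... SELECT: an equivalent row, produced by a SELECT without FROM.
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
SELECT 2, 1, 666;

-- 3. VALUES also accepts several rows in one statement.
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
VALUES (3, 1, 666)
     , (4, 1, 666);

-- 4. Only SELECT can derive rows from existing data (FROM, WHERE, joins, ...).
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
SELECT measurement_id + 10, file_header_index_start, file_header_index_end
FROM   file_headers
WHERE  measurement_id <= 2;
```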

I measured these timings with my data; here are three median runs for each method:

VALUES
real      user      sys
-------------------------------
0m0.353s  0m0.256s  0m0.028s
0m0.327s  0m0.252s  0m0.036s
0m0.358s  0m0.252s  0m0.040s
so average real 0.346s

SELECT
real      user      sys
-------------------------------
0m0.362s  0m0.256s  0m0.024s
0m0.383s  0m0.236s  0m0.056s
0m0.356s  0m0.264s  0m0.032s
so average real 0.367s

So this small sample suggests that VALUES is faster for such a simple INSERT. I am interested in the requirements for concurrent processes and real-time data analysis.

How can you decide between SELECT and VALUES for INSERT?

Best Answer

The measured difference is almost certainly noise. Run more iterations and you won't get a consistent result. The difference in performance, if any exists, is too small to measure.
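One way to take client startup and connection overhead out of the picture is to time many iterations inside the server. A rough sketch, assuming a throwaway temp table with the hypothetical name bench_target and 100k iterations to match the question. Repeat it several times and swap the order of the two loops before reading anything into the numbers, since run order, caching and background activity all affect the timings.

```sql
-- Hypothetical throwaway table for the benchmark.
CREATE TEMP TABLE bench_target (a integer, b integer, c integer);

DO $$
DECLARE
   n      CONSTANT integer := 100000;  -- assumed iteration count (100k per cycle)
   t0     timestamptz;
   t_vals interval;
   t_sel  interval;
BEGIN
   -- clock_timestamp() advances during a transaction, unlike now().
   t0 := clock_timestamp();
   FOR i IN 1 .. n LOOP
      INSERT INTO bench_target (a, b, c) VALUES (i, 1, 666);
   END LOOP;
   t_vals := clock_timestamp() - t0;

   t0 := clock_timestamp();
   FOR i IN 1 .. n LOOP
      INSERT INTO bench_target (a, b, c) SELECT i, 1, 666;
   END LOOP;
   t_sel := clock_timestamp() - t0;

   RAISE NOTICE 'VALUES: %, SELECT: %', t_vals, t_sel;
END
$$;
```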

You can use either method here; both are equally good for the purpose. There are often multiple ways to do the same thing in SQL, and sometimes there is no clear winner.

The more important issue concerning performance here:

100k of such INSERTs done per cycle

For big bulk inserts it would be faster to INSERT into both tables directly instead of firing a trigger for every row.

If you are using a serial PK that is generated automatically, you can employ the RETURNING clause in a data-modifying CTE:

WITH ins1 AS (
   INSERT INTO measurement (measurement)
   VALUES ...   -- OR SELECT ... if data comes from inside the DB :)
   RETURNING measurement_id  -- generating a serial ID?
   )
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
SELECT measurement_id,  1, 666  -- here it *must* be SELECT
FROM   ins1;

A numerical constant like 666 (no quotes, just digits) defaults to type integer automatically.
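A runnable version of the CTE above, as a sketch for a fresh session: the tables are hypothetical temp tables, and the measurement column and its sample values are assumptions, since the question does not show the measurements table. The final query confirms the type of the bare constant.

```sql
-- Hypothetical tables; only measurement_id and the file_header_* columns come from the question.
CREATE TEMP TABLE measurements (
   measurement_id serial PRIMARY KEY   -- generated automatically
 , measurement    text                 -- assumed payload column
);
CREATE TEMP TABLE file_headers (
   measurement_id          integer
 , file_header_index_start integer
 , file_header_index_end   integer
);

-- One statement inserts the parent rows and the matching child rows; no trigger needed.
WITH ins1 AS (
   INSERT INTO measurements (measurement)
   VALUES ('run a'), ('run b'), ('run c')   -- assumed sample data
   RETURNING measurement_id                 -- the generated serial IDs
   )
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
SELECT measurement_id, 1, 666               -- SELECT, because the rows come from ins1
FROM   ins1;

-- A numeric literal without a decimal point that fits into int4 is typed integer.
SELECT pg_typeof(666);   -- integer
```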

This may not be applicable, depending on your workflow.

Related: Bulk insert in multiple tables

Related Solutions

PostgreSQL – What’s the difference between INSERT … SELECT and SELECT INTO

SELECT INTO creates a new table and fills it with the result of a query. It always creates the table and raises an error if a table of that name already exists. The new table gets the column names and data types of the result, but none of the constraints or defaults of the source, so it is typically used to capture a set of data when the data matters more than the constraints. INSERT INTO ... SELECT, by contrast, is used when you already have a table with specific defined constraints and need to add data from a different table or query.
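A small sketch of the contrast, for a fresh session, using hypothetical temp tables src, copy1, copy2 and target. It also shows CREATE TABLE AS, which the PostgreSQL manual recommends over this form of SELECT INTO.

```sql
-- Hypothetical source table with a primary key, a NOT NULL constraint and a default.
CREATE TEMP TABLE src (
   id  integer PRIMARY KEY
 , val text NOT NULL DEFAULT 'n/a'
);
INSERT INTO src VALUES (1, 'a'), (2, 'b');

-- SELECT INTO (plain SQL, not PL/pgSQL): creates copy1 from the query result.
-- Column names and types are copied; the primary key, NOT NULL and default are not.
SELECT id, val INTO TEMP copy1 FROM src;

-- Running the same statement again would fail, because copy1 now exists:
-- ERROR:  relation "copy1" already exists

-- CREATE TABLE AS does the same job and is the recommended spelling.
CREATE TEMP TABLE copy2 AS SELECT id, val FROM src;

-- INSERT ... SELECT appends to an existing table and enforces its constraints.
CREATE TEMP TABLE target (
   id  integer PRIMARY KEY
 , val text NOT NULL
);
INSERT INTO target (id, val) SELECT id, val FROM src;
```

Inside a PL/pgSQL function, SELECT ... INTO means something different: it assigns the query result to variables, as the audited function further down does with _measurement_id.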

PostgreSQL – SELECT in trigger function in two tables

You have an unresolved naming conflict.

You must be using an old version of Postgres without saying so, or you are operating with a non-default configuration setting.

Here you declare a variable named measurement_id:

DECLARE
    measurement_id              INTEGER;

It's folly to use ambiguous variable names to begin with. If you do it anyway, you must know what you are doing. I make it a habit to prefix variable names with an underscore so they cannot clash with column names, like _measurement_id.

The subsequent SELECT statement is ambiguous:

ORDER BY measurement_id

This would raise an error message in modern PostgreSQL with default configuration. Per the documentation:

By default, PL/pgSQL will report an error if a name in a SQL statement could refer to either a variable or a table column.

And:

To change this behavior on a system-wide basis, set the configuration parameter plpgsql.variable_conflict to one of error, use_variable, or use_column (where error is the factory default). This parameter affects subsequent compilations of statements in PL/pgSQL functions, but not statements already compiled in the current session. Because changing this setting can cause unexpected changes in the behavior of PL/pgSQL functions, it can only be changed by a superuser.
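Besides the system-wide setting, PL/pgSQL accepts a per-function directive (#variable_conflict) at the start of the function body. A sketch for a fresh session, with a hypothetical minimal measurements table and hypothetical function names, comparing that directive with the underscore convention recommended above:

```sql
-- Hypothetical minimal table for the demonstration.
CREATE TEMP TABLE measurements (measurement_id integer);
INSERT INTO measurements VALUES (1), (2), (3);

-- Option 1: keep the ambiguous name, but tell PL/pgSQL how to resolve it
-- for this one function only.
CREATE OR REPLACE FUNCTION latest_measurement_id_directive()  -- hypothetical name
   RETURNS integer AS
$func$
#variable_conflict use_column
DECLARE
   measurement_id integer;            -- same name as the column
BEGIN
   SELECT m.measurement_id INTO measurement_id
   FROM   measurements m
   ORDER  BY measurement_id DESC      -- resolved as the column
   LIMIT  1;
   RETURN measurement_id;
END
$func$ LANGUAGE plpgsql;

-- Option 2: avoid the conflict altogether with a prefixed variable name.
CREATE OR REPLACE FUNCTION latest_measurement_id_prefixed()   -- hypothetical name
   RETURNS integer AS
$func$
DECLARE
   _measurement_id integer;           -- cannot clash with the column
BEGIN
   SELECT m.measurement_id INTO _measurement_id
   FROM   measurements m
   ORDER  BY m.measurement_id DESC
   LIMIT  1;
   RETURN _measurement_id;
END
$func$ LANGUAGE plpgsql;
```

Option 2 needs no special configuration and keeps working regardless of the plpgsql.variable_conflict setting.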

In Postgres older than 9.0 this would be resolved to mean the variable. Per the documentation:

In such cases you can specify that PL/pgSQL should resolve ambiguous references as the variable (which is compatible with PL/pgSQL's behavior before PostgreSQL 9.0)

Bold emphasis mine.
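This would produce arbitrary results: the variable holds a single value (NULL, unless it was assigned earlier), so ORDER BY sorts by a constant, the sort order is undetermined, and LIMIT 1 returns an arbitrary row.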

Audited Function

CREATE OR REPLACE FUNCTION insaft_function()
   RETURNS TRIGGER AS
$func$
DECLARE
   _measurement_id          integer;
   _file_header_index_start integer := TG_ARGV[0]::int;
   _file_header_index_end   integer := TG_ARGV[1]::int; 
BEGIN     

   SELECT a.measurement_id   INTO _measurement_id
   FROM   measurements a
   ORDER  BY a.measurement_id DESC  -- you had ambiguity here!
   LIMIT  1;

   IF TG_OP = 'INSERT' THEN  -- noise if only used in AFTER INSERT trigger
      INSERT INTO file_headers (measurement_id, file_header_index_start
                                              , file_header_index_end)
      VALUES (_measurement_id, _file_header_index_start, _file_header_index_end); 
   END IF;

   RETURN NULL; -- result is ignored since this is an AFTER trigger
END
$func$ LANGUAGE plpgsql;

Note how I named it insaft_function(), since this is only to be used in an AFTER INSERT trigger.

Trigger:

CREATE TRIGGER insaft_measurement_ids
AFTER INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE insaft_function(1, 666);

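But for the provided setup, you can radically simplify the function. NEW.measurement_id already holds the ID of the row that fired the trigger, so there is no need to look up the latest ID with ORDER BY ... LIMIT 1 (which returns the same ID for every row of a multi-row INSERT, because AFTER ROW triggers fire at the end of the statement, and can pick up rows committed by concurrent sessions):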

CREATE OR REPLACE FUNCTION insaft_function()
   RETURNS TRIGGER AS
$func$
BEGIN     
   INSERT INTO file_headers (measurement_id, file_header_index_start
                                           , file_header_index_end)
   VALUES (NEW.measurement_id, TG_ARGV[0]::int, TG_ARGV[1]::int);

   RETURN NULL;  -- result ignored since this is an AFTER trigger
END
$func$ LANGUAGE plpgsql;
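To check the simplified function end to end, here is a sketch for a fresh session with hypothetical minimal tables (the measurement column is an assumption). It re-creates the function and the trigger from above, then inserts three rows in a single statement: the trigger fires once per row, and each file_headers row gets its own measurement_id.

```sql
-- Hypothetical minimal tables; the real schema is not shown.
CREATE TEMP TABLE measurements (
   measurement_id serial PRIMARY KEY
 , measurement    text               -- assumed payload column
);
CREATE TEMP TABLE file_headers (
   measurement_id          integer
 , file_header_index_start integer
 , file_header_index_end   integer
);

-- The simplified trigger function from above.
CREATE OR REPLACE FUNCTION insaft_function()
   RETURNS TRIGGER AS
$func$
BEGIN
   INSERT INTO file_headers (measurement_id, file_header_index_start
                                           , file_header_index_end)
   VALUES (NEW.measurement_id, TG_ARGV[0]::int, TG_ARGV[1]::int);  -- args arrive as text
   RETURN NULL;  -- result ignored since this is an AFTER trigger
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER insaft_measurement_ids
AFTER INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE insaft_function(1, 666);

-- One multi-row INSERT fires the row-level trigger three times.
INSERT INTO measurements (measurement) VALUES ('a'), ('b'), ('c');
```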
