PostgreSQL How to DEFAULT Partitioned Identity Column

identitypartitioningpostgresqlpostgresql-11

PostgreSQL 11
What is the best way to generate default values for identity columns on partition tables.
E.g

CREATE TABLE data.log
(
  id              BIGINT GENERATED ALWAYS AS IDENTITY
                  (
                    INCREMENT BY 1
                    MINVALUE -9223372036854775808
                    MAXVALUE 9223372036854775807
                    START WITH -9223372036854775808
                    RESTART WITH -9223372036854775808
                    CYCLE
                  ),
  epoch_millis    BIGINT NOT NULL,
  message         TEXT NOT NULL

) PARTITION BY RANGE (epoch_millis);

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

If I do:

INSERT INTO data.foo_log (epoch_millis, message)
VALUES (1000000, 'hello');

I get:

ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 1000000, hello).
SQL state: 23502

because the default generated value is not applied to the partition UNLESS I insert it into the root table like this:

INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');

There are times though that I want to insert directly into a specific partition for performance reasons (like doing bulk COPY).
The only way I can get this to work is to create the partition while knowing about the sequence that was implicitly created for the identity column like this:

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  id DEFAULT nextval('data.log_id_seq'),
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

Is there a better way to do this and if so how?

Best Answer

I don't know of a better solution in general. A few minor things, though:

pg_get_serial_sequence()

If you don't know the name of the parent's implicit sequence, use pg_get_serial_sequence().

SELECT pg_get_serial_sequence('data.log', 'id');

You might even use the expression in the CREATE TABLE script directly, but that would impose a very minor additional cost to compute the actual name for the default (once per transaction, I think), and since this is about performance optimization ...

COPY overrides GENERATED ALWAYS, but not trigger

Defining your id column as GENERATED ALWAYS AS IDENTITY has the effect that you are never allowed to provide user values for the column id in INSERT statements, even when using the override clause like:

INSERT INTO data.log (epoch_millis, message) OVERRIDING USER VALUE
VALUES (1000000, 'hello');

It would have to be GENERATED BY DEFAULT for this to work, or omit id from the INSERT completely. The manual:

OVERRIDING USER VALUE

If this clause is specified, then any values supplied for identity columns defined as GENERATED BY DEFAULT are ignored and the default sequence-generated values are applied.

This clause is useful for example when copying values between tables. Writing INSERT INTO tbl2 OVERRIDING USER VALUE SELECT * FROM tbl1 will copy from tbl1 all columns that are not identity columns in tbl2 while values for the identity columns in tbl2 will be generated by the sequences associated with tbl2.

COPY still overrides in any case. The manual:

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.

But while writing to a partition directly, with your solution, INSERT also overrides, so it will be your responsibility to avoid providing user values for the id column directly. An alternative would be to use a trigger instead of the default value in the partition:

CREATE OR REPLACE FUNCTION trg_log_default_id()
  RETURNS trigger AS
$func$
BEGIN
   NEW.id := nextval('data.log_id_seq')
   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

CREATE TRIGGER insbef_default_id
  BEFORE INSERT ON data.foo_log  -- the partition
  FOR EACH ROW
  EXECUTE PROCEDURE trg_log_default_id();

This assigns a number from the sequence in any case, more closely emulating the GENERATED ALWAYS behavior of the parent - stricter, even, also preventing COPY from violating your rule. The manual:

COPY FROM will invoke any triggers and check constraints on the destination table.

But the trigger is a bit more expensive than a plain default value. And it would burn an extra serial number per row for regular inserts via the parent table. (It should be possible to distinguish cases in the trigger, didn't try now.)