PostgreSQL How to DEFAULT Partitioned Identity Column

identity, partitioning, postgresql, postgresql-11

PostgreSQL 11
What is the best way to generate default values for identity columns on partitioned tables?
E.g.:

CREATE TABLE data.log
(
  id              BIGINT GENERATED ALWAYS AS IDENTITY
                  (
                    INCREMENT BY 1
                    MINVALUE -9223372036854775808
                    MAXVALUE 9223372036854775807
                    START WITH -9223372036854775808
                    RESTART WITH -9223372036854775808
                    CYCLE
                  ),
  epoch_millis    BIGINT NOT NULL,
  message         TEXT NOT NULL

) PARTITION BY RANGE (epoch_millis);

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

If I do:

INSERT INTO data.foo_log (epoch_millis, message)
VALUES (1000000, 'hello');

I get:

ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 1000000, hello).
SQL state: 23502

because the default generated value is not applied to the partition UNLESS I insert it into the root table like this:

INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');

There are times though that I want to insert directly into a specific partition for performance reasons (like doing bulk COPY).
The only way I can get this to work is to create the partition while knowing about the sequence that was implicitly created for the identity column like this:

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  id DEFAULT nextval('data.log_id_seq'),
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

Is there a better way to do this and if so how?

Best Answer

I don't know of a better solution in general. A few minor things, though:

`pg_get_serial_sequence()`

If you don't know the name of the parent's implicit sequence, use pg_get_serial_sequence().

SELECT pg_get_serial_sequence('data.log', 'id');

You might even use the expression in the CREATE TABLE script directly, but that would impose a very minor additional cost to compute the actual name for the default (once per transaction, I think), and since this is about performance optimization ...
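As a sketch of that idea (not tested against the original setup), the partition definition from the question could resolve the sequence name at run time instead of hard-coding it. The explicit cast to regclass is optional and only makes the text-to-regclass conversion visible:

```sql
-- Sketch: look up the parent's identity sequence instead of naming it.
-- pg_get_serial_sequence() returns text, so the sequence is resolved
-- at run time rather than once when the default is defined.
CREATE TABLE data.foo_log
PARTITION OF data.log
(
  id DEFAULT nextval(pg_get_serial_sequence('data.log', 'id')::regclass),
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);
```

The hard-coded form nextval('data.log_id_seq') avoids the lookup and is the cheaper choice when performance is the goal.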

`COPY` overrides `GENERATED ALWAYS`, but not trigger

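Defining your id column as GENERATED ALWAYS AS IDENTITY has the effect that you are not allowed to provide user values for the column id in INSERT statements unless you add the clause OVERRIDING SYSTEM VALUE. The clause OVERRIDING USER VALUE, as in the following statement, does not lift that restriction: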

INSERT INTO data.log (epoch_millis, message) OVERRIDING USER VALUE
VALUES (1000000, 'hello');

It would have to be GENERATED BY DEFAULT for this to work, or omit id from the INSERT completely. The manual:

OVERRIDING USER VALUE

If this clause is specified, then any values supplied for identity columns defined as GENERATED BY DEFAULT are ignored and the default sequence-generated values are applied.

This clause is useful for example when copying values between tables. Writing INSERT INTO tbl2 OVERRIDING USER VALUE SELECT * FROM tbl1 will copy from tbl1 all columns that are not identity columns in tbl2 while values for the identity columns in tbl2 will be generated by the sequences associated with tbl2.
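As a sketch of how this plays out for the table from the question (this is not part of the manual excerpt, and the literal id 42 is an arbitrary illustration): omitting id lets the identity sequence fill it in, while an explicit id for a GENERATED ALWAYS column is accepted only with OVERRIDING SYSTEM VALUE.

```sql
-- id omitted: the identity sequence of data.log supplies the value
INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');

-- explicit id on a GENERATED ALWAYS column: needs OVERRIDING SYSTEM VALUE
-- (42 is an arbitrary example value)
INSERT INTO data.log (id, epoch_millis, message)
OVERRIDING SYSTEM VALUE
VALUES (42, 1000001, 'hello again');
```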

COPY still overrides in any case. The manual:

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.

But while writing to a partition directly, with your solution, INSERT also overrides, so it will be your responsibility to avoid providing user values for the id column directly. An alternative would be to use a trigger instead of the default value in the partition:

CREATE OR REPLACE FUNCTION trg_log_default_id()
  RETURNS trigger AS
$func$
BEGIN
   NEW.id := nextval('data.log_id_seq');
   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

CREATE TRIGGER insbef_default_id
  BEFORE INSERT ON data.foo_log  -- the partition
  FOR EACH ROW
  EXECUTE PROCEDURE trg_log_default_id();

This assigns a number from the sequence in any case, more closely emulating the GENERATED ALWAYS behavior of the parent - stricter, even, also preventing COPY from violating your rule. The manual:

COPY FROM will invoke any triggers and check constraints on the destination table.

But the trigger is a bit more expensive than a plain default value. And it would burn an extra serial number per row for regular inserts via the parent table, because the parent has already assigned an id before the row is routed to the partition. (It should be possible to distinguish the two cases in the trigger; I have not tested that.)
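One untested way to distinguish the cases is to draw from the sequence only when the incoming row has no id yet. This is a sketch under that assumption; the function and trigger names are hypothetical, and it would replace the trigger above rather than run alongside it:

```sql
-- Sketch: assign an id only when none has been set yet.
-- Rows routed through data.log arrive with an id from the identity
-- column and are left alone; direct inserts that omit id get one here.
CREATE OR REPLACE FUNCTION trg_log_default_id_if_null()  -- hypothetical name
  RETURNS trigger AS
$func$
BEGIN
   IF NEW.id IS NULL THEN
      NEW.id := nextval('data.log_id_seq');
   END IF;
   RETURN NEW;
END
$func$  LANGUAGE plpgsql;

CREATE TRIGGER insbef_default_id_if_null                 -- hypothetical name
  BEFORE INSERT ON data.foo_log  -- the partition
  FOR EACH ROW
  EXECUTE PROCEDURE trg_log_default_id_if_null();
```

The trade-off is that this variant no longer overwrites user-supplied values: an explicit id given in a direct INSERT or COPY into the partition is kept, so it is less strict than the unconditional trigger.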

Related Solutions

Sql-server – MDW performance_counter_instances table running out of identity values

When you exhaust the upper bound of INT you will receive, for every new insert:

Msg 8115, Level 16, State 1, Line 1
Arithmetic overflow error converting IDENTITY to data type int.
Arithmetic overflow occurred.

Outside of the MDW use case

Converting to BIGINT is far safer (IMHO) than what some people do: go back to 0 and fill in the gaps, or go to -2 billion and just delay dealing with it until you use the same number of values one more time. If 2 billion rows isn't enough, neither is 4 billion. If you use compression, changing the data type for values < 2 billion will actually save you some space.

However you will come across issues if this is part of a primary key constraint, referenced by foreign keys, has other constraints, etc. You'll need to perform some extra work in addition to just changing the data type.

For MDW specifically

The problem with the solution above, for MDW specifically, is that the data collector package (SSIS Packages/Data Collector/PerfCountersUpload) includes some logic that relies on the underlying data type to be INT. So, the collector jobs fail because of this mismatch.

This should buy you some time:

DBCC CHECKIDENT('snapshots.performance_counter_instances', RESEED, -2147483648);

And between now and when you start approaching 0 again (you should set up some kind of monitoring to check the max value and send some kind of alert when you get close), go in and clean up and make sure there are no positive values.

Then, when you start approaching 2 billion again, clean out all the negative values, and reseed.
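The monitoring mentioned above could start from a check like the following T-SQL sketch, run in the MDW database. The 100,000,000 margin and the status labels are hypothetical; an alerting job would act on the status column:

```sql
-- Sketch: report how close the identity value is to 0 or to the INT maximum.
-- The 100,000,000 margin is an arbitrary example threshold.
DECLARE @current BIGINT =
    CAST(IDENT_CURRENT('snapshots.performance_counter_instances') AS BIGINT);

SELECT @current AS current_identity,
       CASE
           WHEN @current BETWEEN -100000000 AND 0 THEN 'approaching 0'
           WHEN @current > 2147483647 - 100000000 THEN 'approaching INT max'
           ELSE 'ok'
       END AS status;
```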

Is this annoying? Absolutely. I think this might be one of several reasons why you don't see a very large MDW adoption.

Sql-server – Sudden PRIMARY KEY violation on IDENTITY column

Since the question states SQL Server 2012 RTM (build 2100) is in use, it is likely this bug:

FIX: Sequence object generates duplicate sequence values when SQL Server 2012 or SQL Server 2014 is under memory pressure

which says:

Assume that you create a sequence object that has the CACHE option enabled in Microsoft SQL Server 2012 or SQL Server 2014. When the instance is under memory pressure, and multiple concurrent connections request sequence values from the same sequence object, duplicate sequence values may be generated. In addition, a unique or primary key (PK) violation error occurs when the duplicate sequence value is inserted into a table.

Note that IDENTITY uses the sequence object mechanism in SQL Server 2012 and later.

The issue was first fixed in:

Cumulative Update 6 for SQL Server 2014
Cumulative Update 4 for SQL Server 2012 SP2
