PostgreSQL 10 – Understanding Identity Columns

Tags: identity, postgresql, postgresql-10

I was reviewing the commit-fest scheduled for 7/01 for PostgreSQL and I saw that Pg is likely going to get "identity columns" sometime soon.

I found some mention of them in information_schema.columns, but not much:

is_identity         yes_or_no         Applies to a feature not available in PostgreSQL
identity_generation character_data    Applies to a feature not available in PostgreSQL
identity_start      character_data    Applies to a feature not available in PostgreSQL
identity_increment  character_data    Applies to a feature not available in PostgreSQL
identity_maximum    character_data    Applies to a feature not available in PostgreSQL
identity_minimum    character_data    Applies to a feature not available in PostgreSQL
identity_cycle      yes_or_no         Applies to a feature not available in PostgreSQL

The Wikipedia page doesn't say much either:

An identity column differs from a primary key in that its values are managed by the server and usually cannot be modified. In many cases an identity column is used as a primary key; however, this is not always the case.

But, I don't see anything else on them. How do identity columns work? Do they provide any new functionality or is this just a standard method to create sequences? Any breakdown of the new feature and how it works?

Best Answer

This implements a feature defined in the SQL standard (quoted from a draft dated 2011-12-21):

4.15.11 Identity columns

The columns of a base table BT can optionally include not more than one identity column. The declared type of an identity column is either an exact numeric type with scale 0 (zero), INTEGER for example, or a distinct type whose source type is an exact numeric type with scale 0 (zero). An identity column has a start value, an increment, a maximum value, a minimum value, and a cycle option. ...
... The definition of an identity column may specify GENERATED ALWAYS or GENERATED BY DEFAULT.

An identity column is a column property: its values are supplied by the DBMS rather than by the user, generated in a defined way and subject to defined restrictions (increasing or decreasing, bounded by minimum and maximum values, optionally cycling when a bound is reached).

Sequence generators (usually called just "sequences") are a related SQL standard feature: it's a mechanism that provides such values - and can be used for identity columns.

Note the subtle difference: a SEQUENCE is a standalone object. It can supply values for one or more identity columns, or be called directly wherever a value is needed.
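To make that distinction concrete, here is a minimal PostgreSQL sketch. The names shared_id_seq, invoices and credit_notes are made up for illustration. One standalone sequence feeds two tables through column defaults and is also called directly.

```sql
-- Hypothetical names throughout; this only illustrates a sequence as a standalone object.
CREATE SEQUENCE shared_id_seq START WITH 1 INCREMENT BY 1;

-- Two unrelated tables draw their keys from the same sequence.
CREATE TABLE invoices (
    id   bigint PRIMARY KEY DEFAULT nextval('shared_id_seq'),
    note text
);

CREATE TABLE credit_notes (
    id   bigint PRIMARY KEY DEFAULT nextval('shared_id_seq'),
    note text
);

INSERT INTO invoices (note) VALUES ('first');
INSERT INTO credit_notes (note) VALUES ('second');

-- The sequence can also be used "at will", outside any column default.
SELECT nextval('shared_id_seq');
```

Because both defaults pull from the same generator, ids never collide across the two tables, which is something a per-column identity property does not give you by itself.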

The various DBMSs have so far implemented similar features with different mechanisms and syntax (MySQL: AUTO_INCREMENT, SQL Server: IDENTITY (seed, increment), PostgreSQL: serial backed by a SEQUENCE, Oracle: sequences combined with triggers, etc.). Standard-style support arrived only recently: SQL Server added sequence generators in version 2012, and Oracle added identity columns in 12c.

Up to now Postgres has implemented sequence generators (which can be used to provide values for a column, either through the special macros serial and bigserial or through the nextval() function), but it has not yet implemented the standard's syntax for identity columns.

This feature is about adding identity columns as defined in the SQL standard (including their slight differences from serial columns), along with the related syntax (e.g. GENERATED ALWAYS, NEXT VALUE FOR). Some changes or improvements to the implementation of sequences may also be needed, since identity columns will use sequences.

If you follow the "identity columns" link (from the page you saw), you'll find:

identity columns

From: Peter Eisentraut
To: pgsql-hackers
Subject: identity columns
Date: 2016-08-31 04:00:42
Message-ID: 6adbacbf-73bc-dd1a-2033-63409180fd18@2ndquadrant.com

Here is another attempt to implement identity columns. This is a standard-conforming variant of PostgreSQL's serial columns. It also fixes a few usability issues that serial columns have:

need to set permissions on sequence in addition to table (*)

CREATE TABLE / LIKE copies default but refers to same sequence

cannot add/drop serialness with ALTER TABLE

dropping default does not drop sequence

slight weirdnesses because serial is some kind of special macro

(*) Not actually implemented yet, because I wanted to make use of the NEXT VALUE FOR stuff I had previously posted, but I have more work to do there.

...
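As a rough illustration of the "cannot add/drop serialness with ALTER TABLE" point, here is a sketch that assumes the syntax PostgreSQL 10 eventually shipped (the message above describes a patch that was still under review). The table and column names are hypothetical.

```sql
-- Hypothetical table; the column starts out as a plain integer.
CREATE TABLE items (
    id   integer NOT NULL,   -- ADD GENERATED requires the column to be NOT NULL
    name text
);

-- Turn the existing column into an identity column.
ALTER TABLE items ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- Switch between the two generation modes.
ALTER TABLE items ALTER COLUMN id SET GENERATED ALWAYS;

-- Remove the identity property again; the internal sequence goes with it.
ALTER TABLE items ALTER COLUMN id DROP IDENTITY IF EXISTS;
```

With serial, the equivalent steps mean creating or dropping the sequence and the column default by hand.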

Update 2017, September: seems like the feature will be in Postgres 10, which is to be released in a few days/weeks: What's New In Postgres 10: Identity Columns
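For a quick feel of how this looks in practice, here is a minimal sketch using the PostgreSQL 10 syntax. The tables people and events are invented for the example. It contrasts GENERATED ALWAYS with GENERATED BY DEFAULT and then checks the information_schema.columns fields mentioned in the question.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE people (
    id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE events (
    id    bigint GENERATED BY DEFAULT AS IDENTITY (START WITH 100 INCREMENT BY 10),
    label text
);

-- The system supplies the id.
INSERT INTO people (name) VALUES ('alice');

-- Under GENERATED ALWAYS an explicit id is rejected, so this stays commented out:
-- INSERT INTO people (id, name) VALUES (42, 'bob');

-- ...unless the statement explicitly overrides the system value.
INSERT INTO people (id, name) OVERRIDING SYSTEM VALUE VALUES (42, 'bob');

-- Under GENERATED BY DEFAULT an explicit id is accepted without any extra clause.
INSERT INTO events (id, label) VALUES (1, 'manual');
INSERT INTO events (label) VALUES ('generated');

-- The information_schema columns from the question are now populated.
SELECT table_name, column_name, is_identity, identity_generation
FROM information_schema.columns
WHERE table_name IN ('people', 'events')
  AND is_identity = 'YES';
```

GENERATED BY DEFAULT behaves much like serial, while GENERATED ALWAYS is the stricter variant that protects the column from accidental user-supplied values.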

Oracle has also implemented identity columns, in version 12c (it already had sequences). The syntax follows the standard, as far as I checked:
Identity Columns in Oracle Database 12c Release 1 (12.1)

The 12c database introduces the ability to define an identity clause against a table column defined using a numeric type. The syntax is shown below.
GENERATED
[ ALWAYS | BY DEFAULT [ ON NULL ] ]
AS IDENTITY [ ( identity_options ) ]

Related Solutions

Sql-server – SQL Server equivalent to “OVERRIDING USER VALUE”

In SQL Server you actually shouldn't need to make any changes to the outlying code or add an INSTEAD OF trigger to make this work. Here is a quick example, tested on SQL Server 2012, but should work fine on 2005 as well:

CREATE TABLE dbo.source(bar INT, x UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID());
GO
CREATE TABLE dbo.[target](bar INT, x UNIQUEIDENTIFIER PRIMARY KEY);
GO

INSERT dbo.source(bar) SELECT 1;
GO
INSERT dbo.[target] SELECT * FROM dbo.source;
GO

ALTER TABLE dbo.[target] ADD TargetID INT IDENTITY(1,1);
GO

TRUNCATE TABLE dbo.source;
INSERT dbo.source(bar) SELECT 1;
GO
INSERT dbo.[target] SELECT * FROM dbo.source;
GO

SELECT * FROM dbo.[target];

Results:

bar   x                                      TargetID
---   ------------------------------------   --------
1     EFF8DAC4-FB3E-4734-80BE-6DC229846203   1
1     5036688D-C04A-45FC-920E-FF44D7D501D1   2

Now I can also change the primary key and repeat the process:

ALTER TABLE [dbo].[target] DROP CONSTRAINT PK__target__3BD019E50386B4EA;
ALTER TABLE [dbo].[target] ADD CONSTRAINT PK__target__3BD019E50386B4EA  
  PRIMARY KEY (targetID);

TRUNCATE TABLE dbo.source;
INSERT dbo.source(bar) SELECT 1;
GO
INSERT dbo.[target] SELECT * FROM dbo.source;
GO

SELECT * FROM dbo.[target];

Results:

bar   x                                      TargetID
---   ------------------------------------   --------
1     EFF8DAC4-FB3E-4734-80BE-6DC229846203   1
1     5036688D-C04A-45FC-920E-FF44D7D501D1   2
1     41FE97FF-7D45-46EB-8A0D-B2C3BA1E67EA   3

So I haven't had to change my bad code that uses insert/select without any column lists, as long as the source table doesn't also change and assuming that the only change to the target is the addition of an identity column.

PS here is how you can automate the generation of the IDENTITY columns (assuming you will want <tablename>ID):

DECLARE @sql NVARCHAR(MAX);

SET @sql = N'';

SELECT @sql = @sql + '
  ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id]))
  + '.' + QUOTENAME(t.name) + ' ADD ' + t.name + 'ID INT IDENTITY(1,1);'
FROM sys.tables AS t
WHERE name IN (...) -- you will need to fill in this part
AND NOT EXISTS 
(
  SELECT 1 FROM sys.columns WHERE [object_id] = t.[object_id] 
    AND (is_identity = 1 OR name = t.name + 'ID')
);

SELECT @sql;
-- EXEC sp_executesql @sql;

(Note that the SELECT output will show you roughly what the command looks like, but due to output limitations in SSMS and depending on how many tables you have, it won't necessarily show you the full command that will get executed when you uncomment the EXEC.)

And the drop / re-create of the primary keys:

DECLARE @sql NVARCHAR(MAX);

SET @sql = N'';

SELECT @sql = @sql + N'
  ALTER TABLE '
  + QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id]))
  + N'.' + QUOTENAME(t.name) + N' DROP CONSTRAINT ' + QUOTENAME(k.name) + N';
  ALTER TABLE '
  + QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id]))
  + N'.' + QUOTENAME(t.name) + N' ADD CONSTRAINT '
  + QUOTENAME(k.name) + N' PRIMARY KEY (' + t.name + N'ID);'
FROM sys.key_constraints AS k
INNER JOIN sys.tables AS t
ON k.parent_object_id = t.[object_id]
WHERE k.[type] = 'PK'
AND t.name IN (...); -- again, you'll want to identify the list of tables

SELECT @sql;
-- EXEC sp_executesql @sql;

And you'll want to do this while the database is in SINGLE_USER mode or while the application(s) are otherwise not able to connect to the database. You'll also want to test all this on a QA or dev system before unleashing any of it on production.

Now, this still isn't exactly best practice - I highly recommend you stop embedding SQL code in your apps, especially SQL code that does insert/select without specifying column lists.

PostgreSQL – How to Retrieve Max Primary Key Column to Insert a New Row

Postgres has the serial datatype which matches SQL Server's IDENTITY or MySQL's AUTO_INCREMENT.

Internally it is shorthand for a SEQUENCE but does that matter? It acts like IDENTITY/AUTO_INCREMENT:

The data types serial and bigserial are not true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying:

CREATE TABLE tablename (
    colname SERIAL
);

is equivalent to specifying:

CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;

Edit:

I think what the OP means is "is there something like SCOPE_IDENTITY" in PostgreSQL. Yes: you'd use currval() or one of the related sequence functions.
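A minimal sketch of the usual options, reusing the tablename/colname placeholders from the documentation excerpt above (the payload column is added only for the example):

```sql
CREATE TABLE tablename (
    colname serial PRIMARY KEY,
    payload text               -- hypothetical extra column
);

-- Option 1: get the generated key back from the INSERT itself.
INSERT INTO tablename (payload) VALUES ('a') RETURNING colname;

-- Option 2: ask the column's sequence for the value this session last obtained.
INSERT INTO tablename (payload) VALUES ('b');
SELECT currval(pg_get_serial_sequence('tablename', 'colname'));

-- Option 3: the last value from whichever sequence this session used most recently.
SELECT lastval();
```

All three are session-local, so concurrent inserts from other connections do not affect the result. Selecting MAX(colname) instead would be subject to exactly that race.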
