Sql-server – Primary key guarantees: duplicates and nullity

oracle postgresql sql-server sqlite

Oracle allows primary keys to have duplicate and null values.
Using this capability isn't a particularly good idea, but it implies that some of what most developers consider a primary key's guarantees (non-null, unique) are not really guarantees.

CREATE TABLE oracle_guarantees (
    ID NUMBER(9,0),
    NAME VARCHAR2(50 BYTE),
    breed NVARCHAR2(100)
);

INSERT INTO oracle_guarantees VALUES (1, 'Fuzz Head', 'Tabby');
INSERT INTO oracle_guarantees VALUES (2, 'Fluffy Thing', 'Mix');
INSERT INTO oracle_guarantees VALUES (2, 'Fluffy Thing', 'Mix');
INSERT INTO oracle_guarantees VALUES (3, 'Tiger', 'Tabby');
INSERT INTO oracle_guarantees VALUES (4, 'Fur Beast', 'Bengal');
INSERT INTO oracle_guarantees VALUES (5, 'Karate', 'Japanese Bobtail');
INSERT INTO oracle_guarantees VALUES (6, 'Chairman Meow', 'Chinese Harlequin');
INSERT INTO oracle_guarantees VALUES (NULL, 'No Cat', 'No breed');

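-- A non-unique index on ID; the primary key below reuses it instead of creating a unique one.
CREATE INDEX oracle_guarantees_pk ON oracle_guarantees (ID);
-- Add the primary key in a disabled state, so existing rows are not checked.
ALTER TABLE oracle_guarantees ADD CONSTRAINT oracle_guarantees_pk PRIMARY KEY (ID) DISABLE KEEP INDEX;
-- Enable it for new DML only; NOVALIDATE leaves the existing duplicate and NULL rows in place.
ALTER TABLE oracle_guarantees MODIFY CONSTRAINT oracle_guarantees_pk ENABLE NOVALIDATE;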

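SQLite PKs also allow null values (unless the column is declared NOT NULL, is an INTEGER PRIMARY KEY, or the table is WITHOUT ROWID) but not duplicate values.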
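
A minimal sketch of the SQLite behaviour, in SQLite's SQL dialect. The table name sqlite_guarantees is made up for this example, and the key is deliberately TEXT rather than INTEGER, because an INTEGER PRIMARY KEY aliases the rowid and behaves differently:

```sql
-- Hypothetical table: ordinary rowid table, TEXT primary key, no NOT NULL.
CREATE TABLE sqlite_guarantees (
    id   TEXT PRIMARY KEY,
    name TEXT
);

INSERT INTO sqlite_guarantees VALUES ('a', 'Fuzz Head');
-- Accepted: PRIMARY KEY does not imply NOT NULL here.
INSERT INTO sqlite_guarantees VALUES (NULL, 'No Cat');
-- Rejected: duplicate non-null key values are not allowed.
INSERT INTO sqlite_guarantees VALUES ('a', 'Fluffy Thing');
```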

Is it possible to have constraints marked as primary keys that contain duplicate or null values in Microsoft SQL Server or PostgreSQL?

In other words, can I absolutely rely on PK uniqueness and non-nullity within the documented feature set (ignoring cases like manually edited data files, bugs, or modified server source code)?

Further clarification:
I suspect that DBAs and developers approach this rather more differently than I thought.

Thought exercise:
A developer needs to uniquely identify rows on any table (the table structure is unknown at compile time).
Oracle's documentation on primary key constraints comes up in a search and says:

A primary key constraint combines a NOT NULL constraint and a unique
constraint in a single declaration. That is, it prohibits multiple
rows from having the same value in the same column or combination of
columns and prohibits values from being null.

The developer then looks for a means to find the primary keys on any table at run time. They probably come across a question like this: https://stackoverflow.com/questions/9016578/how-to-get-primary-key-column-in-oracle

The top voted answer's query yields:

I believe that any reasonable non-DBA person would, at this point, conclude that the ID column cannot have NULL or duplicate values. This conclusion is not true.

Is it Oracle's fault? No.
Is the table properly designed? No.
Are the statements to create such a table complex? Immaterial.
Is the Id column a "real" primary key? Maybe not, but this is more a philosophical matter. ID is listed (and is shown as "ENABLED") by the query in the top-rated SO answer to the linked question above. Perhaps that question needs to add a check for validation, but I have never once seen anyone check for this in code, nor do any answers in that thread or related threads I have found.

Ergo, in the real world, an arbitrary table can have a primary key (by definition of all_constraints.constraint_type) which has duplicate values. And NULL values.

I now know that software must also ensure that the constraint is validated. Excellent! That provides the guarantee I need.
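
As a sketch of what such a check could look like in Oracle, the query below lists primary key columns only when the constraint is both enabled and validated. It is not the query from the linked answer; it assumes read access to the ALL_CONSTRAINTS and ALL_CONS_COLUMNS dictionary views and uses the example table from above:

```sql
-- Primary key columns of ORACLE_GUARANTEES, but only if the constraint
-- is enforced for new rows (STATUS) and was checked against existing rows (VALIDATED).
SELECT cols.table_name,
       cols.column_name,
       cols.position,
       cons.status,
       cons.validated
FROM   all_constraints  cons
JOIN   all_cons_columns cols
       ON  cols.owner = cons.owner
       AND cols.constraint_name = cons.constraint_name
WHERE  cons.constraint_type = 'P'
AND    cons.table_name = 'ORACLE_GUARANTEES'
AND    cons.status = 'ENABLED'
AND    cons.validated = 'VALIDATED'
ORDER  BY cols.position;
```

For the table created with ENABLE NOVALIDATE above, the VALIDATED column reads NOT VALIDATED, so this query should return no rows and the application would know not to trust ID as a unique identifier.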

The question is: Is it possible to have a primary key (as defined by DBMS metadata) that has NULL or duplicate values in PostgreSQL or Microsoft SQL Server?

Best Answer

Oracle allows primary keys to have duplicate and null values.

Not really. What you have managed to create - after a series of complicated statements - is an enabled but not validated constraint. This means that Oracle checks new inserts and updates (for uniqueness), but existing duplicates may remain. So it is a PK in name only.

In other words, can I absolutely rely on PK uniqueness and non-nullity within the documented feature set (ignoring cases like manually edited data files, bugs, or modified server source code)?

More context: I am working on an application that displays records and allows a user to delete them on any arbitrary table. When dealing with an arbitrary table, I need to use metadata and other means to determine which guarantees I have. Much database development is about guarantees -- guarantees that a large transaction will commit or not (but not partially commit). Guarantees that a successfully modified row stays modified. And guarantees that a primary key uniquely refers to exactly one row.

Yes, you can rely on it, but only if your application reads the metadata in full detail - and those details differ from DBMS to DBMS.

SQL Server allows for disabled constraints? The application has to consider this when reading and interpreting the metadata tables.
Oracle allows for disabled or non-validated constraints? The application has to consider those options too, accordingly.
Postgres allows for some different weird scenarios? It has to consider them, too.

Another possibility - which may or may not be available to you - is that your application is the only one that creates, deletes and modifies database objects, or that all applications that do so are under your control. Then it can rely on the soundness of the metadata ("consider only unique constraints") without checking for these details and rare cases, because it knows that no such case is ever created in the first place.

With your conclusions:

Is it Oracle's fault? No.

I agree. I don't know why this was allowed but it probably solves some problem. And it probably was meant to be used only temporarily - e.g. during an import from another source to a database.

Is the table properly designed? No.

Are the statements to create such a table complex? Immaterial.

I agree, too, on both.

Is the Id column a "real" primary key? Maybe not, but this is more a philosophical matter. ID is listed (and is shown as "ENABLED") by the query in the top-rated SO answer to the linked question above. Perhaps that question needs to add a check for validation, but I have never once seen anyone check for this in code, nor do any answers in that thread or related threads I have found.

We come to the same conclusion: the query needs to add a check for validation.

As for what other possibilities exist in SQL Server and Postgres:

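SQL Server:
- PRIMARY KEY and UNIQUE constraints cannot be disabled directly (ALTER TABLE ... NOCHECK CONSTRAINT works only for FOREIGN KEY and CHECK constraints).
- Unique indexes can be disabled (ALTER INDEX ... DISABLE), and disabling the index behind a PRIMARY KEY or UNIQUE constraint disables that constraint as well. Re-enabling requires rebuilding the index, and the rebuild fails if duplicates exist, so this is closer to Oracle's "disabled" state than to "enabled, not validated".
- FOREIGN KEY and CHECK constraints can be disabled. They can also be re-enabled without checking existing rows, which is the situation most similar to "enabled, not validated" in Oracle.
  See the documentation for ALTER TABLE and "Disable indexes and constraints" for details. My answer to the question "What is a WITH CHECK CHECK CONSTRAINT?" explains the options and the syntax, which differs from Oracle's.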
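
A sketch of how an application might read this state from the SQL Server catalog views. The table name dbo.some_table is a placeholder, not something from the question:

```sql
-- Primary key of the table and whether its backing index is currently disabled.
SELECT kc.name        AS constraint_name,
       i.name         AS index_name,
       i.is_disabled  AS index_is_disabled
FROM   sys.key_constraints AS kc
JOIN   sys.indexes AS i
       ON  i.object_id = kc.parent_object_id
       AND i.index_id  = kc.unique_index_id
WHERE  kc.type = 'PK'
AND    kc.parent_object_id = OBJECT_ID(N'dbo.some_table');

-- FOREIGN KEY and CHECK constraints: is_not_trusted = 1 means the constraint
-- was (re-)enabled without checking existing rows.
SELECT name, type_desc, is_disabled, is_not_trusted
FROM   sys.foreign_keys
WHERE  parent_object_id = OBJECT_ID(N'dbo.some_table')
UNION ALL
SELECT name, type_desc, is_disabled, is_not_trusted
FROM   sys.check_constraints
WHERE  parent_object_id = OBJECT_ID(N'dbo.some_table');
```
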

PostgreSQL:
- PRIMARY KEY, UNIQUE and EXCLUDE constraints cannot be disabled.
- FOREIGN KEY and CHECK constraints cannot be disabled. They can be created with the NOT VALID option though (which means that they are enabled without checking existing rows, as in SQL Server). See the ALTER TABLE documentation for details.
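
In PostgreSQL the corresponding flag is exposed in the pg_constraint catalog. A minimal sketch, with public.some_table as a placeholder table name:

```sql
-- List the constraints on one table together with their validation state.
SELECT c.conname,
       c.contype,       -- 'p' = primary key, 'u' = unique, 'f' = foreign key, 'c' = check
       c.convalidated   -- false for a constraint added with NOT VALID and not yet validated
FROM   pg_constraint AS c
WHERE  c.conrelid = 'public.some_table'::regclass
ORDER  BY c.contype, c.conname;
```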

Related Solutions

Sql-server – Why is Clustered Index on Primary Key compulsory

Why does a Primary Key constraint create a Clustered Index on the PK column by default?

That is what the MS SQL Server programmers decided the default should be. A good clustered index is one that has unique values (as the Primary Key does), is narrow (as most primary keys are, or at least should be) and is ever-increasing. So, most of the time, the primary key is a good (or the best) choice for the clustered key (there can be at most one clustered key per table).

Can we create a table which has a primary key, but NO clustered index?

Yes, you can, by explicitly defining all indexes, and especially the primary key, as non-clustered. If you think that you don't need a clustered key on a table, you can do that and have the primary key as non-clustered. The unique and not null constraints will still be enforced.
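
For illustration, a minimal T-SQL sketch; the table and constraint names are invented for the example:

```sql
-- The table stays a heap (no clustered index); the primary key is enforced
-- by a unique nonclustered index instead.
CREATE TABLE dbo.cats (
    id   INT          NOT NULL,
    name NVARCHAR(50) NOT NULL,
    CONSTRAINT pk_cats PRIMARY KEY NONCLUSTERED (id)
);
```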

PostgreSQL – insert/update violates foreign key constraints

There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)

But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first set) are the same. Of course, the definition of Table3 in your second attempt has syntax and logical errors, but the basic idea is:

CREATE TABLE table3 (   
  "ID" bigint NOT NULL DEFAULT '0',   
  "DataID" bigint DEFAULT NULL,   
  "Address" numeric(20) DEFAULT NULL,   
  "Data" bigint DEFAULT NULL,
   PRIMARY KEY ("ID"),   
   FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,   
   FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);

This definition tells PostgreSQL roughly the following: "Create a table with four columns; one will be the primary key (PK), the others can be NULL. If a new row is inserted, check DataID and Address: if they contain a non-NULL value (say 27856), then check Table1 for DataID and Table2 for Address. If there is no such value in those tables, then return an error." This last point is the one you ran into first:

ERROR:  insert or update on table "Table3" violates foreign key constraint "Table3_DataID_fkey"
DETAIL:  Key (DataID)=(27856) is not present in table "Table1".

So simple: if there is no row in Table1 where DataID = 27856, then you can't insert that row into Table3.

If you need that row, you should first insert a row into Table1 with DataID = 27856, and only then try to insert into Table3. If this seems to you not what you want, please describe in a few sentences what you want to achieve, and we can help with a good design.
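
The ordering described above might look like this. It is only a sketch: the definitions of Table1 and Table2 are not shown here, so it assumes that their other columns are nullable or have defaults, and the values for "ID" and "Data" are arbitrary:

```sql
-- 1. Parent rows first, so the foreign keys have something to point at.
INSERT INTO table1 ("DataID")  VALUES (27856);
INSERT INTO table2 ("Address") VALUES (1234567890454);

-- 2. Only then the child row that references them.
INSERT INTO table3 ("ID", "DataID", "Address", "Data")
VALUES (1, 27856, 1234567890454, 42);
```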

And now about the other problems.

You define your PKs as

CREATE TABLE all_your_tables (
    first_column NOT NULL DEFAULT '0',   
    [...]
    PRIMARY KEY ("ID"),

A primary key means that all the items in it are different from each other, that is, the values are UNIQUE. If you give a static DEFAULT (like '0') to a UNIQUE column, you will experience bad surprises all the time. This is what you got in your third error message.

Furthermore, '0' is a string literal, not a number (bigint or numeric in your case). PostgreSQL will cast it, but it is clearer to simply write 0 instead (or not to use a default at all, as I wrote above).
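
A small sketch of the surprise, using a throwaway table (default_demo is a made-up name), followed by one possible alternative. The identity column is my suggestion rather than something from your schema, and it needs PostgreSQL 10 or later:

```sql
CREATE TABLE default_demo (
    "ID" bigint NOT NULL DEFAULT 0,
    PRIMARY KEY ("ID")
);

INSERT INTO default_demo DEFAULT VALUES;  -- succeeds, "ID" = 0
INSERT INTO default_demo DEFAULT VALUES;  -- fails: the key 0 already exists

-- Alternative: let the database generate a distinct value for every row.
CREATE TABLE identity_demo (
    "ID" bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

INSERT INTO identity_demo DEFAULT VALUES;  -- succeeds
INSERT INTO identity_demo DEFAULT VALUES;  -- succeeds, with a new "ID"
```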

And a last point (I may be wrong here): in Table2, your Address field is set to numeric(20). At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in Table3 like

ID        DataID           Address             Data
123       3216547          1234567890454       654897564134569

Now that address happens to change to something else. How do you make your child row in Table3 follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table that contains no information from the real world and simply serves as an identification value (that is, an ID) for an address.
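
A sketch of that design, assuming PostgreSQL 10 or later. The column name "AddressID" is hypothetical, and "DataID" is left out to keep the example short:

```sql
CREATE TABLE table2 (
    "AddressID" bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key, never changes
    "Address"   numeric(20) NOT NULL UNIQUE                       -- real-world value, free to change
);

CREATE TABLE table3 (
    "ID"        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    "AddressID" bigint REFERENCES table2 ("AddressID") ON DELETE CASCADE,
    "Data"      bigint
);

-- Changing the address now touches one row in table2; the child rows
-- keep pointing at the same "AddressID".
UPDATE table2
SET    "Address" = 9876543210454
WHERE  "Address" = 1234567890454;
```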
