The formal definition of a Primary Index and Primary Key

indexterminology

The below is from Fundamentals of Database Systems (7th Edition)

17.1 Types of Single-Level Ordered Indexes

There are several types of ordered indexes. A primary index is specified on the ordering key field of an ordered file of records. Recall from Section 16.7 that an ordering key field is used to physically order the file records on disk, and every record has a unique value for that field. If the ordering field is not a key field—that is, if numerous records in the file can have the same value for the ordering field—another type of index, called a clustering index, can be used. The data file is called a clustered file in this latter case. Notice that a file can have at most one physical ordering field, so it can have at most one primary index or one clustering index, but not both. A third type of index, called a secondary index, can be specified on any nonordering field of a file. A data file can have several secondary indexes in addition to its primary access method. We discuss these types of single-level indexes in the next three subsections.

17.1.1 Primary Indexes

A primary index is an ordered file whose records are of fixed length with two fields, and it acts like an access structure to efficiently search for and access the data records in a data file. The first field is of the same data type as the ordering key field—called the primary key—of the data file, and the second field is a pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values.

Specifically my question is about a "primary index" as it relates to a more formal academic parlance — I never went to a university so I have no idea how they're using this term. Is the "primary index is specified on the ordering key field of an ordered file of records" correct? Was that the original distinction between a PRIMARY key (for a primary index), and a regular key into a non-primary non-clustered index?

I'm thinking that this book didn't go through 7 editions at $151.00 a pop with the false idea that a primary key necessitate "ordered file of records."

Best Answer

A "key" is a logical construct used to identify a specific record. Given their purpose, keys require uniqueness. Primary Key fields are also required to be declared as NOT NULL. There are also "alternate" keys which allow key fields to be declared as NULL. Both Primary and Alternate keys have an optional, secondary purpose of establishing relationships via Foreign Keys (declarative referential integrity).

An "index" is a physical construct used to access one-or-more records. Uniqueness is not required for this purpose. When an index is declared as unique, it is not required that the indexed column(s) be declared as NOT NULL; NULL is allowed so long as it is unique, or a unique combination when working with multiple index columns.

Given the two quoted paragraphs, here is a terminology translation table:

  • BOOKSQL Server
  • "file" → Table or Index
  • "primary index" → UNIQUE CLUSTERED Index
  • "clustering index" → non-unique CLUSTERED Index (due to the cluster key column(s) being the row identifier, there is a need for true uniqueness, and so SQL Server will guarantee that uniqueness by adding a hidden, auto-incrementing INT value called the uniqueifier, but only for actual non-unique key values: please see Clustered Index Uniquifier Existence and Size for details)
  • "secondary index" → NONCLUSTERED Index
  • "primary key" → PRIMARY KEY defined on a UNIQUE CLUSTERED Index

In SQL Server, a Primary Key does not need to be defined on a UNIQUE CLUSTERED Index, it just requires a Unique Index (though the default behavior when creating a PK when no clustered Index exists is to make the PK clustered).

Also, a Foreign Key can reference either a Primary Key or an "Alternate" Key (i.e. Unique Constraint or Unique Index)

The following example shows both a PK on a NONCLUSTERED Index, and an FK referencing an Alternate Key defined on a column allowing NULL values:

USE [tempdb];

-- DROP TABLE dbo.Parent
CREATE TABLE dbo.Parent
(
  [ParentID] INT NOT NULL
             CONSTRAINT [PK_Parent] PRIMARY KEY
             NONCLUSTERED,
  [AltKey]   INT NULL
             CONSTRAINT [UQ_Parent_AltKey] UNIQUE NONCLUSTERED
);

-- DROP TABLE dbo.ChildID
CREATE TABLE dbo.ChildID
(
  [ChildID]  INT NOT NULL
             CONSTRAINT [PK_Child] PRIMARY KEY
             NONCLUSTERED,
  [AltKey]   INT NULL
             CONSTRAINT [FK_Child_Parent_AltKey] FOREIGN KEY
             REFERENCES dbo.Parent ([AltKey])
);