The formal definition of a Primary Index and Primary Key

indexterminology

The below is from Fundamentals of Database Systems (7th Edition)

17.1 Types of Single-Level Ordered Indexes

There are several types of ordered indexes. A primary index is specified on the ordering key field of an ordered file of records. Recall from Section 16.7 that an ordering key field is used to physically order the file records on disk, and every record has a unique value for that field. If the ordering field is not a key field—that is, if numerous records in the file can have the same value for the ordering field—another type of index, called a clustering index, can be used. The data file is called a clustered file in this latter case. Notice that a file can have at most one physical ordering field, so it can have at most one primary index or one clustering index, but not both. A third type of index, called a secondary index, can be specified on any nonordering field of a file. A data file can have several secondary indexes in addition to its primary access method. We discuss these types of single-level indexes in the next three subsections.

17.1.1 Primary Indexes

A primary index is an ordered file whose records are of fixed length with two fields, and it acts like an access structure to efficiently search for and access the data records in a data file. The first field is of the same data type as the ordering key field—called the primary key—of the data file, and the second field is a pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values.

Specifically my question is about a "primary index" as it relates to a more formal academic parlance — I never went to a university so I have no idea how they're using this term. Is the "primary index is specified on the ordering key field of an ordered file of records" correct? Was that the original distinction between a PRIMARY key (for a primary index), and a regular key into a non-primary non-clustered index?

I'm thinking that this book didn't go through 7 editions at $151.00 a pop with the false idea that a primary key necessitate "ordered file of records."

Best Answer

A "key" is a logical construct used to identify a specific record. Given their purpose, keys require uniqueness. Primary Key fields are also required to be declared as NOT NULL. There are also "alternate" keys which allow key fields to be declared as NULL. Both Primary and Alternate keys have an optional, secondary purpose of establishing relationships via Foreign Keys (declarative referential integrity).

An "index" is a physical construct used to access one-or-more records. Uniqueness is not required for this purpose. When an index is declared as unique, it is not required that the indexed column(s) be declared as NOT NULL; NULL is allowed so long as it is unique, or a unique combination when working with multiple index columns.

Given the two quoted paragraphs, here is a terminology translation table:

BOOK → SQL Server
"file" → Table or Index
"primary index" → UNIQUE CLUSTERED Index
"clustering index" → non-unique CLUSTERED Index (due to the cluster key column(s) being the row identifier, there is a need for true uniqueness, and so SQL Server will guarantee that uniqueness by adding a hidden, auto-incrementing INT value called the uniqueifier, but only for actual non-unique key values: please see Clustered Index Uniquifier Existence and Size for details)
"secondary index" → NONCLUSTERED Index
"primary key" → PRIMARY KEY defined on a UNIQUE CLUSTERED Index

In SQL Server, a Primary Key does not need to be defined on a UNIQUE CLUSTERED Index, it just requires a Unique Index (though the default behavior when creating a PK when no clustered Index exists is to make the PK clustered).

Also, a Foreign Key can reference either a Primary Key or an "Alternate" Key (i.e. Unique Constraint or Unique Index)

The following example shows both a PK on a NONCLUSTERED Index, and an FK referencing an Alternate Key defined on a column allowing NULL values:

USE [tempdb];

-- DROP TABLE dbo.Parent
CREATE TABLE dbo.Parent
(
  [ParentID] INT NOT NULL
             CONSTRAINT [PK_Parent] PRIMARY KEY
             NONCLUSTERED,
  [AltKey]   INT NULL
             CONSTRAINT [UQ_Parent_AltKey] UNIQUE NONCLUSTERED
);

-- DROP TABLE dbo.ChildID
CREATE TABLE dbo.ChildID
(
  [ChildID]  INT NOT NULL
             CONSTRAINT [PK_Child] PRIMARY KEY
             NONCLUSTERED,
  [AltKey]   INT NULL
             CONSTRAINT [FK_Child_Parent_AltKey] FOREIGN KEY
             REFERENCES dbo.Parent ([AltKey])
);

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

You can store your index as a list of fixed-size offsets into the block containing your key data. For example:

+--------------+
| 3            | number of entries
+--------------+
| 16           | offset of first key data
+--------------+
| 24           | offset of second key data
+--------------+
| 39           | offset of third key data
+--------------+
| key one |
+----------------+
| key number two |
+-----------------------+
| this is the third key |
+-----------------------+

(well, the key data would be sorted in a real example, but you get the idea).

Note that this does not necessarily reflect how index blocks are actually constructed in any database. This is merely an example of how you might organise a block of index data where the key data is of variable length.

Postgresql – Postgres primary key sorted in the reverse order. Will it use the index efficiently

Indexes are typically unidirectional. This allows improvements in space and index scanning speed as back links don't need to be maintained. As @a_horse_with_no_name has noted, Postgres can and will be used to search in the reverse order to the index. (If this was not a unique index with no NULLs, positioning of NULLS FIRST/LAST could be a consideration.) Performance may not be as fast as using a non-standard index ordering, and the documentation notes significant speedups can be obtained with non-standard orderings.

For systems that don't support scanning indexes backwards, if your index is ascending, it would not be used for your query. There are two options in this case:

add another index with DESC specified; or
replace the primary key index with one which is has DESC specified. (I haven't tried this, and likely wouldn't as it breaks my expectations for primary keys.)

Best Answer

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

Postgresql – Postgres primary key sorted in the reverse order. Will it use the index efficiently

Related Question