Sql-server – Do all relational DBMSes store table tuples in a clustered index based on the primary key by default?

Tags: mysql, oracle, postgresql, sql-server, sqlite

So I was reading about MySQL InnoDB, which apparently stores table data by default in a clustered index (B+tree) based on the primary key, with the tuples in the leaves of that B+tree:

https://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/

I was wondering: do all the well-known relational DBMSes, like PostgreSQL, Oracle, SQL Server and SQLite, store data the same way, by building a clustered index (B+tree) on the primary key and storing the rows in the leaves of the tree?

And does a clustered index on the primary key basically mean storing the table as a B+tree ordered by the primary key?

(Sorry if this is a general question, but I can't post 4-5 separate questions asking the same thing for each of the databases I mentioned. If you only know about one or a few of them, please tell.)

Also, if a database does not use a clustered index by default, can you explain how it structures the files? For example, does it build a B+tree on the primary key whose leaves point to some sort of address inside a heap, or…?

EDIT:

So far we have answers for all of them except SQLite. If anyone has any info on how tables are actually stored in SQLite by default, please do tell.

Best Answer

Do all the famous relational DBMSes like PostgreSQL, Oracle, SQL Server and SQLite store it the same way, by making a clustered index (B+tree) based on the primary key and storing the data in the leaves of the tree?

No, not all. Let's take them one by one:

MySQL. MySQL has several "engines" and depending on what engine a table is defined to use, the storage is:
- InnoDB: Yes, what you describe. The table data is stored in a clustered index, with the index based on the primary key columns of the table (if no PK is defined, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; failing that, it uses a hidden 6-byte internal row ID column).
- MyISAM: No, the table is a heap, not a clustered index.
- Other engines (NDB, Blackhole, CSV, etc.): No, I think none of them uses a clustered index (except maybe NDB, not sure).
- TokuDB: Yes. Similar to InnoDB, but you may define more than one clustered index!
SQL Server: Yes, the default behaviour for tables that have a PK is to be clustered on the PK. This can be overridden, though, by declaring the PK as NONCLUSTERED. You can also define another index (not the PK) to be the clustered index of the table. If the PK is defined as NONCLUSTERED and none of the indexes is defined as CLUSTERED, then the table is a heap. In recent versions a third option was added (besides clustered index and heap): columnstore, which is a different way of organizing the table data.
PostgreSQL: No. All the tables are heaps, period. You can create additional indexes of various types (btree, hash, gin, gist, brin, etc) but the table data are stored in a heap.
Oracle: No. By default the tables are heaps unless created as Index Organized Tables (Oracle's term for clustered indexes).
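To make the MySQL bullets concrete, here is a minimal sketch using hypothetical tables `orders_innodb` and `orders_myisam` (names are illustrative only). The same definition yields a clustered layout under InnoDB and a heap under MyISAM; the final query reads the engine from `information_schema.TABLES`.

```sql
-- MySQL; hypothetical table names, for illustration only.

-- InnoDB: the PRIMARY KEY is the clustered index, so rows live in its B+tree leaves.
-- The secondary index idx_order_date stores order_id (the PK value) in its leaves,
-- not a physical row address.
CREATE TABLE orders_innodb (
    order_id   INT  NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (order_id),
    KEY idx_order_date (order_date)
) ENGINE=InnoDB;

-- MyISAM: rows go into a separate heap-style data file; every index, including
-- the PK, is its own B-tree whose entries point at row positions in that file.
CREATE TABLE orders_myisam (
    order_id   INT  NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (order_id),
    KEY idx_order_date (order_date)
) ENGINE=MyISAM;

-- Check which engine (and therefore which layout) each table uses.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('orders_innodb', 'orders_myisam');
```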
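The SQL Server options described above can be sketched in T-SQL as follows. The tables (`dbo.OrdersClustered` and so on) are hypothetical and exist only for illustration. The script creates one table per layout and then queries the `sys.indexes` catalog view, where a heap appears as `index_id = 0` with `type_desc = 'HEAP'`, and a clustered index (rowstore or columnstore) appears as `index_id = 1`.

```sql
-- 1. Default: a PRIMARY KEY with no explicit option becomes the clustered index
CREATE TABLE dbo.OrdersClustered
(
    order_id   integer NOT NULL,
    order_date date    NOT NULL,
    CONSTRAINT PK_OrdersClustered PRIMARY KEY (order_id)
);

-- 2. Nonclustered PK, with a different column chosen as the clustered index
CREATE TABLE dbo.OrdersByDate
(
    order_id   integer NOT NULL,
    order_date date    NOT NULL,
    CONSTRAINT PK_OrdersByDate PRIMARY KEY NONCLUSTERED (order_id)
);
CREATE CLUSTERED INDEX CX_OrdersByDate_order_date
ON dbo.OrdersByDate (order_date);

-- 3. Nonclustered PK and no clustered index: the table is a heap
CREATE TABLE dbo.OrdersHeap
(
    order_id   integer NOT NULL,
    order_date date    NOT NULL,
    CONSTRAINT PK_OrdersHeap PRIMARY KEY NONCLUSTERED (order_id)
);

-- 4. Clustered columnstore (SQL Server 2014 and later): data stored by column
CREATE TABLE dbo.OrdersColumnstore
(
    order_id   integer NOT NULL,
    order_date date    NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrdersColumnstore
ON dbo.OrdersColumnstore;

-- Inspect how each table is organized
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.index_id,          -- 0 = heap, 1 = clustered, > 1 = nonclustered
       i.name,
       i.type_desc,
       i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id IN (OBJECT_ID(N'dbo.OrdersClustered'),
                      OBJECT_ID(N'dbo.OrdersByDate'),
                      OBJECT_ID(N'dbo.OrdersHeap'),
                      OBJECT_ID(N'dbo.OrdersColumnstore'))
ORDER BY table_name, i.index_id;
```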
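For Oracle, a minimal sketch of the difference between the default heap table and an Index Organized Table might look like this (table and constraint names are hypothetical). An IOT requires a primary key, and the `IOT_TYPE` column of `USER_TABLES` is `'IOT'` for an index-organized table and `NULL` for an ordinary heap table.

```sql
-- Default: heap table; the PK is a separate B-tree index pointing at ROWIDs
CREATE TABLE orders_heap
(
    order_id   NUMBER NOT NULL,
    order_date DATE   NOT NULL,
    CONSTRAINT pk_orders_heap PRIMARY KEY (order_id)
);

-- Index Organized Table: rows are stored in the PK B-tree itself
CREATE TABLE orders_iot
(
    order_id   NUMBER NOT NULL,
    order_date DATE   NOT NULL,
    CONSTRAINT pk_orders_iot PRIMARY KEY (order_id)
)
ORGANIZATION INDEX;

-- IOT_TYPE is 'IOT' for the index-organized table, NULL for the heap table
SELECT table_name, iot_type
FROM user_tables
WHERE table_name IN ('ORDERS_HEAP', 'ORDERS_IOT');
```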

Additional clarifications about table data organization:

Clustered Index: There is a B+tree index, based on some column(s), and its leaves contain the table's data.
Heap: There is no B-tree for the heap itself (other indexes, like the PK, may still use a B-tree). The data are stored as an unordered collection of records. When a new row is inserted, it is usually added to a disk page with available space. Rows are referenced by some special Page/RowID (the details are implementation dependent and surely differ from DBMS to DBMS), so other indexes include this reference.
Columnstore: A type of structure where data are stored by column rather than by row.
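PostgreSQL makes the heap's row reference visible: every table has a system column `ctid`, the (page number, item number) position of a row version inside the heap, and that is what its B-tree indexes (including the one backing the PK) point to. A small sketch, assuming a hypothetical `orders` table:

```sql
-- PostgreSQL: the table itself is a heap; the PK creates a separate B-tree index
CREATE TABLE orders (
    order_id   integer PRIMARY KEY,
    order_date date NOT NULL
);

-- Rows are appended where there is free space, not sorted by order_id
INSERT INTO orders VALUES (3, '2024-01-03'), (1, '2024-01-01'), (2, '2024-01-02');

-- ctid shows each row's (page, item) position in the heap
SELECT ctid, order_id FROM orders;

-- An UPDATE writes a new row version, so the updated row gets a new ctid
UPDATE orders SET order_date = '2024-02-01' WHERE order_id = 1;
SELECT ctid, order_id FROM orders;
```

Note that `ctid` is a physical address and changes when a row is updated, so it is useful for inspection but not as a stable row identifier.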

Some more details can be found in the documentation of each DBMS and in:
- Wikipedia: Database Storage Structures
- Wikipedia: Columnstore databases

Related Solutions

Sql-server – Is ‘Avoid creating a clustered index based on an incrementing key’ a myth from SQL Server 2000 days?

The myth goes back to before SQL Server 6.5, which added row-level locking, as hinted at by Kalen Delaney.

It was to do with "hot spots" of data page usage and the fact that a whole 2k page (SQL Server 7 and higher use 8k pages) was locked, rather than just the inserted row.

Edit, Feb 2012

I found an authoritative article by Kimberly L. Tripp:

"The Clustered Index Debate Continues..."

Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there.

Edit, May 2013

The link in lucky7_2000's answer seems to say that hotspots can exist and that they cause issues. However, the article uses a non-unique clustered index on TranTime. This requires a uniquifier to be added, which means the index is not strictly monotonically increasing (and is too wide). So the link in that answer does not contradict this answer or my links.

On a personal level, I have worked on databases where I inserted tens of thousands of rows per second into a table that had a bigint IDENTITY column as the clustered PK.
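For illustration, the two clustered key designs discussed above could be declared like this in T-SQL (the tables and columns are hypothetical). The first uses a unique, ever-increasing bigint IDENTITY clustered PK, so new rows are appended at the end of the B+tree. The second clusters on a non-unique time column, as in the TranTime example, where SQL Server adds a hidden 4-byte uniquifier to rows whose key duplicates an existing one.

```sql
-- Unique, ever-increasing clustered key (bigint IDENTITY as the clustered PK)
CREATE TABLE dbo.TransactionsById
(
    tran_id   bigint IDENTITY(1, 1) NOT NULL,
    tran_time datetime2(3)          NOT NULL,
    amount    decimal(19, 4)        NOT NULL,
    CONSTRAINT PK_TransactionsById PRIMARY KEY CLUSTERED (tran_id)
);

-- Non-unique clustered key on a time column: duplicate tran_time values
-- get a hidden uniquifier, which widens the clustering key
CREATE TABLE dbo.TransactionsByTime
(
    tran_id   bigint IDENTITY(1, 1) NOT NULL,
    tran_time datetime2(3)          NOT NULL,
    amount    decimal(19, 4)        NOT NULL,
    CONSTRAINT PK_TransactionsByTime PRIMARY KEY NONCLUSTERED (tran_id)
);
CREATE CLUSTERED INDEX CX_TransactionsByTime_tran_time
ON dbo.TransactionsByTime (tran_time);
```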

Sql-server – Moving primary key constraint from one index to another

The following script illustrates an efficient way to convert the existing nonclustered primary key to clustered, and to rename it:

-- How the table looks now
CREATE TABLE dbo.Example
(
    pk integer NOT NULL,
    some_data integer NOT NULL,

    CONSTRAINT PK_UnusualName
        PRIMARY KEY NONCLUSTERED (pk)
);

-- Some data
INSERT dbo.Example (pk, some_data)
VALUES (1, 100), (2, 200), (3, 300);

-- Change the nonclustered PK to clustered
CREATE UNIQUE CLUSTERED INDEX PK_UnusualName
ON dbo.Example (pk)
WITH (DROP_EXISTING = ON);

-- Rename
EXECUTE sys.sp_rename 
    @objname = N'dbo.Example.PK_UnusualName',
    @newname = N'PK__dbo_Example_pk',
    @objtype = 'INDEX';

-- Tidy up
DROP TABLE dbo.Example;
