Sql-server – Do all Relational DBMSes store table tuples in a clustered index based on primary key by default

MySQLoraclepostgresqlsql serversqlite

So i was reading MySQL innoDB, which apparently stores table data by default on a clustered Index (b+tree) based on primary key, and the tuples are in the leafs of that b+tree

https://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/

i was wondering do all the famous Relational DBMSes like PostgreSQL, Oracle, SQL server, SQLite, store it the same way? by making a clustered Index (b+tree) based on primary key and store data on the leafs of tree?

and does Clustered index on primary key in databases basically mean storing the table based on primary key on a B+tree table?

(sorry if this is a general question but i cant make 4-5 seperate questions asking the same for each of the databases i mentioned, if you only know about one or few of them please tell)

also if its a database does not use clustered by default, can you explain how does it structure the files, for example do they make a b+tree on primary key which the leafs point to some sort of address inside a heap or…?

EDIT :

so far we got the answers for all of them except SQLite, if anyone have any info on how tables are actually stored in SQLite by default please do tell.

Best Answer

Do all the famous Relational DBMSes like PostgreSQL, Oracle, SQL server, SQLite, store it the same way? by making a clustered Index (b+tree) based on primary key and store data on the leafs of tree?

No, not all. Lets take them one by one:

  • MySQL. MySQL has several "engines" and depending on what engine a table is defined to use, the storage is:

    • InnoDB: Yes, what you describe, the table data is stored in a clustered index, with the index based on the primary key columns on the table (and if there isn't a PK defined, on the first UNIQUE index with non-null columns and in the lack of that, on a secret 6-byte internal column).
    • MyISAM: No, the table is a heap, not a clustered index.
    • other engines: (NBD, Blackhole, CSV, etc) No, I think none of them uses a clustered index (except maybe NBD, not sure)
    • TokuDB: Yes. Similar to InnoDB but you may define more than one clustered indexes!
  • SQL Server: Yes, the default behaviour for tables that have a PK is to be clustered on the PK. This can be overridden though by declaring that the PK is NONCLUSTERED. You can also define another index (not the PK) to be the clustered index of the table. If the PK is deined as NONCLUSTERED and none of the indexes is defined as CLUSTERED, then the table is a heap. In recent versions, a 3rd option was added (besides clustered index and heap): columnstore which is a different way of organizing the table data.

  • PostgreSQL: No. All the tables are heaps, period. You can create additional indexes of various types (btree, hash, gin, gist, brin, etc) but the table data are stored in a heap.

  • Oracle: No. By default the tables are heaps unless created as Index Organized Tables (Oracle's term for clustered indexes).


Additional clarifications about table data organization:

  • Clustered Index: There is a btee+ index - based on some column(s) - and the leafs contain the table's data.

  • Heap: There is no btree index for the heap (other indexes like the PK may still use a btree). The data are stored as unordered lists of records. When a new row is inserted, it is usually added in a disk page with available space. Rows are referenced by some special Page/RowID (the details are implementation dependend and surely differ from DBMS to DBMS), so other indexes will include this reference.

  • Columnstore: Types of structure where data are stored as columns rather than rows.

Some more details can be found in the documentation of each DBMS and in: - Wikipedia: Database Storage Structures - Wikipedia: Columnstore databases