Mariadb clustering Index design question

auto-incrementclustered-indexmariadboptimization

Here are the columns of concern for a table describing tasks:

  • Org_ID
  • Task_ID
  • Task_property_columns

The only requirement for Task_ID is that it is unique per every Org_ID, however I would prefer not handle that programatically and that it be auto incremented.

The requirement for the Org_ID is that I would like to run metrics and analytics using sort operations with the table organised by Org_ID and ported nationally some of the task properties. The table being clustered on Task_ID would be useless, but would be very beneficial to generate on Org_ID.

From reviewing the documentation, it seems that clustering is picked for you in MariaDB where it will use the primary key, a unique non-nullable index, or a background auto generated index in that order.

I am pretty new to MariaDB and was hoping for some help in picking the right solution. From what I understand I can't find a good way to auto increment and cluster how I want it to as well.

Best Answer

First, to clarify: "cluster" in this Question refers to the "clustering" of the PRIMARY KEY with the data, not the "clustering of servers".

You have provided the 3 choices that InnoDB makes to picks the clustered PK. Let me rephrase it from a user point of view.

  1. If you have a column, or combination of columns, (all non-NULL) that are inherently UNIQUE, then use it as the PK. This is a "natural" PK.
  2. If choice 1 is not available, use an AUTO_INCREMENT.
  3. Though available, do not use the third choice of an auto-generated hidden PK. Some day, you will need to uniquely identify rows, perhaps for maintenance.

Unlike some other vendors, there is no ROWNUM, nor is there a non-clustered PK. The lack of those is not important.

Secondary keys let you get at the data in other ways. A secondary key contains a copy of the PK so that it can get to the data. Both PK and secondary keys are organized as BTrees. BTree is the best all-around indexing method; the lack of Hash, etc, is not important.

A natural PK lets you do a range scan over the PK column very efficiently.

If find that two-thirds of tables have a "natural" PK; only one-third need an auto_increment instead.

Back to your 'task'. It sounds like you want

PRIMARY KEY(org_id, task_id)

but would like task_id to be automatically generated. This is possible by saying

task_id, INT UNSIGNED AUTO_INCREMENT NOT NULL

but there will be gaps -- since the auto_inc is table-wide, not specific to a given org_id. If you can live with the gaps, then that is your answer.

This was a feature that existed in ENGINE=MyISAM. See this for a simulation of it using InnoDB.

You mentioned "ported nationally". Are you trying to create unique ids that are locally, not centrally, generated? See UUID and its evils.

If you are summarizing large quantities of data, then build and maintain Summary tables.

If you want to discuss further, please provide prototypes of schema and queries.