Mariadb clustering Index design question

auto-incrementclustered-indexmariadboptimization

Here are the columns of concern for a table describing tasks:

Org_ID
Task_ID
Task_property_columns

The only requirement for Task_ID is that it is unique per every Org_ID, however I would prefer not handle that programatically and that it be auto incremented.

The requirement for the Org_ID is that I would like to run metrics and analytics using sort operations with the table organised by Org_ID and ported nationally some of the task properties. The table being clustered on Task_ID would be useless, but would be very beneficial to generate on Org_ID.

From reviewing the documentation, it seems that clustering is picked for you in MariaDB where it will use the primary key, a unique non-nullable index, or a background auto generated index in that order.

I am pretty new to MariaDB and was hoping for some help in picking the right solution. From what I understand I can't find a good way to auto increment and cluster how I want it to as well.

Best Answer

First, to clarify: "cluster" in this Question refers to the "clustering" of the PRIMARY KEY with the data, not the "clustering of servers".

You have provided the 3 choices that InnoDB makes to picks the clustered PK. Let me rephrase it from a user point of view.

If you have a column, or combination of columns, (all non-NULL) that are inherently UNIQUE, then use it as the PK. This is a "natural" PK.
If choice 1 is not available, use an AUTO_INCREMENT.
Though available, do not use the third choice of an auto-generated hidden PK. Some day, you will need to uniquely identify rows, perhaps for maintenance.

Unlike some other vendors, there is no ROWNUM, nor is there a non-clustered PK. The lack of those is not important.

Secondary keys let you get at the data in other ways. A secondary key contains a copy of the PK so that it can get to the data. Both PK and secondary keys are organized as BTrees. BTree is the best all-around indexing method; the lack of Hash, etc, is not important.

A natural PK lets you do a range scan over the PK column very efficiently.

If find that two-thirds of tables have a "natural" PK; only one-third need an auto_increment instead.

Back to your 'task'. It sounds like you want

PRIMARY KEY(org_id, task_id)

but would like task_id to be automatically generated. This is possible by saying

task_id, INT UNSIGNED AUTO_INCREMENT NOT NULL

but there will be gaps -- since the auto_inc is table-wide, not specific to a given org_id. If you can live with the gaps, then that is your answer.

This was a feature that existed in ENGINE=MyISAM. See this for a simulation of it using InnoDB.

You mentioned "ported nationally". Are you trying to create unique ids that are locally, not centrally, generated? See UUID and its evils.

If you are summarizing large quantities of data, then build and maintain Summary tables.

If you want to discuss further, please provide prototypes of schema and queries.

Related Solutions

Postgresql – Creating sequence for longer numeric values

If you can live with losing some values to the maximum value, you could combine a sequence with a fixed offset to get the 20 digits. I would also define a check constraint on the table to to make sure that accidental inserts without using the default value insert the wrong value:

create sequence my_sequence_name;
create table foo 
(
   id numeric(20,0) default 10000000000000000001 + nextval('my_sequence_name'),
   constraint check_range check (id between 10000000000000000001 and 99999999999999999999)
);

That will give you a maximum generated value of 19223372036854775808 which doesn't use the complete range of allowed values, but it might be enough for your needs.

Mysql – Auto increment value needs to be changed inside trigger

I don't think you can get the new value of auto_id in a BEFORE insert trigger. This will do what you want (if I have understood correctly):

CREATE TRIGGER innodb_seqno_trigger 
   BEFORE INSERT ON my_table 
   FOR EACH ROW 
BEGIN

  SELECT MAX(seqno) INTO @newseqno 
  FROM my_table 
  WHERE id = NEW.id;

  SET NEW.seqno = COALESCE(@newseqno + 1, 1); 

END;

Test at SQL-Fiddle

But have you thought what will happen if you have 2 separate connections/threads/transactions that try to insert the same id?

Best Answer

Related Solutions

Postgresql – Creating sequence for longer numeric values

Mysql – Auto increment value needs to be changed inside trigger

Related Question