Mysql – auto-incrementing key in self-referencing table, performance

auto-incrementdatabase-designMySQL

My design contains a table where almost every entry has a parent it has to point to, within the same table (differently put, self references).

    /* scenario executions */
    CREATE TABLE `boteval`.`scenario_executions` (
      `scenario_id` BIGINT NOT NULL COMMENT 'connects to a scenario id',
      `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'execution id',
      `parent_id` BIGINT NULL COMMENT 'execution id of parent executor (null would mean no parent, i.e. a top level execution)',
      `started` DATETIME(6) NOT NULL,
      `ended` DATETIME(6) NULL COMMENT 'can be null while the execution has not yet ended',
      PRIMARY KEY (`id`),
      UNIQUE INDEX `id_UNIQUE` (`id` DESC));

The use case being one where tree structures need to be incrementally recorded into the table, first the parent, than children, children of children, etc, as the program unfolds.

Using an auto-incrementing key for this table, I have reason to believe it would be extruciatingly slow if every insert needed to happen only after its predecessor had already returned its database-assigned auto incremented key.

On the other hand, I think that batching this workload would be way beyond naive, as a sql that strings together the returned ids would become extremely nested for each buffered batch, not to mention the dependencies between batches.

I have several solutions to this, but I wonder if there are very specific design patterns you would recommend. Maybe one of my design constraints would need to be relaxed: maybe the auto-incrementing, if I can safely supply a unique key from my code. In a way, this might not even really be a good fit for a relational database like MySQL.

Thanks in advance for any comments on good patterns for this kind of scenario!

Best Answer

In indexes, DESC is ignored. (This will be rectified in version 8.0.)
MySQL is happy to use an ASC index (that is, any index) but scan it in descending order.
So DROP that redundant UNIQUE index.

That said, I don't see that anything you have described so far could be cause "excruciatingly slow" inserts.

To further elaborate. You have 2 BTrees:

One with all the columns, and ordered (ASC) by ID
One with just ID and ordered (ASC) by it.

When inserting a new row (without explicitly specifying ID), the server is optimized to insert it into the "last" block of the BTree and bumping the A_I value.

Meanwhile, the UNIQUE index is acting much the same. But it is doubling the amount of work to do, with no benefit.

On the other hand, if you have multiple connections doing the inserts, this may be worth noting -- How the inset works:

Get a new A_I value (well optimized)
Check all UNIQUE indexes for duplicates. This may be slow, since another connection may be locking the adjacent ID.
Insert the row
Throw info into the "Change buffer" for later updating of non-unique indexes. (This does not apply.)

Also worth noting in InnoDB:

Each transaction has some overhead -- writing to the undo log, writing the "double write" buffer, etc.
Hence, batched insert run much faster because they spread out this overhead over multiple rows. Can you insert, say, 100 rows in a single INSERT? That may literally run 10 times as fast.

Do you really need BIGINT? It takes 8 bytes. INT UNSIGNED takes only 4 bytes, and allows 4 billion values.

Related Solutions

Mysql – use a foreign key index as a shortcut to getting a row count in an INNODB table

You cannot trust the EXPLAIN plan's row count. Why?

Each time Query Optimization takes place where it involves an InnoDB table, the InnoDB Storage Engine does an approximation of the BTREE entries for the keys. They approximations change from query to query.

Simply run SHOW INDEXES FROM guage_data; multiple times and you will get a different set of row counts each time. You could set innodb_stats_on_metadata to 0:

SET GLOBAL innodb_stats_on_metadata = 0;

This will cause SHOW INDEXES FROM guage_data; to give the same number over and over again. This is still self-defeating because the row count would still be approximated anyway. You would have to disable the metadata stats and run ANALYZE TABLE. Again, this is self-defeating.

With innodb_stats_on_metadata on, approximations are always done. I wrote about this back on June 21, 2011 : From where does the MySQL Query Optimizer read index statistics?

In light of all this, you are better off doing SELECT id FROM guage_data if id is the PRIMARY KEY. However, notice in your EXPLAIN plan that MySQL choose to traverse the gauge_data_FI_1 index. This makes all the sense in the world if this table is InnoDB. Why? Every non-unique index stores the rowid of the PRIMARY KEY anyway.

Mysql – #1093 – You can’t specify target table ‘R’ for update in FROM clause

Based on the previous question and this, the following should do what you want. As with Pieter's update it doesn't assume that the ID's are in sequential order, rather it uses the ReqTime to determine order.

UPDATE Visits R
  INNER JOIN (
    SELECT ID,
      TIME_TO_SEC(TIMEDIFF(
        (SELECT B.ReqTime
         FROM Visits B
         WHERE B.ReqTime > A.ReqTime
         ORDER BY B.ReqTime
         LIMIT 1
         )
        ,A.ReqTime)) Duration
    FROM Visits A
   ) N ON R.ID = N.ID
SET R.Duration = N.Duration
WHERE Duration IS NULL;

Best Answer

Related Solutions

Mysql – use a foreign key index as a shortcut to getting a row count in an INNODB table

Mysql – #1093 – You can’t specify target table ‘R’ for update in FROM clause

Related Question