Mysql – Should I use UUID as well as ID

database-designMySQLperformanceuniqueidentifier

I've been using UUIDs in my systems for a while now for a variety of reasons ranging from logging to delayed correlation. The formats I used changed as I became less naive from:

  1. VARCHAR(255)
  2. VARCHAR(36)
  3. CHAR(36)
  4. BINARY(16)

It was when I reached the final one BINARY(16) that I started to compare performance with basic auto-increment integer. The test and results are shown below, but if you just want the summary, it indicates that INT AUTOINCREMENT and BINARY(16) RANDOM have identical performance on data ranges up to 200,000 (the database was pre-populated prior to tests).

I was initially sceptical towards using UUIDs as primary keys, and indeed I still am, however I see potential here to create a flexible database that can use both. Whereas many people stress over the advantages of either, what are the disadvantages cancelled out by using both data types?

  • PRIMARY INT
  • UNIQUE BINARY(16)

The use case for this type of set up would be the traditional primary key for inter-table relationships, with unique identifier used for inter-system relationships.

What I am essentially trying to discover is difference in efficiency between the two approaches. Besides the quadruple disk space used, which may be largely negligible after additional data is added, they appear to me to be the same.

Schema:

-- phpMyAdmin SQL Dump
-- version 4.0.10deb1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Sep 22, 2015 at 10:54 AM
-- Server version: 5.5.44-0ubuntu0.14.04.1
-- PHP Version: 5.5.29-1+deb.sury.org~trusty+3

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

--
-- Database: `test`
--

-- --------------------------------------------------------

--
-- Table structure for table `with_2id`
--

CREATE TABLE `with_2id` (
  `guidl` bigint(20) NOT NULL,
  `guidr` bigint(20) NOT NULL,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY (`guidl`,`guidr`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- --------------------------------------------------------

--
-- Table structure for table `with_guid`
--

CREATE TABLE `with_guid` (
  `guid` binary(16) NOT NULL,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- --------------------------------------------------------

--
-- Table structure for table `with_id`
--

CREATE TABLE `with_id` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=197687 ;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

Insert benchmark:

function benchmark_insert(PDO $pdo, $runs)
{
    $data = 'Sample Data';

    $insert1 = $pdo->prepare("INSERT INTO with_id (data) VALUES (:data)");
    $insert1->bindParam(':data', $data);

    $insert2 = $pdo->prepare("INSERT INTO with_guid (guid, data) VALUES (:guid, :data)");
    $insert2->bindParam(':guid', $guid);
    $insert2->bindParam(':data', $data);

    $insert3 = $pdo->prepare("INSERT INTO with_2id (guidl, guidr, data) VALUES (:guidl, :guidr, :data)");
    $insert3->bindParam(':guidl', $guidl);
    $insert3->bindParam(':guidr', $guidr);
    $insert3->bindParam(':data',  $data);

    $benchmark = array();

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $insert1->execute();
    }
    $benchmark[1] = 'INC ID:     ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $guid  = openssl_random_pseudo_bytes(16);

        $insert2->execute();
    }
    $benchmark[2] = 'GUID:       ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $guid  = openssl_random_pseudo_bytes(16);
        $guidl = unpack('q', substr($guid, 0, 8))[1];
        $guidr = unpack('q', substr($guid, 8, 8))[1];

        $insert3->execute();
    }
    $benchmark[3] = 'SPLIT GUID: ' . (time() - $time);

    echo 'INSERTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL;
    echo $benchmark[3] . PHP_EOL . PHP_EOL;
}

Select benchmark:

function benchmark_select(PDO $pdo, $runs) {
    $select1 = $pdo->prepare("SELECT * FROM with_id WHERE id = :id");
    $select1->bindParam(':id', $id);

    $select2 = $pdo->prepare("SELECT * FROM with_guid WHERE guid = :guid");
    $select2->bindParam(':guid', $guid);

    $select3 = $pdo->prepare("SELECT * FROM with_2id WHERE guidl = :guidl AND guidr = :guidr");
    $select3->bindParam(':guidl', $guidl);
    $select3->bindParam(':guidr', $guidr);

    $keys = array();

    for ($i = 0; $i < $runs; $i++) {
        $kguid  = openssl_random_pseudo_bytes(16);
        $kguidl = unpack('q', substr($kguid, 0, 8))[1];
        $kguidr = unpack('q', substr($kguid, 8, 8))[1];
        $kid = mt_rand(0, $runs);

        $keys[] = array(
            'guid'  => $kguid,
            'guidl' => $kguidl,
            'guidr' => $kguidr,
            'id'    => $kid
        );
    }

    $benchmark = array();

    $time = time();
    foreach ($keys as $key) {
        $id = $key['id'];
        $select1->execute();
        $row = $select1->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[1] = 'INC ID:     ' . (time() - $time);


    $time = time();
    foreach ($keys as $key) {
        $guid = $key['guid'];
        $select2->execute();
        $row = $select2->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[2] = 'GUID:       ' . (time() - $time);

    $time = time();
    foreach ($keys as $key) {
        $guidl = $key['guidl'];
        $guidr = $key['guidr'];
        $select3->execute();
        $row = $select3->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[3] = 'SPLIT GUID: ' . (time() - $time);

    echo 'SELECTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL;
    echo $benchmark[3] . PHP_EOL . PHP_EOL;
}

Tests:

$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', '');

benchmark_insert($pdo, 1000);
benchmark_select($pdo, 100000);

Results:

INSERTION
=============================
INC ID:     3
GUID:       2
SPLIT GUID: 3

SELECTION
=============================
INC ID:     5
GUID:       5
SPLIT GUID: 6

Best Answer

UUIDs are a performance disaster for very large tables. (200K rows is not "very large".)

Your #3 is really bad when the CHARCTER SET is utf8 -- CHAR(36) occupies 108 bytes! Update: There are ROW_FORMATs for which this will stay 36.

UUIDs (GUIDs) are very "random". Using them as either a UNIQUE or a PRIMARY key on large tables is very inefficient. This is because of having to jump around the table/index each time you INSERT a new UUID or SELECT by UUID. When the table/index is too large to fit in cache (see innodb_buffer_pool_size, which must be smaller than RAM, typically 70%), the 'next' UUID may not be cached, hence a slow disk hit. When the table/index is 20 times as big as the cache, only 1/20th (5%) of hits are cached -- you are I/O-bound. Generalization: The inefficiency applies to any "random" access -- UUID / MD5 / RAND() / etc

So, don't use UUIDs unless either

  • you have "small" tables, or
  • you really need them because of generating unique ids from different places (and have not figured out another way to do it).

More on UUIDs: http://mysql.rjweb.org/doc.php/uuid (It includes functions for converting between standard 36-char UUIDs and BINARY(16).) Update: MySQL 8.0 has a builtin function for such.

Having both a UNIQUE AUTO_INCREMENT and a UNIQUE UUID in the same table is a waste.

  • When an INSERT occurs, all unique/primary keys must be checked for duplicates.
  • Either unique key is sufficient for InnoDB's requirement of having a PRIMARY KEY.
  • BINARY(16) (16 bytes) is somewhat bulky (an argument against making it the PK), but not that bad.
  • The bulkiness matters when you have secondary keys. InnoDB silently tacks the PK onto the end of each secondary key. The main lesson here is to minimize the number of secondary keys, especially for very large tables. Elaboration: For one secondary key, the bulkiness debate usually ends in a draw. For 2 or more secondary keys, a fatter PK usually leads to a bigger disk footprint for the table including its indexes.

For comparision: INT UNSIGNED is 4 bytes with range of 0..4 billion. BIGINT is 8 bytes.

Italics Updates/etc were added Sep, 2017; nothing critical changed.

Addressing Ivan's answer and the link he provided.

  • Yes, there are benefits of a distributed id-generator. But...
  • There are other ways to avoid the delay of a centralized ID server -- Dispense 100 consecutive ids at a time to the client. Or use a client-generated concatenation of client_id and time.
  • If the entire index is cached in RAM, then much of my argument is moot. But once it spills to disk, performance suffers severely. His article does briefly point out that UUIDv4 is slower than UUIDv1, but fails to point out that the bits need rearranging to achieve the benefit.
  • I agree with squeezing a 36-byte UUID into a 16-byte BINARY. MySQL 8 even has a function to do such. Furthermore, it (and my blog) rearrange the bits so that UUIDv1 has the temporal characteristic of an auto_increment.
  • Sharding, I contend, is not normally done against auto_increment ids, so switching to UUIDs does not buy anything.