MySQL – Best way to organize compound indexes on a partitioned table (date by range)

Tags: index-tuning, MySQL, partitioning

I have a table with an auto-increment primary key, but to use MySQL's partitioning feature I had to add the date column to the primary key so that I could partition by range on that column.

I assume MySQL automatically creates an index on the primary key. Since the date column is not the leftmost column of that index, would it be beneficial to create a separate index on just the date column?

95% of my queries use the date column under WHERE
5% of my queries use the id under the WHERE clause
none of my queries use both columns under the WHERE clause

I used to have just a single index on the date column, but since I'm using partitions I'm forced to add the date column to the primary key. Should I create a single index on the date column? I'm asking this because my tables are big (30GB) and I don't want to add unnecessary overhead.

Best Answer

Yes. To use the date column efficiently, it has to be the leftmost column of an index or, as you suggest, indexed by itself. You could instead change the primary key order to (date, id), but then you would need another index on id for the 5% of queries that filter on it (InnoDB also requires the auto-increment column to be the leftmost column of some index). Your best bet is probably an extra index on date.
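
As a minimal sketch of the two layouts discussed above, assuming hypothetical tables named events_a and events_b with columns id and event_date (none of these names, nor the partition boundaries, come from the question):

```sql
-- Layout A: keep PRIMARY KEY (id, date) and add a separate index on the date.
-- Table, column, and partition names are hypothetical.
CREATE TABLE events_a (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_date DATE NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (id, event_date)          -- serves the WHERE id = ... queries
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(event_date)) (
    PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
    PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Extra secondary index for the WHERE event_date ... queries.
ALTER TABLE events_a ADD INDEX idx_event_date (event_date);

-- Layout B: date first in the primary key, separate index on id.
-- The index on id also satisfies InnoDB's rule that the AUTO_INCREMENT
-- column must be the leftmost column of some index.
CREATE TABLE events_b (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_date DATE NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (event_date, id),
    KEY idx_id (id)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(event_date)) (
    PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
    PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Both layouts end up with two indexes, so the storage overhead is similar. Which one suits a 30GB table better depends on the workload and is best measured on a copy of the data.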

Ref: http://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html

"MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on."

You can also check how MySQL uses the indexes by prefixing your queries with EXPLAIN, though you probably knew that already. That way you can try the variants and compare them.
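
For illustration, here is a sketch of such a check on a hypothetical table (events_explain_demo and its columns are made-up names, and the partition boundaries are arbitrary). It assumes MySQL 5.1 to 5.6, where EXPLAIN PARTITIONS also lists the partitions a query touches:

```sql
-- Hypothetical table; names and partition boundaries are illustrative only.
CREATE TABLE events_explain_demo (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_date DATE NOT NULL,
    PRIMARY KEY (id, event_date),
    KEY idx_event_date (event_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(event_date)) (
    PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- In the output, the "key" column names the index chosen and the
-- "partitions" column lists the partitions that will be read.
-- EXPLAIN PARTITIONS is the MySQL 5.1-5.6 syntax; from 5.7 on, plain
-- EXPLAIN shows the partitions column, and 8.0 removed the PARTITIONS keyword.
EXPLAIN PARTITIONS
SELECT id
FROM events_explain_demo
WHERE event_date >= '2012-03-01' AND event_date < '2012-04-01';

-- The same check for the queries that filter on id only.
EXPLAIN PARTITIONS
SELECT event_date
FROM events_explain_demo
WHERE id = 42;
```

Running the same statements with and without the idx_event_date index shows whether the optimizer actually uses it for the date queries.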

Related Solutions

How to Create a Partitioned Table Based on Date in SQL Server

As the error message says, any partition-aligned unique index has to include the partitioning key in the index key. This requirement exists so the engine can enforce uniqueness on updates without checking every partition.

In your case, this means including OrderDate in the nonclustered index key, or having a non-aligned index. Both are potentially valid choices, depending on your circumstances. To preserve alignment, your table and index definitions would be:

CREATE TABLE dbo.Orders
(
    OrderID     integer NOT NULL,
    Name        nvarchar(20) NULL,
    OrderDate   date NOT NULL,

    CONSTRAINT PK__Orders_OrderID_OrderDate
        PRIMARY KEY NONCLUSTERED 
            (OrderID, OrderDate)
        ON PS (OrderDate)
)
ON PS (OrderDate);
GO
CREATE CLUSTERED INDEX CX__Orders_OrderDate
ON dbo.Orders (OrderDate)
ON PS (OrderDate);

Of course, this changes the uniqueness that the nonclustered index enforces. Now only the combination of OrderID and OrderDate is guaranteed to be unique. It is possible to add duplicate OrderIDs, so long as the OrderDate is different. Whether this change in semantics is acceptable depends on your circumstances, but it is something to be aware of.
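
The weaker uniqueness guarantee can be seen in a small self-contained sketch. All object names here (PF_Demo, PS_Demo, dbo.OrdersDemo) are hypothetical, the boundary values are arbitrary, and every partition is assumed to map to the PRIMARY filegroup:

```sql
-- Hypothetical partition function and scheme; boundaries are illustrative.
CREATE PARTITION FUNCTION PF_Demo (date)
AS RANGE RIGHT FOR VALUES ('20120101', '20130101');
GO
CREATE PARTITION SCHEME PS_Demo
AS PARTITION PF_Demo ALL TO ([PRIMARY]);
GO
-- Aligned primary key: the partitioning column is part of the key.
CREATE TABLE dbo.OrdersDemo
(
    OrderID     integer NOT NULL,
    OrderDate   date NOT NULL,

    CONSTRAINT PK_OrdersDemo
        PRIMARY KEY NONCLUSTERED
            (OrderID, OrderDate)
        ON PS_Demo (OrderDate)
)
ON PS_Demo (OrderDate);
GO
-- Both inserts succeed: OrderID repeats, but the (OrderID, OrderDate) pair differs.
INSERT dbo.OrdersDemo (OrderID, OrderDate) VALUES (1, '20120615');
INSERT dbo.OrdersDemo (OrderID, OrderDate) VALUES (1, '20120616');
-- Repeating the first insert with the same date would violate PK_OrdersDemo.
```

If OrderID must stay unique on its own, that has to be enforced some other way, for example by the non-aligned key described next.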

The alternative is to have the nonclustered primary key non-aligned:

CREATE TABLE dbo.Orders
(
    OrderID     integer NOT NULL,
    Name        nvarchar(20) NULL,
    OrderDate   date NOT NULL,

    CONSTRAINT PK__Orders_OrderID
        PRIMARY KEY NONCLUSTERED 
            (OrderID)
        ON [PRIMARY]
)
ON PS (OrderDate);
GO
CREATE CLUSTERED INDEX CX__Orders_OrderDate
ON dbo.Orders (OrderDate)
ON PS (OrderDate);

This preserves the uniqueness of OrderID alone, and has some benefits with queries that compute MIN or MAX aggregates using the index, but you lose the ability to SWITCH partitions in and out without dropping the primary key and recreating it after the switching operation.
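
The drop, switch, and recreate sequence might look like the following sketch. Every object name is hypothetical, all partitions are assumed to live on the PRIMARY filegroup, and the demo tables are kept as heaps to keep the example short:

```sql
-- Hypothetical partition function and scheme; boundaries are illustrative.
CREATE PARTITION FUNCTION PF_SwitchDemo (date)
AS RANGE RIGHT FOR VALUES ('20120101', '20130101');
GO
CREATE PARTITION SCHEME PS_SwitchDemo
AS PARTITION PF_SwitchDemo ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.OrdersSwitchDemo
(
    OrderID     integer NOT NULL,
    OrderDate   date NOT NULL,

    CONSTRAINT PK_OrdersSwitchDemo
        PRIMARY KEY NONCLUSTERED
            (OrderID)
        ON [PRIMARY]                 -- non-aligned: not on the partition scheme
)
ON PS_SwitchDemo (OrderDate);
GO
-- Empty target with identical columns, on the source partition's filegroup.
CREATE TABLE dbo.OrdersSwitchTarget
(
    OrderID     integer NOT NULL,
    OrderDate   date NOT NULL
)
ON [PRIMARY];
GO
-- SWITCH is rejected while a non-aligned index exists, so drop the key first.
ALTER TABLE dbo.OrdersSwitchDemo DROP CONSTRAINT PK_OrdersSwitchDemo;

-- Move partition 1 out of the partitioned table (a metadata-only operation).
ALTER TABLE dbo.OrdersSwitchDemo SWITCH PARTITION 1 TO dbo.OrdersSwitchTarget;

-- Recreate the non-aligned primary key afterwards.
ALTER TABLE dbo.OrdersSwitchDemo
    ADD CONSTRAINT PK_OrdersSwitchDemo
    PRIMARY KEY NONCLUSTERED (OrderID) ON [PRIMARY];
```

On a large table the recreate step rebuilds the whole index, which is the cost to weigh against keeping OrderID unique on its own.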

You can read more about partitioning in this section of Books Online and more on the issues of aligned and non-aligned indexes in this excellent answer by Remus Rusanu.

MySQL – Does partitioning a MySQL table also increase index creation performance

Partitioning usually slows ALTER TABLE down rather than speeding it up. Index and data are stored in separate files for each partition (a separate .ibd file per partition for InnoDB if innodb_file_per_table=1), so you might expect the ALTER to run in parallel. But partitioning adds an extra processing step to choose the partition in which each row or index entry is stored, so the whole operation becomes slower.

mysql> create table alter_test (id int primary key auto_increment, v char(20));
mysql> load data infile '/tmp/1' into table alter_test;
Query OK, 500000 rows affected (9.98 sec)

mysql> alter table alter_test add index i_test(v);
Query OK, 0 rows affected (4.55 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> drop table alter_test;
mysql> create table alter_test (id int primary key auto_increment, v char(20)) partition by key(id) partitions 10;
mysql> load data infile '/tmp/1' into table alter_test;
Query OK, 500000 rows affected (9.98 sec)

mysql> alter table alter_test add index i_test(v);
Query OK, 0 rows affected (4.83 sec)

As for the shared lock taken by ALTER TABLE, all partitions are locked at the same time, so partitioning does not help.

Hints:

Upgrade to MySQL 5.6, which supports online ALTER TABLE.
Use a tool such as oak-online-alter-table or pt-online-schema-change as a workaround on MySQL versions earlier than 5.6.
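
As a sketch of the first hint, assuming MySQL 5.6 or later and InnoDB, the index can be requested as an online operation. The table name alter_test_online is hypothetical and mirrors the unpartitioned test table above:

```sql
-- Hypothetical table mirroring the unpartitioned test table.
CREATE TABLE alter_test_online (
    id INT PRIMARY KEY AUTO_INCREMENT,
    v  CHAR(20)
) ENGINE=InnoDB;

-- Request an in-place build that keeps the table readable and writable.
-- If the server cannot honor ALGORITHM=INPLACE with LOCK=NONE for this
-- change, the statement fails with an error instead of silently locking.
ALTER TABLE alter_test_online
    ADD INDEX i_test (v),
    ALGORITHM=INPLACE, LOCK=NONE;
```

Stating ALGORITHM and LOCK explicitly is a design choice: it turns an unexpected blocking rebuild into a visible error.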
