MySQL – How to build an index for a MySQL table that has many inserts and queries only the most recent data

Tags: index, index-tuning, innodb, MySQL

I have a MySQL database table that uses InnoDB and looks like this:

OrgID (INTEGER)
MachID (INTEGER)
Date (DATETIME)
IdleTime (INTEGER)

I do two operations on this table:

Insert a new record, with Org ID, Mach ID, Date and IdleTime
Query the sum of IdleTime for a given Organization/Machine ID pair over a single day:

SELECT SUM(IdleTime) FROM DBTable WHERE OrgID=1 AND MachID=2 AND Date BETWEEN 'YYYY-mm-dd 00:00:00' AND 'YYYY-mm-dd 23:59:59';

We may also want the total IdleTime for a given OrgID, but that is just a bonus. In most cases, YYYY-mm-dd is yesterday.

We care more about the performance of the INSERT operation, but we do not want the SELECT to be too slow. My boss recommended adding a new column, something like INSERT_TIME (TIMESTAMP), as the primary key to ensure that insertion is sequential. I am wondering whether PRIMARY KEY (OrgID, MachID, Date) may help instead. Any suggestions on how I can use indexes to boost performance?

Best Answer

PRIMARY KEY (OrgID, MachID, Date) would be good for that query. But a PRIMARY KEY is necessarily unique; is that combination unique? If not, make it a plain INDEX and use something else as the PRIMARY KEY.
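This can be checked on a stand-in database. The sketch below uses Python's built-in sqlite3 module rather than MySQL (a substitution: SQLite's planner and storage engine are not InnoDB's). A WITHOUT ROWID table stores rows in primary-key order, loosely like InnoDB's clustered primary key, and EXPLAIN QUERY PLAN shows whether the SUM query is answered by a range search on (OrgID, MachID, Date). Dates are stored as ISO-8601 text so they sort chronologically; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows are stored in primary-key order (a rough analogue of
# InnoDB's clustered PRIMARY KEY). Dates are ISO-8601 text, which sorts correctly.
conn.execute("""
    CREATE TABLE DBTable (
        OrgID    INTEGER NOT NULL,
        MachID   INTEGER NOT NULL,
        Date     TEXT    NOT NULL,
        IdleTime INTEGER NOT NULL,
        PRIMARY KEY (OrgID, MachID, Date)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO DBTable VALUES (?, ?, ?, ?)", [
    (1, 2, "2024-03-09 08:00:00", 30),
    (1, 2, "2024-03-09 17:30:00", 45),
    (1, 3, "2024-03-09 09:00:00", 10),   # other machine: excluded
    (1, 2, "2024-03-10 08:00:00", 99),   # next day: excluded
])

query = ("SELECT SUM(IdleTime) FROM DBTable "
         "WHERE OrgID = ? AND MachID = ? AND Date >= ? AND Date < ?")
params = (1, 2, "2024-03-09 00:00:00", "2024-03-10 00:00:00")
total = conn.execute(query, params).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]

# A PRIMARY KEY must be unique: a second reading with the same
# (OrgID, MachID, Date) is rejected.
try:
    conn.execute("INSERT INTO DBTable VALUES (1, 2, '2024-03-09 08:00:00', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(total)               # 75
print(plan)
print(duplicate_rejected)  # True
```

In MySQL, the analogous check is running EXPLAIN on the SELECT and confirming that the key column shows PRIMARY. The duplicate-key error is the reason to fall back to a plain INDEX if the combination is not guaranteed unique.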

If we are talking about millions of rows per day, then summary tables (pre-aggregated daily totals) would be worth building.
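A minimal sketch of a summary table, again using Python's sqlite3 as a runnable stand-in for MySQL. The table name DailyIdle, its columns, and the roll_up_day function are hypothetical. Each finished day is aggregated once with INSERT ... SELECT ... GROUP BY; afterwards the per-machine query reads a single row, and the per-OrgID "bonus" total sums a handful of rows instead of scanning the raw data. In MySQL the roll-up statement would look much the same, with DATE(Date) in place of SQLite's date(Date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBTable (
        OrgID INTEGER NOT NULL, MachID INTEGER NOT NULL,
        Date TEXT NOT NULL, IdleTime INTEGER NOT NULL
    );
    -- Hypothetical summary table: one row per (OrgID, MachID, day).
    CREATE TABLE DailyIdle (
        OrgID INTEGER NOT NULL, MachID INTEGER NOT NULL,
        Day TEXT NOT NULL, TotalIdle INTEGER NOT NULL,
        PRIMARY KEY (OrgID, MachID, Day)
    );
""")
conn.executemany("INSERT INTO DBTable VALUES (?, ?, ?, ?)", [
    (1, 2, "2024-03-09 08:00:00", 30),
    (1, 2, "2024-03-09 17:30:00", 45),
    (1, 3, "2024-03-09 09:00:00", 10),
    (1, 2, "2024-03-10 08:00:00", 99),
])

def roll_up_day(conn, day_start, day_end):
    """Aggregate one finished day of raw rows into DailyIdle (run once per day)."""
    with conn:  # commit the roll-up as one transaction
        conn.execute("""
            INSERT INTO DailyIdle (OrgID, MachID, Day, TotalIdle)
            SELECT OrgID, MachID, date(Date), SUM(IdleTime)
            FROM DBTable
            WHERE Date >= ? AND Date < ?
            GROUP BY OrgID, MachID, date(Date)
        """, (day_start, day_end))

roll_up_day(conn, "2024-03-09 00:00:00", "2024-03-10 00:00:00")

per_machine = conn.execute(
    "SELECT TotalIdle FROM DailyIdle WHERE OrgID = 1 AND MachID = 2 AND Day = '2024-03-09'"
).fetchone()[0]
per_org = conn.execute(
    "SELECT SUM(TotalIdle) FROM DailyIdle WHERE OrgID = 1 AND Day = '2024-03-09'"
).fetchone()[0]
print(per_machine, per_org)  # 75 85
```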

If it is usually "yesterday", then consider:

AND Date >= CURDATE() - INTERVAL 1 DAY
AND Date  < CURDATE()

Or, more generically (substituting the number of days ago for each '?'):

AND Date >= CURDATE() - INTERVAL ?   DAY
AND Date  < CURDATE() - INTERVAL ?-1 DAY
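The same half-open range can also be computed in the application and bound to the placeholders. A small Python sketch (the function name day_bounds is hypothetical) that mirrors the CURDATE() arithmetic above, and shows why >= / < is safer than BETWEEN ... '23:59:59' if the DATETIME column stores fractional seconds:

```python
from datetime import date, datetime, time, timedelta

def day_bounds(days_ago, today=None):
    """Half-open [start, end) range for the calendar day `days_ago` days before today.

    Mirrors:  Date >= CURDATE() - INTERVAL ? DAY
              AND Date < CURDATE() - INTERVAL ?-1 DAY
    """
    today = today or date.today()
    start = datetime.combine(today - timedelta(days=days_ago), time.min)
    end = start + timedelta(days=1)
    return start, end

start, end = day_bounds(1, today=date(2024, 3, 1))
print(start, end)  # 2024-02-29 00:00:00 2024-03-01 00:00:00

# A reading with fractional seconds just before midnight:
late = datetime(2024, 2, 29, 23, 59, 59, 500000)
in_half_open = start <= late < end                                                # True
in_between = datetime(2024, 2, 29) <= late <= datetime(2024, 2, 29, 23, 59, 59)  # False
```

The half-open form also needs no special handling for month ends or leap days, because the end bound is simply the start of the next day.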

More

Ensuring sequential inserts is not that important. With that 3-part PK, you will be inserting sequentially in several spots, namely at the end of each combination of OrgID and MachID.
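To illustrate "sequential in several spots", the following Python sketch models the primary key as a sorted list of (OrgID, MachID, Date) tuples and inserts time-ordered readings with bisect. It is a toy model of key order only, not of InnoDB pages; the machines and timestamps are made up. Every insert lands at the tail of its own (OrgID, MachID) group, so there is one append point per machine rather than one for the whole table.

```python
import bisect
from datetime import datetime, timedelta

def count_tail_appends(inserts):
    """Insert (OrgID, MachID, Date) keys in arrival order into a sorted list.

    Returns (sorted_keys, tail_appends), where tail_appends counts inserts that
    landed at the end of their own (OrgID, MachID) group."""
    keys = []
    tail_appends = 0
    for key in inserts:
        pos = bisect.bisect_right(keys, key)
        # If the key right after `pos` belongs to another (OrgID, MachID) group
        # (or there is none), the new key was appended at its group's tail.
        if pos == len(keys) or keys[pos][:2] != key[:2]:
            tail_appends += 1
        keys.insert(pos, key)
    return keys, tail_appends

# Hypothetical workload: 3 machines report every minute, interleaved in time.
start = datetime(2024, 3, 9)
machines = [(1, 1), (1, 2), (2, 7)]
arrivals = [(org, mach, start + timedelta(minutes=i))
            for i in range(5) for (org, mach) in machines]

keys, tail_appends = count_tail_appends(arrivals)
spots = {k[:2] for k in keys}
print(tail_appends, len(arrivals), len(spots))  # 15 15 3
```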

Inserts do need to update secondary indexes, but this is usually not a big deal. (To get really technical, see "InnoDB's change buffering".)

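If you are going to insert more than 100 rows per second, then I recommend gathering them up and batch-inserting them. Also consider innodb_flush_log_at_trx_commit = 2, which flushes the redo log to disk about once per second instead of at every commit; the trade-off is that roughly the last second of transactions can be lost if the operating system crashes or power fails.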
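A minimal sketch of "gather them up and batch insert", using sqlite3 as a runnable stand-in for a MySQL connection. The BatchInserter class and the batch size of 100 are hypothetical; the point is one transaction (one commit) per batch instead of one per row. Against MySQL, the equivalent would be a single multi-row INSERT ... VALUES (...), (...), ... per batch.

```python
import sqlite3

class BatchInserter:
    """Buffers rows and writes each full batch in a single transaction.

    Hypothetical helper: the batch size and table layout are illustrative."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.conn:  # one COMMIT per batch instead of one per row
            self.conn.executemany(
                "INSERT INTO DBTable (OrgID, MachID, Date, IdleTime) VALUES (?, ?, ?, ?)",
                self.buffer)
        self.buffer.clear()
        self.flushes += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DBTable (OrgID INTEGER, MachID INTEGER, Date TEXT, IdleTime INTEGER)")

inserter = BatchInserter(conn, batch_size=100)
for i in range(250):
    inserter.add((1, i % 5, f"2024-03-09 00:{i // 60:02d}:{i % 60:02d}", 1))
inserter.flush()  # write the final partial batch

count = conn.execute("SELECT COUNT(*) FROM DBTable").fetchone()[0]
print(inserter.flushes, count)  # 3 250
```

Remember to flush on shutdown (or on a timer), otherwise a partially filled buffer is never written.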

Related Solutions

Mysql – Primary key index with a DATETIME as first part of the compound key is never used

This is a bug in MySQL 5.5.x. See here.

The suggested workaround is to cast the string literal to DATETIME explicitly, so your query should be:

SELECT * 
FROM `stats`  
WHERE 
   dateDim = CAST('2014-04-03 00:00:00' as datetime)
   AND accountDim = 4
   AND execCodeDim = 9
   AND operationTypeDim = 1
   AND junkDim = 5
   AND ipCountryDim = 3

Mysql – GUID as reason for MySQL import slowness

The old cliché "size matters" applies here without a doubt.

Smaller keys load into a table faster than bigger ones (this is even more evident with InnoDB). The bigger the columns in an index, the fewer keys fit in each BTREE page, so the BTREE gets taller. Even a BTREE height of 3 or more, which is bad for a small table, can result from having large columns. The same applies to the name (up to 150 characters) and category (up to 100 characters) fields. Making all columns CHAR instead of VARCHAR might increase SELECT performance, but the significant increase in disk space and BTREE node management would throw SELECT performance under the bus as the table grows.
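To make the "fewer keys per page, taller tree" point concrete, here is a back-of-the-envelope Python sketch. It is a simplification under stated assumptions: every index page is completely full, each entry is the key plus a fixed 6-byte pointer (an assumed size), and per-page overhead is ignored. The 1024-byte block is MyISAM's default key block size; pass 16384 for a default InnoDB page. The key sizes (4 bytes for an INT id, up to 108 bytes for a CHAR(36) utf8 GUID at 3 bytes per character) are illustrative.

```python
def btree_estimate(rows, page_bytes, key_bytes, ptr_bytes=6):
    """Rough (fanout, height) of a B-tree index.

    Assumes full pages, a fixed pointer size (ptr_bytes, an assumption) and no
    per-page overhead, so real trees are usually somewhat taller."""
    fanout = page_bytes // (key_bytes + ptr_bytes)
    if fanout < 2:
        raise ValueError("page too small to hold two entries")
    height, capacity = 1, fanout
    while capacity < rows:  # add levels until the tree can address every row
        capacity *= fanout
        height += 1
    return fanout, height

rows = 10_000_000
int_key = btree_estimate(rows, page_bytes=1024, key_bytes=4)     # INT id
guid_key = btree_estimate(rows, page_bytes=1024, key_bytes=108)  # CHAR(36) utf8 GUID
print(int_key, guid_key)  # (102, 4) (8, 8)
```

Under these assumptions the GUID index needs twice as many levels, and every extra level is one more page that may have to be read from disk on a lookup.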

Looking at all of your keys, I can see this table being extremely lopsided in index size versus table size.

Please run this query:

SELECT data_length, index_length
FROM information_schema.tables
WHERE table_name = 'acl_actions';

You will see that the combined size of all index pages is as large as or larger than the data. Physically, the file acl_actions.MYI will be bigger than acl_actions.MYD.

Here is where random I/O comes into play. Notice your index idx_aclaction_id_del, which you defined as

KEY `idx_aclaction_id_del` (`id`,`deleted`),

If you ever query for deleted records, you will get a full index scan because of the order of the columns in this index. Reverse the column order instead:

KEY `idx_aclaction_del_id` (`deleted`,`id`),

That way, all deleted rows are grouped together in the index, and likewise all non-deleted rows.
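The effect of column order can be seen with Python's sqlite3 module as a stand-in. SQLite's planner is not MySQL's, so treat this as an illustration of the leftmost-prefix rule rather than a MySQL benchmark. Two hypothetical tables, a and b, carry the two index variants; EXPLAIN QUERY PLAN reports a SCAN when deleted is the second index column and a SEARCH on (deleted=?) when it is the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0);
    CREATE INDEX idx_aclaction_id_del ON a (id, deleted);   -- deleted is 2nd
    CREATE TABLE b (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0);
    CREATE INDEX idx_aclaction_del_id ON b (deleted, id);   -- deleted is 1st
""")

def plan(sql):
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_id_first = plan("SELECT id FROM a WHERE deleted = 1")
plan_deleted_first = plan("SELECT id FROM b WHERE deleted = 1")
print(plan_id_first)       # a SCAN: the leading column (id) is not constrained
print(plan_deleted_first)  # a SEARCH using idx_aclaction_del_id (deleted=?)
```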

Try altering the table design like this:

CREATE TABLE `acl_actions` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `guid` char(36) NOT NULL,
  `date_entered` datetime NOT NULL,
  `name` varchar(150) default NULL,
  ........
  `category` varchar(100) default NULL,
  `deleted` tinyint(1) default '0',
  PRIMARY KEY  (`id`),
  UNIQUE KEY  (`guid`),
  KEY `idx_aclaction_del_id` (`deleted`,`id`),
  KEY `idx_category_name` (`category`,`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

You should run the following query:

SELECT name, category FROM acl_actions PROCEDURE ANALYSE();

This will recommend appropriate sizes for those columns based on the current data. (PROCEDURE ANALYSE() was deprecated in MySQL 5.7 and removed in MySQL 8.0.)
