MySQL – Large accounting table, best way to partition by two dates

mysql · partitioning · performance

I need to create a log table to store the connections to our network (the last 3 years; after that the log will go into backup). The hardware/software used is proprietary, and for accounting it just calls our custom script with some arguments like this:

  • when a user connects (our_script START user mac ip);
  • when a user disconnects (our_script STOP user mac ip in_bytes out_bytes more)

Sometimes we do not receive the disconnect message. So we need to adapt to this.

So far I came up with this structure for the accounting table:

CREATE TABLE `accounting` (
  `user` varchar(50) NOT NULL DEFAULT '',
  `mac` varchar(20) NOT NULL DEFAULT '',
  `ip` varchar(15) NOT NULL DEFAULT '',
  `ipv6` varchar(39) DEFAULT NULL,
  `start_datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `stop_datetime` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `in_bytes` bigint(32) unsigned DEFAULT '0',
  `out_bytes` bigint(32) unsigned DEFAULT '0',
  `more_columns` varchar(255) default NULL, 
  PRIMARY KEY (`user`,`mac`,`ip`,`start_datetime`,`stop_datetime`),
  KEY `prim_ipv6` (`user`,`mac`,`ipv6`,`start_datetime`,`stop_datetime`),
  KEY `user` (`user`) USING HASH,
  KEY `mac` (`mac`) USING HASH,
  KEY `ip` (`ip`) USING HASH,
  KEY `ipv6` (`ipv6`) USING HASH,
  KEY `start_datetime` (`start_datetime`),
  KEY `stop_datetime` (`stop_datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

The queries used for writing are (these need to be really fast):

  • when a user connects, a simple insert
  • when a user disconnects: update accounting set stop_datetime=now(), in_bytes=$in_bytes, out_bytes=$out_bytes, more_columns=$more where user="$user" and ip="$ip" and stop_datetime="0000-00-00 00:00:00" order by start_datetime DESC limit 1; (this is a workaround in case we have more than one open row for a start; we just update the latest one)

A better workaround is to update the stop_datetime of any rows still missing it at every start, and use another column to flag that this was not a normal stop. I think I will go with this second workaround.
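As a sketch of this second workaround (the abnormal_stop column is hypothetical, not part of the schema above), the START handler could first close any dangling rows and then insert the new session:

```sql
-- Close any sessions for this user/mac/ip that never received a STOP,
-- flagging them via the (assumed) abnormal_stop column.
UPDATE accounting
   SET stop_datetime = NOW(),
       abnormal_stop = 1
 WHERE user = '$user'
   AND mac  = '$mac'
   AND ip   = '$ip'
   AND stop_datetime = '0000-00-00 00:00:00';

-- Then record the new session as usual.
INSERT INTO accounting (user, mac, ip, start_datetime)
VALUES ('$user', '$mac', '$ip', NOW());
```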

We have 2 ways to select from this table (these can be slower, at most 1 – 2 sec):

  • select the last 50 connections for a user: select ... where user="$user" order by start_datetime DESC limit 50;. For this the user HASH key is handy, because AFAIK HASH is better than BTREE for equality lookups.
  • select which user was connected with ip=$ip on date=$date: select ... where ip="$ip" and $date between start_datetime and stop_datetime;

This table will hold between 0.5 and 1 billion rows, which is why I thought to partition it.
The best option seems to be partitioning by month, but I have 2 relevant dates, and at the beginning/end of a month the start will land in one partition and the stop in another.

The questions:

  • Which is the best way to partition: by start_datetime, by stop_datetime, or by start_datetime with subpartitions for stop_datetime?

  • Do I really need to partition?

  • Do you have any other suggestions on how to improve this?

Thank you

Best Answer

Since stop_datetime will not be known at insert time, the partitioning must be done by start_datetime.

I think partitioning by day would be adequate, since 1 billion rows divided by (3 years × 365 days) ≈ 900,000 rows per partition. Note that a string expression such as date_format(start_datetime, '%Y-%m-%d') cannot be used as a partitioning expression; for a TIMESTAMP column, MySQL only permits UNIX_TIMESTAMP() as the partitioning function, so daily RANGE partitions on UNIX_TIMESTAMP(start_datetime) are the way to express this.
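Concretely, daily partitions might be declared like this (partition names and boundary dates are illustrative):

```sql
-- Daily RANGE partitions on a TIMESTAMP column; each boundary is the Unix
-- timestamp of the next day's midnight.
ALTER TABLE accounting
PARTITION BY RANGE (UNIX_TIMESTAMP(start_datetime)) (
  PARTITION p20120101 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-02 00:00:00')),
  PARTITION p20120102 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-03 00:00:00')),
  -- ... one partition per day, typically created ahead of time by a
  -- scheduled job, plus a catch-all:
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

A side benefit is that the 3-year retention becomes cheap: an expired day can be removed with ALTER TABLE accounting DROP PARTITION p20120101; which is far faster than a bulk DELETE.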

Updating indexes is expensive, and it happens on inserts, updates, and deletes. Keep the indexes to a minimum. If the main queries will be those selects, then I would keep only a primary key like the following and drop all the other keys:

PRIMARY KEY (`user`,`start_datetime`,`stop_datetime`,`mac`,`ip`)
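Putting this advice together, the revised table might look like the following sketch (same columns as the question's schema; partition names and dates are examples). One rule to keep in mind: in a partitioned MySQL table, every unique key must include all columns of the partitioning expression, which this primary key satisfies because it contains start_datetime.

```sql
CREATE TABLE accounting (
  user           varchar(50)     NOT NULL DEFAULT '',
  mac            varchar(20)     NOT NULL DEFAULT '',
  ip             varchar(15)     NOT NULL DEFAULT '',
  ipv6           varchar(39)     DEFAULT NULL,
  start_datetime timestamp       NOT NULL DEFAULT CURRENT_TIMESTAMP,
  stop_datetime  timestamp       NOT NULL DEFAULT '0000-00-00 00:00:00',
  in_bytes       bigint unsigned DEFAULT '0',
  out_bytes      bigint unsigned DEFAULT '0',
  more_columns   varchar(255)    DEFAULT NULL,
  -- single composite key; start_datetime is included, as the
  -- partitioning scheme requires for every unique key
  PRIMARY KEY (user, start_datetime, stop_datetime, mac, ip)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (UNIX_TIMESTAMP(start_datetime)) (
  PARTITION p20120101 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-02 00:00:00')),
  PARTITION p20120102 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-03 00:00:00')),
  -- ... one partition per day ...
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```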