MySQL table design of logging tables

database-designindexMySQLpartitioning

I'm not a DBA by any means, but I have a basic understanding of some of the underlying principles of database optomization.

Currently we have a table that logs session data, it writes about 1.2million rows a month. At the beginning of each month we split off last months rows and move them to a table called "tbl_session_mm_yyyy". We want to start querying the live table and the older tables and pull out details like last logged in (simple sql query, select maxdate from tbl_live, where accountid = 123 union select etc from tbl_old etc).

Now i'm thinking that the design of our database structure isn't quite right for this. So where we would have 12 tables for the year, each year, we would have to keep building huge unions to get all those tables. Thus would it be better to do this:

tbl_sessions: – this is the live table that is written to all the time, indexes on things like userid, accountid. this would only ever contain the current months records.

tbl_old_sessions: – this would replace the monthly table and be one giant table that contains all previous months records. Partioned by month (on a datefield populated by now()) and indexed on accountid, userid etc

Does this design make better sense than my previous one of monthly tables? should i be worried about partioning the tbl_old_sessions if i'm doing very little querying by actual date?

Best Answer

Your new design will make it easier to query across all of the tables without the need to know which tables to union when you write a query.

I'm not sure if you need to worry about the partitioning aspect. It sounds OK, but I've never worked with something like this on this scale (not in MySQL).

If you are worried about the partitioning, you could stay with your current solution, but instead of a new table for every year+month, have 12 tables, one for each month, for all years. At least your unions would be constant.

Related Solutions

Sql-server – SQL large table design

"This table has the appropriate indecies but is becoming a major hangup when running queries"

Partitioning alone doesn't help query performance unless SQL Server is able to eliminate partitions when running a query. Your WHERE clause needs to line up with the way you partition. We only get one field to use as a partitioning field, so if that field isn't included in your WHERE clause, you're still likely to scan the entire table despite having partitions.

"and just because of its size."

Partitioning can make certain maintenance operations easier, but there's still things we can't do on a partition-by-partition basis. If index maintenance and stats updates are causing you problems, you're better off splitting the design into an archive table and a live-updated table. When you need to periodically move data from the live table to the archive table, you do that, rebuild the indexes with 100% fill factor, update stats with full scan, and then set its filegroup to read-only. Partitioning can help with archive table loads - but partitioning the live table may not. (I'm tossing out several advanced concepts here as if it's quick and simple, but I'm just sketching out some background here.)

"It appears that using the partitioning would require less code changes."

Sorta kinda - it looks that way at first glance, but the more you get into it, you've got options like partitioned views. You can rename the existing table, put in a view in its place, and then you can make your own changes to the underlying tables (and add multiple tables) without changing your app.

I've written more about the pitfalls of partitioning here:

http://www.brentozar.com/archive/2008/06/sql-server-partitioning-not-the-answer-to-everything/

Mysql – thesql session logging tables

First, from looking at your existing indexes and comparing them to your example query, you are missing an index on just accountid. The way MySQL handles indexes is left-most, meaning you can have a composite index like this:

KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE,

and run a query that looks for contactid and the contactaccounts index would be a potential index*. However doing a query on accountid will not utilize the index, because accountid is not the left-most column.

If you never search for contactid without an accountid, I would create the index like this:

DROP INDEX `contactaccounts` ON `tbl_sessions_2012`;
CREATE INDEX `accountscontact` ON `tbl_sessions_2012` (`accountid`,`contactid`,`journalcode`);

Now, analyzing your other indexes, the first of each set is redundant using the left-most rule, and can be dropped in favor of the second:

KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE 
KEY `contactaccountloginmonths` (`contactid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE

KEY `contactaccountcollections` (`contactid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `contactaccountcollectionloginmonths` (`contactid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE

KEY `organisationaccounts` (`organisationid`,`accountid`) USING BTREE
KEY `organisationaccountloginmonths` (`organisationid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE

KEY `organisationaccountcollection` (`organisationid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `organisationaccountcollectionsloginmonths` (`organisationid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE

By having the duplicates you are using up a lot of space on indexes.

_{* I say potential because there is still the possibility that your index will not be used due to the amount of rows MySQL has to scan through. As your account_id IN () statement grows, MySQL will determine that it's faster just to do a full table scan regardless of the index.}

Best Answer

Related Solutions

Sql-server – SQL large table design

Mysql – thesql session logging tables

Related Question