MySQL Optimization – How to Optimize Storage for a Table

I'm using MySQL to store a basic table in this format:

    id      int(11)          // Auto-incrementing ID
    data    varchar(5120)    // Random input data, compressed by the application, not by MySQL.
    -----------------------
    Row size ø  916 B

That's kind of troubling, because right now I'm adding about 5,000 records each month, and I'd like to do better than averaging out at roughly 1 MB per 1,000 records.
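For concreteness, a minimal sketch of such a table definition (the table name mydata and the engine shown here are placeholders):

    CREATE TABLE mydata (
        id    INT(11)       NOT NULL AUTO_INCREMENT,
        data  VARCHAR(5120) NOT NULL,  -- pre-compressed by the application
        PRIMARY KEY (id)
    ) ENGINE=MyISAM;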

I set it up this way originally because I wanted to be able to capture the larger sets, but they're very infrequent, as you can see from this graph:

[Graph of the size distribution, tabulated below; the original post linked to a chart-API image]

count   n (data length = n × 128 bytes)
+-------+-------------------------------
1       28
1       26
1       24
2       22
8       21
4       20
13      19
12      18
16      17
27      16
43      15
58      14
69      13
114     12
184     11
262     10
399     9
588     8
807     7
1224    6
1245    5
546     4
73      3
9       2
6       1
1       0
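For reference, a distribution like the one above can be produced with a query along these lines (table and column names as assumed earlier):

    -- Bucket each row by its data length in 128-byte units and count the
    -- rows per bucket, largest buckets first.
    SELECT FLOOR(LENGTH(data) / 128) AS n,
           COUNT(*)                  AS count
    FROM mydata
    GROUP BY n
    ORDER BY n DESC;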

My main concern is how much space I've wasted just to accommodate the tail end of that chart, and what happens when I end up with even larger outliers? I'll have to bump the column size up again, even though roughly 80% of the data fits very nicely into a VARCHAR(1024) block, a fifth the size of what I'm using now.

So how should I have built this thing in the first place?

Best Answer

Since it is built already, let's look at what you have. MySQL offers an interesting way to plan column definitions from the data currently present. If the table name is mydata, try running this query:

    SELECT data FROM mydata PROCEDURE ANALYSE();

This will not dump your rows. It examines the column's contents, calculates statistics based on at most 256 distinct values (the default; you can specify a different limit), and recommends the column type the table should use.
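ANALYSE() also takes two optional arguments, max_elements and max_memory (the defaults are 256 and 8192); the values below are only illustrative:

    -- Consider at most 16 distinct values, using at most 2048 bytes of
    -- memory per column, when deciding whether to suggest an ENUM.
    SELECT data FROM mydata PROCEDURE ANALYSE(16, 2048);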

As long as the table is MyISAM, don't worry too much about the row size: the default row format is DYNAMIC, so a VARCHAR occupies only as much space as the data it actually holds. If the table is InnoDB (and I sure hope it is not), please make sure data is not part of the PRIMARY KEY. Your clustered index will grow like pouring undiluted Miracle-Gro in your backyard.
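If you are not sure which engine and row format the table actually uses, you can check (assuming the table is named mydata, as above):

    -- Look up the storage engine and row format for the table.
    SELECT ENGINE, ROW_FORMAT
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE()
      AND TABLE_NAME = 'mydata';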

You could break the data column into 128-byte chunks, run each chunk through MD5 (32-character output) or SHA1 (40-character output), concatenate those digests, and store the result. With MD5, that would save about 75% of the storage. Have fun trying to code that breakup; this was just a rambling suggestion off the top of my head.
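To illustrate the idea only (note the digests are one-way, so the original data could not be recovered from what you store), here is a sketch with a fixed number of chunks; variable-length data would need application-side looping:

    -- Hash up to three 128-byte chunks and concatenate the hex digests.
    -- MD5(NULL) is NULL and CONCAT_WS() skips NULLs, so chunks beyond the
    -- end of the data simply drop out.
    SELECT CONCAT_WS('',
        MD5(NULLIF(SUBSTRING(data,   1, 128), '')),
        MD5(NULLIF(SUBSTRING(data, 129, 128), '')),
        MD5(NULLIF(SUBSTRING(data, 257, 128), ''))
    ) AS chunk_digests
    FROM mydata;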

You may want to consider using Sphinx Indexing as well.