You can definitely keep all your dimensions and measures in one fact table and not use any dimension tables. Make sure your OLAP tool supports this, though.
Normalizing out your dimensions into other tables is done mostly to minimize the size of the fact table, which can get large fast.
With no dimension tables you're looking at about 336 MB per year (not counting indexes), which isn't so bad.
With dimension tables, you're looking at about 34 MB per year, plus a couple dozen MB for storing dimension details. Indexes will be smaller too.
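As a rough sketch of what the normalized layout could look like (all the names here are made up for illustration), the fact table keeps only small surrogate keys plus the measures:

CREATE TABLE dim_sensor
(
SensorId SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
SensorName VARCHAR(50) NOT NULL,
PRIMARY KEY (SensorId)
);

CREATE TABLE fact_readings
(
DateId INT UNSIGNED NOT NULL,        -- points at a date dimension
SensorId SMALLINT UNSIGNED NOT NULL, -- points at dim_sensor
Reading DECIMAL(10,2) NOT NULL,      -- the measure itself
KEY DateId_ndx (DateId),
KEY SensorId_ndx (SensorId)
);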
You'll want to expand your date column into something more analyzable (year, month, quarter, etc.), which will add to the size.
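A sketch of such a date dimension (columns are illustrative; add whatever your reports slice by):

CREATE TABLE dim_date
(
DateId INT UNSIGNED NOT NULL,  -- e.g. 20240131 for 2024-01-31
FullDate DATE NOT NULL,
Year SMALLINT NOT NULL,
Quarter TINYINT NOT NULL,
Month TINYINT NOT NULL,
DayOfWeek TINYINT NOT NULL,
PRIMARY KEY (DateId)
);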
You'll want to index all fields. Drop the indexes before bulk inserts and add them back afterwards.
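For example (table and index names are just illustrative):

ALTER TABLE fact_readings DROP INDEX DateId_ndx;
-- ... run the bulk INSERTs here ...
ALTER TABLE fact_readings ADD INDEX DateId_ndx (DateId);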
You can use a tool like Pentaho Aggregation Designer to find useful aggregates and generate them for you.
Since the timestamps count seconds from 0 (1970-01-01 00:00:00 UTC), you should look for every multiple of 60:
SELECT * FROM mytable WHERE MOD(TimeStamp,60)=0;
or, if TimeStamp is indexed, you can let the subquery scan just the index and then join back for the full rows:
SELECT T.* FROM
(SELECT TimeStamp FROM mytable WHERE MOD(TimeStamp,60)=0) M
INNER JOIN mytable T USING (TimeStamp);
Give it a try!!!
SUGGESTION #1
You should store the timestamp of the minute and index it:
ALTER TABLE mytable ADD COLUMN MinuteTimeStamp INT UNSIGNED NOT NULL AFTER TimeStamp;
UPDATE mytable SET MinuteTimeStamp = TimeStamp - MOD(TimeStamp,60);
ALTER TABLE mytable ADD INDEX MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey);
Then, you can take the MIN(UniqueKey) for each MinuteTimeStamp:
SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp;
and use it to get those records:
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
It was tactfully pointed out that triggers would degrade performance. Instead, doing the INSERTs like this may help (both NOW() calls evaluate to the statement's start time, so the two values stay consistent):
INSERT INTO mytable (UniqueKey,TimeStamp,MinuteTimeStamp) VALUES
(
uniquevalue,
UNIX_TIMESTAMP(NOW()),
UNIX_TIMESTAMP(NOW() - INTERVAL SECOND(NOW()) SECOND)
);
SUGGESTION #2
Since you have over 1000 columns (Ugh), perhaps a table of those minute timestamps would be better.
CREATE TABLE MinuteKeys
(
MinuteTimeStamp INT UNSIGNED NOT NULL,
UniqueKey INT UNSIGNED NOT NULL,
PRIMARY KEY (UniqueKey),
KEY MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey)
) ENGINE=MyISAM;
ALTER TABLE MinuteKeys DISABLE KEYS;
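-- Note: DISABLE KEYS defers only the non-unique MinuteTimeStamp index;
-- the PRIMARY KEY is still maintained during the load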
INSERT INTO MinuteKeys SELECT TimeStamp - MOD(TimeStamp,60),UniqueKey FROM mytable;
ALTER TABLE MinuteKeys ENABLE KEYS;
Then, you could use that table for the aggregation:
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM MinuteKeys GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
EPILOGUE
Other suggestions are possible, but you should really consider normalizing the table.
See my post Too many columns in MySQL as to why.
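As a sketch of the idea (table names invented for illustration), you would split the wide row into a narrow hot table plus one-to-one side tables that share UniqueKey:

CREATE TABLE mytable_core
(
UniqueKey INT UNSIGNED NOT NULL,
TimeStamp INT UNSIGNED NOT NULL,
PRIMARY KEY (UniqueKey)
) ENGINE=MyISAM;

CREATE TABLE mytable_details
(
UniqueKey INT UNSIGNED NOT NULL,
-- the hundreds of rarely-read columns would move here
PRIMARY KEY (UniqueKey)
) ENGINE=MyISAM;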
Best Answer
Here are some general recommendations for keeping the table size small:
Use ROW_FORMAT=COMPRESSED when creating InnoDB tables; there will be an impact on writes.
Run OPTIMIZE TABLE often if the table gets many DELETEs/UPDATEs.
Normalizing the tables would actually result in a bigger overall size, not a smaller one. It MAY help with caching, though.
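For the first two, something like this (names are illustrative; ROW_FORMAT=COMPRESSED typically needs innodb_file_per_table enabled):

CREATE TABLE mytable_small
(
UniqueKey INT UNSIGNED NOT NULL,
TimeStamp INT UNSIGNED NOT NULL,
PRIMARY KEY (UniqueKey)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

OPTIMIZE TABLE mytable_small;  -- reclaims space after heavy DELETE/UPDATE churn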