You can definitely keep all your dimensions and measures in one fact table and not use any dimension tables. Make sure your OLAP tool supports this, though.
Normalizing out your dimensions into other tables is done mostly to minimize the size of the fact table, which can get large fast.
With no dimension tables, you're looking at about 336 MB per year (not counting indexes), which isn't so bad.
With dimension tables, you're looking at about 34 MB per year, plus a couple dozen MB for storing dimension details. Indexes will be smaller too.
You'll want to expand your date column into something more analyzable (year, month, quarter, etc.), which will add to the size.
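For concreteness, here's a rough sketch of the two designs (all table and column names here are made up, not from your schema):
-- One-table design: date parts and dimension values repeated on every fact row
CREATE TABLE fact_flat (
    sale_date DATE NOT NULL,
    sale_year SMALLINT NOT NULL,
    sale_quarter TINYINT NOT NULL,
    sale_month TINYINT NOT NULL,
    product VARCHAR(100) NOT NULL,
    amount DECIMAL(10,2) NOT NULL
);
-- Star-schema design: the fact table carries only small surrogate keys
CREATE TABLE dim_date (
    date_key INT PRIMARY KEY,        -- e.g. 20240115
    full_date DATE NOT NULL,
    year SMALLINT NOT NULL,
    quarter TINYINT NOT NULL,
    month TINYINT NOT NULL
);
CREATE TABLE fact_sales (
    date_key INT NOT NULL,
    product_key INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL
);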
You'll want to index all fields. Drop the indexes before bulk inserts and re-create them afterwards.
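A sketch of that pattern, assuming the fact_sales table from above already has an index named idx_date and the CSV path is just an example:
ALTER TABLE fact_sales DROP INDEX idx_date;
LOAD DATA INFILE '/tmp/facts.csv'
INTO TABLE fact_sales
FIELDS TERMINATED BY ',';
ALTER TABLE fact_sales ADD INDEX idx_date (date_key);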
You can use a tool like Pentaho Aggregation Designer to find useful aggregates and generate them for you.
I hope you are willing to do some programming outside of SQL.
To find the recipients that need the most pruning:
SELECT recipient
FROM inbox
GROUP BY recipient
HAVING COUNT(*) > 30
ORDER BY COUNT(*) DESC
LIMIT 100
Then, for each recipient, first find the 31st id:
SELECT id
FROM inbox
WHERE recipient = $recipient
ORDER BY id DESC
LIMIT 30,1    -- offset 30, count 1: returns the 31st-newest id
And delete the excess baggage:
DELETE FROM inbox
WHERE recipient = $recipient
AND id < $id
You would need INDEX(recipient, id).
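In MySQL that's a one-liner (the index name recipient_id is arbitrary):
ALTER TABLE inbox ADD INDEX recipient_id (recipient, id);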
This process could be running continually. The first SQL would be the slowest, but it would run "Using index" (a covering-index scan) and probably not be too bad, maybe 0.1 sec for 200K rows in inbox.
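To verify, run EXPLAIN on the query and look for "Using index" in the Extra column:
EXPLAIN SELECT recipient
FROM inbox
GROUP BY recipient
HAVING COUNT(*) > 30
ORDER BY COUNT(*) DESC
LIMIT 100;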
If you want to stay in SQL, this would let you do one recipient at a time:
BEGIN;
-- Lock the single most over-quota recipient
SELECT @recipient := recipient
FROM inbox
GROUP BY recipient
HAVING COUNT(*) > 30
ORDER BY COUNT(*) DESC
LIMIT 1
FOR UPDATE;
-- Find that recipient's 31st-newest id
SELECT @id := id
FROM inbox
WHERE recipient = @recipient
ORDER BY id DESC
LIMIT 30,1
FOR UPDATE;
-- Delete everything older
DELETE FROM inbox
WHERE recipient = @recipient
AND id < @id;
COMMIT;
Then, put that in a Stored Procedure with a loop around it. And, if you want to be further 'nice', add a SELECT SLEEP(1); in the loop but outside the transaction.
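Here's a rough sketch of such a procedure; the name prune_inbox, the NULL reset as the loop's exit test, and DO SLEEP(1) in place of SELECT SLEEP(1) are my choices, not gospel:
DELIMITER //
CREATE PROCEDURE prune_inbox()
BEGIN
  REPEAT
    SET @recipient := NULL;    -- reset so an empty result ends the loop
    START TRANSACTION;         -- BEGIN is ambiguous inside a procedure body
    SELECT @recipient := recipient
      FROM inbox
      GROUP BY recipient
      HAVING COUNT(*) > 30
      ORDER BY COUNT(*) DESC
      LIMIT 1
      FOR UPDATE;
    IF @recipient IS NOT NULL THEN
      SELECT @id := id
        FROM inbox
        WHERE recipient = @recipient
        ORDER BY id DESC
        LIMIT 30,1
        FOR UPDATE;
      DELETE FROM inbox
        WHERE recipient = @recipient
          AND id < @id;
    END IF;
    COMMIT;
    DO SLEEP(1);               -- the 'nice' pause, outside the transaction
  UNTIL @recipient IS NULL END REPEAT;
END //
DELIMITER ;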
(Trying to do the entire thing in a single statement makes my brain hurt.)
Best Answer
Generally it is best to hold data in a normalised structure.
If testing at scale, on production-grade hardware, proves there is a problem, that the problem is caused by an over-large table, and that index tuning and SQL refactoring do not bring it within acceptable bounds, then splitting the table is a recognised technique.
Moving some rows to a different table is called horizontal partitioning or sharding. Moving some columns to a different table is called vertical partitioning. Both entail additional design and maintenance work, and there will be run-time implications if a single query needs values from several partitions.
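In MySQL, for instance, horizontal partitioning can be done natively; this sketch (table, columns, and ranges all illustrative) splits rows by year:
CREATE TABLE orders (
    id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    total DECIMAL(10,2),
    PRIMARY KEY (id, order_date)   -- the partition key must be part of every unique key
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
-- Vertical partitioning is just a second table holding the wide, rarely-read columns:
CREATE TABLE orders_notes (
    id BIGINT NOT NULL PRIMARY KEY,
    notes TEXT
);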
There is some further discussion at this answer.