MySQL – Split Table vs. Same Values in Multiple Rows

database-designMySQL

I have a table products:

#products
ID | category | type | criteria1 | criteria2

, with category and type being foreign keys of other tables.

Should I split this table up into category1_type1_products, category1_type2_products and so on? It seems to me I should, because right now in my table, there are some thousands of entries with the same ID and category value. Loads of redundand information.

Also, a query could perform faster if mysql didn't have to search for all rows with the specific category and type first. (right?)

Is this recommended structuring behaviour at all? If not, what if my table has got 5 million tuples?

Best Answer

No. That would mean the table NAME contains DATA - you then have to modify the database structure (create new table) just to add one more category or type.

If your category/type column is tinyint or smallint (depending on reasonably expected number of possible values), then you actually have proper normalization in place. If there is no (functional) dependence between two keys to categories for two different rows then those two values are not a redundancy but a minimal way to store the actual data.

From performance standpoint the important part is proper indexing for your queries. If you often search for rows with specific category and/or type, you should have in place index on that column (or composite index on both columns, and possibly other) to optimize such queries. You should enable the slow query log and periodically check what queries are taking most time (you may use pt-query-digest to analyze the log) and optimizing those (adding proper indexes and/or rewriting the queries).

When you have proper indexes in place then in your proposed case of multiple tables actually selecting the right table to read will take about the same time as MySQL needs to "jump" to the right part of the index. (It is a bit oversimplification but the point is that selecting the right table to read is an overhead in itself).

From the "schema" standpoint I would be more concerned about those two columns criteria1 and criteria2 if these are their real names and usage then that seems like a possible problem - but that may depend on your use case and may be actually OK. But if there is any chance you will once add criteria3 and others then that would tell you to add separate table for those, one row per criteria and FK to the current table.