Mysql – How should I model a database structure to retain calculation results

I am trying to improve the performance of the queries related to one table that I consider "weird", which could be created by the next script:

CREATE TABLE `CalcTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `result` double DEFAULT NULL,
  `calc_dt` datetime DEFAULT NULL,
  `project_id` int(11) NOT NULL,
  `calculation_type_cd` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`));

Currently, we have 17 different calculation types. A calculation type could be:

"current kilowatt per hour",
"expected kilowatt per hour",
"forecast kilowatt per hour",
etc.

The data could grow to 250 different calculation types in the future. Also, a situation that may arise is that a customer could create his own calculation types, as much as he/she wants…

So I tried to understand how can I design a table structure to work with this data on the best way now, and to not crash the database in the future (with 250+ new calculation types).

Below you can find current data sample:

id       result  calc_dt              project_id  calculation_type_cd
-------  ------  -------------------  ----------  -------------------
1906901  0.1028  2016-01-18 15:00:00         188  TypeA
1906902  336.68  2016-01-18 15:00:00         188  TypeB
1906903  341.55  2016-01-18 15:00:00         188  TypeC
1906904  0.4531  2016-01-18 16:00:00         188  TypeD
1906905  368.39  2016-01-18 16:00:00         188  TypeA
1906906  38.080  2016-01-18 16:00:00         188  TypeB
1906907  420.34  2016-01-18 16:00:00         188  TypeF

So you can imagine what would be here in the future… Dozens of rows with some columns containing duplicated data…

I'm thinking about two options:

Create 250 columns for storing results for different calculation types
Create different tables for some groups of calculation types

What do you think about it? Does it make sense?

Maybe you have better approaches – please let me know!

Thanks a lot in advance.

I'm using MySQL 5.7 on AWS.

Best Answer

If you don't know how many columns it will have, it may vary all the way up to 250, and the entries might be sparse, I would:

Use the EAV design to store the calculation types on separate rows.
For reporting/analysis purposes, create a view or summary table that pivots rows to columns and "flattens" them out.

One day of data you could probably get fast with indexes. Three years, likely not. For the second case, you might want to look into something more analytically-focused, like Greenplum (on-prem) or Redshift (AWS).

Best Answer

Related Solutions

I have this two models, which one is better

MySQL – How to Optimize Simple Query on 4M-Rows Table

Related Question