I am trying to improve the performance of the queries related to one table that I consider "weird", which could be created by the next script:
CREATE TABLE `CalcTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result` double DEFAULT NULL,
`calc_dt` datetime DEFAULT NULL,
`project_id` int(11) NOT NULL,
`calculation_type_cd` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`));
Currently, we have 17 different calculation types. A calculation type could be:
- "current kilowatt per hour",
- "expected kilowatt per hour",
- "forecast kilowatt per hour",
- etc.
The data could grow to 250 different calculation types in the future. Also, a situation that may arise is that a customer could create his own calculation types, as much as he/she wants…
So I tried to understand how can I design a table structure to work with this data on the best way now, and to not crash the database in the future (with 250+ new calculation types).
Below you can find current data sample:
id result calc_dt project_id calculation_type_cd
------- ------ ------------------- ---------- -------------------
1906901 0.1028 2016-01-18 15:00:00 188 TypeA
1906902 336.68 2016-01-18 15:00:00 188 TypeB
1906903 341.55 2016-01-18 15:00:00 188 TypeC
1906904 0.4531 2016-01-18 16:00:00 188 TypeD
1906905 368.39 2016-01-18 16:00:00 188 TypeA
1906906 38.080 2016-01-18 16:00:00 188 TypeB
1906907 420.34 2016-01-18 16:00:00 188 TypeF
So you can imagine what would be here in the future… Dozens of rows with some columns containing duplicated data…
I'm thinking about two options:
-
Create 250 columns for storing results for different calculation types
-
Create different tables for some groups of calculation types
What do you think about it? Does it make sense?
Maybe you have better approaches – please let me know!
Thanks a lot in advance.
I'm using MySQL 5.7 on AWS.
Best Answer
If you don't know how many columns it will have, it may vary all the way up to 250, and the entries might be sparse, I would:
One day of data you could probably get fast with indexes. Three years, likely not. For the second case, you might want to look into something more analytically-focused, like Greenplum (on-prem) or Redshift (AWS).