MySQL Database Design – Handling Very Long Growing Row Size

database-designMySQLpivot

My data has the property to grow in the row direction and the column direction at the same time. The size will grow in both directions without limit while I do more research.

The number of columns grows more than 1000 now and I got this error:

 Warning: (139, 'Row size too large (> 8126). Changing some columns to TEXT or BLOB may help. In current row format, BLOB prefix of 0 bytes is stored inline.')

I read this post but increasing the page size would not help because the row size is growing and it will exceed anything at the end.

I want to restructure the table but I don't have a good idea. How can I restructure this table?


P.S. As a reference, it's time-series data and I put the timestamps as the index. And the columns can be grouped into categories and each group size grows at the same time.


P.P.S I add more details. There are a lot of columns and I decided to combine all of them into one table because each column shares the same timestamps. In addition, to track the identity of column, I made one extra table describing what the column is about.

This is the part of the data table:

+----------------------------+-----------+---------------+--------------------+
| etstamp                    | 1         | 2             | 3                  |
+----------------------------+-----------+---------------+--------------------+
| 2019-01-01 13:53:30.000000 |   3802325 |  -12298828971 |  0.964893760996075 |
| 2019-01-01 14:01:00.000000 | -13088926 |  -12298828971 |  -1.33234100135318 |
| 2019-01-01 14:06:00.000000 | -13088926 |   -5892372191 |  0.343203392148817 |
| 2019-01-01 18:35:30.000000 |  -3751101 |   -4074909927 |  -1.28552097428958 |
| 2019-01-01 23:19:00.000000 |  40493616 |   90751800131 |   1.41623954680335 |
| 2019-01-01 23:20:30.000000 |  40493616 |   47623820910 |   2.07233344695927 |
| 2019-01-02 16:46:00.000000 |  -1251261 |   10093182491 |   2.12765957446809 |
| 2019-01-02 22:57:00.000000 |  14034018 |    9768467942 |   1.32055929570171 |
| 2019-01-03 05:24:00.000000 |  -2613264 |  -56186202376 |  -1.06769174979049 |
| 2019-01-03 16:23:00.000000 | -22956977 |   -5525892956 |  -2.04214360041623 |
| 2019-01-04 00:13:30.000000 | -10053384 |   -9177100417 |  -1.39583882012115 |
| 2019-01-04 09:05:30.000000 |  -5494943 |  -21081825567 | -0.966329662393934 |
| 2019-01-04 14:43:30.000000 |   4678636 |  -70623900257 |  -1.26799630167745 |
| 2019-01-04 20:54:00.000000 |  23879504 |   41962608843 |               0.96 |
| 2019-01-04 20:59:30.000000 |  16930573 |   41962608843 |   -0.5652942203597 |
| 2019-01-04 21:33:30.000000 |  16930573 |   49439408153 |   1.63281561131024 |
| 2019-01-05 00:16:00.000000 |  15617092 |   56326921973 |  0.552369137856527 |
| 2019-01-05 22:08:30.000000 | -28280428 | -100817061177 |  -1.41613615694426 |
| 2019-01-05 22:16:30.000000 | -28280428 |  -18081058461 | -0.364971120115236 |
| 2019-01-06 02:01:30.000000 | -11802595 |  -42569384784 | -0.958499367931412 |
+----------------------------+-----------+---------------+--------------------+
20 rows in set (0.00 sec)

This is a sample table to describe the column identity.

+----------+-------------+---------------------+---------------------+----------------------------+------------+
| columnid | fcolltype   | stime               | etime               | indinfo                    | indbpsetid |
+----------+-------------+---------------------+---------------------+----------------------------+------------+
|        1 | TECHNICAL01 | 2019-01-01 00:00:00 | 2019-07-01 00:00:00 | VolImb                     |         15 |
|        2 | TECHNICAL01 | 2019-01-01 00:00:00 | 2019-07-01 00:00:00 | DolImb                     |         18 |
|        3 | TECHNICAL01 | 2019-01-01 00:00:00 | 2019-07-01 00:00:00 | PJUMP[9, high, low, close] |         50 |
+----------+-------------+---------------------+---------------------+----------------------------+------------+
3 rows in set (0.00 sec)

(The columnid corresponds to the column name of the data table.)

Best Answer

So the pieces are there, but not structured in a way for a database to quickly access the data you need, or to allow growth without major restructuring.

In your case the best way to do that is to arrange the data in a longer format instead of the wide format you have. So instead of having each column represent an entity identifiable only through a metadata table, you will have a key identify each row and the information will be organized by that key/timestamp combination. Not only will this allow your database to grow without issue, you can also add new entities through inserting rows, not rebuilding your main table. It's also much faster for most queries.

After discussion in chat, this is the model I believe this is a good starting point for your data. (For anyone not familiar with ER Diagrams, I always recommend reading this first).

enter image description here

This may require some tweaks as you go along, but the general structure should be there.

Note: The current diagram allows the same ModelCode to be used with more than one Instrument. This is useful if the code is easily understood to mean a certain type of model or prediction. If that is the case, adding a ModelType entity to give additional meaning/integrity to ModelCode would be desirable. But if you want ModelCode to be unique across all instruments, you would need to place an alternate key (unique constraint) on that column.