MySQL Database Design – Replacing Too Many Tables with Too Many Columns

database-designeavMySQL

Background

We have a small MySQL database, which logs numeric data from devices. Each device has ~100 variables.

List of devices is stored in single table. deviceId, type and version are stored here.

Data is stored in tables, which have a name in the form of data_<deviceId>_<variable>. Each of these tables stores basically 2 pieces of data: timestamp and a variable value (double type).

What above means that each time new device is added, system creates ~100 new tables automatically. There are currently several thousand MyISAM tables in the database. Needless to say, making changes to these tables is difficult.

Example

For simplicitys sake, let's say that we have 3 devices with only a few variables:

  • Device 1 has variables A and B. (Device type 1, version 1)
  • Device 2 has variables A and C. (Device type 1, version 2)
  • Device 3 has variables D, E and F. (Device type 2, version 1)

Given above, database has following tables:

devices, with columns (deviceId, type, version)
data_1_A, with columns (timestamp, value)
data_1_B, with columns (timestamp, value)
data_2_A, with columns (timestamp, value)
data_2_B, with columns (timestamp, value)
data_3_D, with columns (timestamp, value)
data_3_E, with columns (timestamp, value)
data_3_F, with columns (timestamp, value)

Problems

  • Too many tables
  • New tables are created on the fly

A better way

To reduce the amount of tables, consider following:

Solution #1: Device version specific tables

There would be unique table for each device type. So we'd have tables:

devices, with columns (deviceId, type, version)
data_type_1_version_1, with columns (timestamp, deviceId, valueA, valueB)
data_type_1_version_2, with columns (timestamp, deviceId, valueA, valueC)
data_type_2_version_1, with columns (timestamp, deviceId, valueD, valueE, valueF)

Problem is that we have to add new table each time new device type or version is introduced. We also don't have generic way of reading values from tables as each table has different columns. Won't work.

Solution #2: 1 Data table with generic columns

devices, with columns (deviceId, type, version, varName1, varName2, varName3, ...)
data, with columns (timestamp, deviceId, value1, value2, value3, ...<enough columns for worst case>)

You would need to read devices table to understand what each value column contains. Should work.

Solution #3: Single data column and extra table for variable info.

devices, with columns (deviceId, type, version)
variables, with columns (varId, deviceId, varName)
data, with columns (timestamp, varId, value)

To read data from DB, you need to find out varId first from the variables table, and then read the actual data from the data table. Should work.

Question

Solution 2 seems simpler than 3, as it has less tables to deal with. But using solution 2, I would have 2 tables with over 100 columns each. Is that any improvement over original problem of having too many tables?

Best Answer

Solution 3 is the best solution provided that you move the deviceId from the variables table into the data table.

The reason why it is better is that when you need to add a variable to one device you can do this without changing your devices and data tables in solution 2. If you want to query the data in the solution 2 way then you can create views that look like the devices and data tables in solution 2.