Normalization of Time Series Data

database-designnormalizationteradata

I have a table containing various a/b test experiment performance on a daily basis. The table structure looks something like this:

date     | experiment | user_count | user_attr | sales
---------------------------------------------------------
6/1/2014 |      a     |     21     |      x    |   1
6/1/2014 |      a     |     15     |      y    |   2
6/1/2014 |      b     |     29     |      y    |   0
6/2/2014 |      a     |     20     |      x    |   0
6/2/2014 |      b     |     18     |      x    |   3
6/2/2014 |      b     |     31     |      y    |   5

While this table structure seemed to be the most logical one when I created it, I'm learning more about database normalization and it seems like this structure may be inefficient (e.g. repetition of experiments and attributes on each day).

I was wondering if there was a preferred structure for this type of data while preserving the daily performance for each experiment (use case involves monitoring daily trends). Is there a better way to model this?

Best Answer

The table itself does not look like it contains redundant data. But, you explain "repetition of experiments and attributes on each day."

If the experiment had other attributes, splitting experiment definition into one table and results into another would be a standard normalization.

Experiment(Id, Name, Description)
Result(Experiment (FK), Participants, Participants_Attribute, Sales)

However, with the naming of the experiments "a" and "b" that seems to be covered already.

Another point of concern is the aggregate data, e.g. user_count. Normally, each result is stored separately and user_count would be a query on a child table. However, if the result is a group datum (individual data not stored), the current method is appropriate.

Best Answer

Related Solutions

Normalizing nearly identical tables

Postgresql – HStore vs. multiple tables with inheritance – do I have a good use case for HStore

Related Question