Storing arrays of data in a time-series database

array time-series database

I'm building a low-utilization time-series database to capture yearly data points for a set of fewer than 100,000 items.

My question has to do with storing arrays of data in a way that is easily queried later.

Right now the yearly_visits table looks something like:

visitID       MEDIUMINT primary key
userID        SMALLINT id of individual submitting yearly data
weight        SMALLINT weight of individual

The intake form also contains a group of checkboxes listing favorite colors (stored as numeric values from a lookup table). Users can select one or more colors.

Should colors be stored in a separate visit_colors table that looks something like:

 visitID MEDIUMINT 
 colorID SMALLINT

Or is there a better way of storing arrays of data in a time series? I haven't written any code yet, so I want to design this in a way that doesn't bite me down the road when I'm asked to query against the color data.
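To make the separate-table option concrete, here is a minimal sketch of how a query against the color data might look. It assumes SQLite through Python's built-in sqlite3 module purely for illustration (the MEDIUMINT and SMALLINT types above suggest MySQL, so the column types here are simplified to INTEGER), and the sample rows and the visits_with_color helper are made up.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yearly_visits (
    visitID INTEGER PRIMARY KEY,
    userID  INTEGER NOT NULL,
    weight  INTEGER
);
-- One row per (visit, selected color); the composite key blocks duplicates.
CREATE TABLE visit_colors (
    visitID INTEGER NOT NULL REFERENCES yearly_visits(visitID),
    colorID INTEGER NOT NULL,
    PRIMARY KEY (visitID, colorID)
);
""")

# Hypothetical sample data: visit 1 picked colors 3 and 5, visit 2 picked 5.
conn.executemany("INSERT INTO yearly_visits VALUES (?, ?, ?)",
                 [(1, 10, 150), (2, 11, 180)])
conn.executemany("INSERT INTO visit_colors VALUES (?, ?)",
                 [(1, 3), (1, 5), (2, 5)])


def visits_with_color(conn, color_id):
    """Return the IDs of visits where the given color was selected."""
    rows = conn.execute(
        """
        SELECT v.visitID
        FROM yearly_visits AS v
        JOIN visit_colors AS c ON c.visitID = v.visitID
        WHERE c.colorID = ?
        ORDER BY v.visitID
        """,
        (color_id,),
    )
    return [row[0] for row in rows]


print(visits_with_color(conn, 5))
```

With this layout, "which visits chose color X" is a plain join with a filter, and the composite primary key keeps a color from being recorded twice for the same visit.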

Best Answer

I've read about (but not implemented) Timescale, a time-series database built on Postgres. It claims to support all the native PostgreSQL data types, including arrays, and the array type is supported within the query language. I don't imagine Postgres disappearing any time soon, so a solution based on it would be somewhat future-proof.

Related Solutions

Recommendation for storage of series of time series

I can suggest Akumuli. It's a time-series database that supports compression and high-throughput data ingestion. With a 25 kHz measurement frequency and 20 engines, you will need to write 500K data points per second in the worst case. Akumuli can handle throughput an order of magnitude larger than that (the largest throughput ever recorded is around 16M data points per second).

Also, because of compression, the database needs only around 3-9 bytes per data point. Each data point is a timestamp with nanosecond precision plus a 64-bit floating-point value. Automatic data retention deletes old data only when there is not enough disk space to store the new data.
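As a rough check on those figures, the arithmetic can be sketched as follows. The sample rate, engine count, and bytes-per-point range come from the text above; the assumption that the worst-case rate is sustained for a full day is mine and is only there to give a sense of scale.

```python
# Figures taken from the answer above.
SAMPLE_RATE_HZ = 25_000        # 25 kHz measurement frequency
ENGINES = 20
BYTES_PER_POINT = (3, 9)       # compressed size range per data point

# Assumption for illustration: the worst-case rate is sustained all day.
SECONDS_PER_DAY = 86_400

points_per_second = SAMPLE_RATE_HZ * ENGINES


def bytes_per_day(bytes_per_point):
    """Storage needed for one day of writes at the worst-case rate."""
    return points_per_second * bytes_per_point * SECONDS_PER_DAY


low_gb, high_gb = (bytes_per_day(b) / 1e9 for b in BYTES_PER_POINT)

print(f"{points_per_second:,} points/s")
print(f"{low_gb:.1f} to {high_gb:.1f} GB per day at the worst-case rate")
```

Under that assumption the write rate works out to 500,000 points per second and the daily footprint to roughly 130 to 390 GB, so actual usage depends heavily on how often bursts occur.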

You can store the data from each engine in the same time series, or you can create a new time series per burst.

A real time-series database can be a big win because you won't need to use all these fancy tricks. There are downsides, of course. For example, there is no clustering and no backfill.

Disclaimer: I'm the author so I'm a bit biased.

High Cardinality Time Series Database Design

Just normalize it.

crawl-
crawl_id|crawltime
1       |2018-01-02 2:47 PM

terms-
crawl_id | term | count
1        | Joe  | 15
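
A minimal sketch of the two tables above, assuming SQLite through Python's built-in sqlite3 module. The second crawl row, the term_history helper, and the choice to store crawltime as sortable 24-hour text are illustrative assumptions, not part of the original answer.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crawl (
    crawl_id  INTEGER PRIMARY KEY,
    crawltime TEXT NOT NULL          -- stored as 'YYYY-MM-DD HH:MM' so it sorts
);
CREATE TABLE terms (
    crawl_id INTEGER NOT NULL REFERENCES crawl(crawl_id),
    term     TEXT    NOT NULL,
    "count"  INTEGER NOT NULL,
    PRIMARY KEY (crawl_id, term)
);
""")

# First row mirrors the example above; the second crawl is made up.
conn.executemany("INSERT INTO crawl VALUES (?, ?)",
                 [(1, "2018-01-02 14:47"), (2, "2018-01-03 14:47")])
conn.executemany("INSERT INTO terms VALUES (?, ?, ?)",
                 [(1, "Joe", 15), (2, "Joe", 18)])


def term_history(conn, term):
    """Return (crawltime, count) pairs for one term, oldest crawl first."""
    rows = conn.execute(
        """
        SELECT c.crawltime, t."count"
        FROM terms AS t
        JOIN crawl AS c ON c.crawl_id = t.crawl_id
        WHERE t.term = ?
        ORDER BY c.crawltime
        """,
        (term,),
    )
    return rows.fetchall()


print(term_history(conn, "Joe"))
```

Keeping one row per (crawl, term) means a term's count over time is a single join ordered by crawl time, however many distinct terms there are.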
