Time series data of several sources in one SQLite table or several? File size impact

schemasqlitetime-series-database

Here is the situation:

  • A large amount of time series data from a very limited set of sources is to be kept in a SQLite database file.
  • Time series sources are identified by a 10 character long string. The number of sources is very low and not at all dynamic.
  • There is no immediate need for any correlation analysis between the time series sources.

I guess, conventionally, one would dump all time series data in a single table with a text field identifying the source of each data point.

However, I am wondering whether it would be more efficient to create a separate table for each source of time series data? Will the file size of the SQLite database eventually be much smaller by eliminating the repetitive text field? Will queries run much faster?

Or should I completely refrain from creating a table for each source and for what reason?

The database will not be huge, which may warrant the use of SQLite over MySQL. Apart from the integer timestamps, there are four more integer fields. Measurements are made every 3 minutes, but it could easily run for a year or more. Part of the design might end up in an embedded device with a very limited amount of flash memory (64Mbit), hence my initial interest for SQLite. However, I am open to other suggestions.

Best Answer

Personally, I hate to throw away potentially useful data. I'd create a table for the sources, and include source_id in the time series table as a foreign key to the source table. This should take up less space, but still retain the source information (without requiring multiple identical tables).

I've put together a short example; see this db-fiddle link.

Here's the code form my example:

CREATE TABLE source
     ( source_id INTEGER PRIMARY KEY ASC
      ,name varchar(10)
     );

CREATE TABLE time_series
     ( series_id INTEGER PRIMARY KEY ASC
      ,timestamp INTEGER
      ,value1 INTEGER
      ,value2 INTEGER
      ,value3 INTEGER
      ,source_id INT
      ,FOREIGN KEY (source_id) REFERENCES source(source_id)
     );

INSERT INTO source (name)
VALUES ('AAAA'), ('BBBB'), ('QZQZ');

INSERT INTO time_series (timestamp, value1, value2, value3, source_id)
VALUES (12345678, 100, 105, 110, 1)
      ,(12345681, 105, 105, 105, 1)
      ,(12345684, 110, 105, 100, 1)
      ,(12345678, 9, 27, 81, 3)
      ,(12345681, 27, 81, 243, 3)
      ,(12345684, 81, 243, 729, 3)
;



SELECT * FROM source;

SELECT s.name as source, timestamp, value1, value2, value3
  FROM source s INNER JOIN time_series ts ON (s.source_id = ts.source_id)
 ORDER BY source, timestamp
;

I should note that I don't normally use SQLite. As you noted in your response to my original comment, SQLite maintains a 64-bit integer row ID column in every table by default. I've set up the tables in the example to use that row ID value as the primary key for each table. If I've read the documentation correctly, the foreign key column should just be big enough to hold the value from the primary key. Assuming you don't manually insert a source_id that's huge, I believe that source_id in the time_series table should only require 1 byte.

You might want to put a UNIQUE index on source_id and timestamp (you should almost certainly have some sort of index on them); presumably, you should never have two entries for the same source and the same time. As pointed out by Serge Stroobandt, this could even be the primary key. However, since SQLite will create a unique row ID value anyway, I'd be inclined to leave that as the key. If you really wanted to use source_id and timestamp as the primary key, look into creating the table WITHOUT ROWID.

Related Question