Here is the situation:
- A large amount of time series data from a very limited set of sources is to be kept in a SQLite database file.
- Time series sources are identified by a 10 character long string. The number of sources is very low and not at all dynamic.
- There is no immediate need for any correlation analysis between the time series sources.
I guess, conventionally, one would dump all time series data in a single table with a text field identifying the source of each data point.
However, I am wondering whether it would be more efficient to create a separate table for each source of time series data? Will the file size of the SQLite database eventually be much smaller by eliminating the repetitive text field? Will queries run much faster?
Or should I completely refrain from creating a table for each source and for what reason?
The database will not be huge, which may warrant the use of SQLite over MySQL. Apart from the integer timestamps, there are four more integer fields. Measurements are made every 3 minutes, but it could easily run for a year or more. Part of the design might end up in an embedded device with a very limited amount of flash memory (64Mbit), hence my initial interest for SQLite. However, I am open to other suggestions.
Best Answer
Personally, I hate to throw away potentially useful data. I'd create a table for the sources, and include
source_id
in the time series table as a foreign key to the source table. This should take up less space, but still retain the source information (without requiring multiple identical tables).I've put together a short example; see this db-fiddle link.
Here's the code form my example:
I should note that I don't normally use SQLite. As you noted in your response to my original comment, SQLite maintains a 64-bit integer row ID column in every table by default. I've set up the tables in the example to use that row ID value as the primary key for each table. If I've read the documentation correctly, the foreign key column should just be big enough to hold the value from the primary key. Assuming you don't manually insert a
source_id
that's huge, I believe thatsource_id
in thetime_series
table should only require 1 byte.You might want to put a
UNIQUE
index onsource_id
andtimestamp
(you should almost certainly have some sort of index on them); presumably, you should never have two entries for the same source and the same time. As pointed out by Serge Stroobandt, this could even be the primary key. However, since SQLite will create a unique row ID value anyway, I'd be inclined to leave that as the key. If you really wanted to usesource_id
andtimestamp
as the primary key, look into creating the tableWITHOUT ROWID
.