I would go for separate tables for each type. With so few rows performance will not be a consideration either way. Individual tables will only have a page or two each. Similarly for the combined table.
Either way you will have to join a "type" table to complete your views. Inserts are infrequent at this level of abstraction so locking is unlikely to be an issue either way.
For me, though, having different tables keeps logically distinct items separated, which is a cleaner design.
There will be many different tables with a relationship to a type table of some sort. If there are many different type tables, the foreign key relationships are self-documenting and enforceable through DRI. With one combined table it is difficult to use DRI to say "Venue.TypeID comes from Type.TypeID, but only if Type.Category = 'V'", for example. The complementary side of this relationship - "a Type.TypeID with Type.Category = 'V' can only be used in the Venue table" - is also not supported in any RDBMS's DRI that I know of.
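For illustration (table and column names here are hypothetical), the separate-table design lets DRI carry the whole rule, while the combined design cannot express the category restriction:

```sql
-- Separate type table: the foreign key is self-documenting and fully enforced.
CREATE TABLE VenueType (
    VenueTypeID INT PRIMARY KEY,
    Name        VARCHAR(50) NOT NULL
);

CREATE TABLE Venue (
    VenueID     INT PRIMARY KEY,
    VenueTypeID INT NOT NULL REFERENCES VenueType (VenueTypeID)
);

-- One combined Type table: a plain foreign key accepts any TypeID,
-- regardless of its Category, so "only 'V' types" is not enforced.
CREATE TABLE Type (
    TypeID   INT PRIMARY KEY,
    Category CHAR(1) NOT NULL,   -- e.g. 'V' for venue types
    Name     VARCHAR(50) NOT NULL
);
```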
You could expand your activities table so that instead of a single timestamp column, it includes two timestamps, `ts_from` and `ts_to`, that indicate the time period for which the row's data is applicable. As long as no data changes, you do not insert any new rows into your table. When data does change, you insert the new data into a new row and update the old row's `ts_to` column.
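Sketched in SQL (the column names `val1`/`val2` and the `@now`/`@new_…` variables are illustrative):

```sql
-- When a value changes at time @now: close the currently open row
-- (the one with no end timestamp) ...
UPDATE activities
   SET ts_to = @now
 WHERE ts_to IS NULL;

-- ... and insert the new data as a new open-ended row.
INSERT INTO activities (ts_from, ts_to, val1, val2)
VALUES (@now, NULL, @new_val1, @new_val2);
```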
Then you can create a `calendar` table that contains all timestamps of interest to you (in your case that table would contain a row for every day and every hour in the period of time that is of interest to you). Finally, you get the result by cross-joining the two tables and keeping only those records whose date (from the `calendar` table) falls between `ts_from` and `ts_to`.
For example (my example uses only dates and not datetimes, but you can change that easily):
create table calendar (
  d date            -- one row per date of interest
);
create table time_data (
  d_from date,      -- start of validity period (inclusive)
  d_to date,        -- end of validity period (exclusive; null = still current)
  val1 int(10),
  val2 int(10)
);
insert into time_data values('2015-01-01', '2015-01-15', 10, 15);
insert into time_data values('2015-01-15', '2015-02-21', 7, 18);
insert into time_data values('2015-02-21', '2015-03-04', 1, 55);
insert into time_data values('2015-03-04', '2015-04-21', 3, 22);
insert into time_data values('2015-04-21', null, 9, 99);
I filled the calendar table with all dates from 2015-01-01 to now and ran this simple query:
select d, val1, val2
from calendar, time_data
where d >= d_from and d < ifnull(d_to, curdate())
order by d
The result set contains an entry for every date in the range (2015-01-01, 2015-05-01), with `val1` and `val2` taken from the `time_data` row that is applicable to that date.
For a nice procedure to fill the calendar table, see this article.
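In MySQL 8+, a recursive CTE is another way to fill it (a sketch; for long date ranges you may need to raise `cte_max_recursion_depth`, which defaults to 1000):

```sql
-- Generate one row per day from 2015-01-01 up to today.
INSERT INTO calendar (d)
WITH RECURSIVE dates (d) AS (
    SELECT CAST('2015-01-01' AS DATE)
    UNION ALL
    SELECT d + INTERVAL 1 DAY FROM dates WHERE d < CURDATE()
)
SELECT d FROM dates;
```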
However, since a cross join is an expensive operation, and you have large amounts of data, thoroughly test this (or any other) solution before applying it to your production system.
Best Answer
As long as all the data have (roughly) the same attributes, and the only difference is the `type`, storing them in a single table is fine.

To make your query fast, create an index on `type`. Then one of two things can happen:

- there are few enough rows of that `type` that you get an index scan (or a bitmap index scan in PostgreSQL)
- the `type` occurs often enough that PostgreSQL opts for a sequential scan

In both cases you won't be much slower than with many small tables.
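As a sketch (assuming a hypothetical table `event` with a `type` column):

```sql
-- A plain b-tree index on the type column is enough for the planner
-- to choose an index scan when the requested type is rare.
CREATE INDEX event_type_idx ON event (type);
```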
The final decision might depend on the number of tables (having 10000 tables in a database is not much fun) and the type of query you have:
The following can be a static query with parameters:
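Presumably something like this, where `items` and `type` are placeholder names:

```sql
-- One prepared statement, reused for every type;
-- only the parameter value changes.
SELECT * FROM items WHERE type = $1;
```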
while the following requires you to compose an SQL statement for every query:
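With one table per type, the table name itself varies, and identifiers cannot be bound as parameters, so the statement text must be built for each query (hypothetical names again):

```sql
-- A different statement string per type:
SELECT * FROM items_foo;
SELECT * FROM items_bar;
```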
If you want to query several types at once, it is nicer to write
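For example (placeholder names):

```sql
SELECT * FROM items WHERE type IN ('foo', 'bar');
```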
than having to write
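That is, with separate tables the same query becomes (placeholder names):

```sql
SELECT * FROM items_foo
UNION ALL
SELECT * FROM items_bar;
```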