I would go for separate tables for each type. With so few rows performance will not be a consideration either way. Individual tables will only have a page or two each. Similarly for the combined table.
Either way you will have to join a "type" table to complete your views. Inserts are infrequent at this level of abstraction so locking is unlikely to be an issue either way.
For me, though, having different tables keeps logically distinct items separated, which is a cleaner design.
There will be many different tables with a relationship to a type table of some sort. If there are many different type tables, the foreign key relationships are self-documenting and enforceable through DRI. With one combined table it is difficult to use DRI to say "Venue.TypeID comes from Type.TypeID, but only if Type.Category = 'V'", for example. The complementary side of this relationship - "a Type.TypeID with Type.Category = 'V' can only be used in the Venue table" - is also not supported in any RDBMS's DRI that I know of.
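For illustration (table and column names here are hypothetical), the separate-table design lets DRI carry the whole rule, while the combined design cannot express the category restriction:

```sql
-- Separate type table: the foreign key is self-documenting and fully enforced.
CREATE TABLE VenueType (
    VenueTypeID INT PRIMARY KEY,
    Name        VARCHAR(50) NOT NULL
);

CREATE TABLE Venue (
    VenueID     INT PRIMARY KEY,
    VenueTypeID INT NOT NULL REFERENCES VenueType (VenueTypeID)
);

-- One combined Type table: a plain foreign key accepts any TypeID,
-- regardless of its Category, so "only 'V' types" is not enforced.
CREATE TABLE Type (
    TypeID   INT PRIMARY KEY,
    Category CHAR(1) NOT NULL,   -- e.g. 'V' for venue types
    Name     VARCHAR(50) NOT NULL
);
```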
You could expand your activities table so that instead of a single timestamp column, it includes two timestamps, `ts_from` and `ts_to`, that indicate the time period for which the row's data is applicable. As long as no data changes, you do not insert any new rows into your table. When data does change, you insert the new data into a new row and update the old row's `ts_to` column.
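Sketched in SQL (the column names `val1`/`val2` and the `@now`/`@new_…` variables are illustrative):

```sql
-- When a value changes at time @now: close the currently open row
-- (the one with no end timestamp) ...
UPDATE activities
   SET ts_to = @now
 WHERE ts_to IS NULL;

-- ... and insert the new data as a new open-ended row.
INSERT INTO activities (ts_from, ts_to, val1, val2)
VALUES (@now, NULL, @new_val1, @new_val2);
```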
Then you can create a `calendar` table that contains all timestamps of interest to you (in your case that table would contain a row for every day and every hour in the period of time that is of interest to you). Finally, you get the result by cross-joining the two tables and keeping only those records whose date (from the `calendar` table) falls between `ts_from` and `ts_to`.
For example (my example uses only dates and not datetimes, but you can change that easily):
create table calendar (
  d date            -- one row per date of interest
);
create table time_data (
  d_from date,      -- start of validity period (inclusive)
  d_to date,        -- end of validity period (exclusive; null = still current)
  val1 int(10),
  val2 int(10)
);
insert into time_data values('2015-01-01', '2015-01-15', 10, 15);
insert into time_data values('2015-01-15', '2015-02-21', 7, 18);
insert into time_data values('2015-02-21', '2015-03-04', 1, 55);
insert into time_data values('2015-03-04', '2015-04-21', 3, 22);
insert into time_data values('2015-04-21', null, 9, 99);
I filled the calendar table with all dates from 2015-01-01 to now and ran this simple query:
select d, val1, val2
from calendar, time_data
where d >= d_from and d < ifnull(d_to, curdate())
order by d
The result set contains an entry for every date in the range (2015-01-01, 2015-05-01), with `val1` and `val2` taken from the `time_data` row that is applicable to that date.
For a nice procedure to fill the calendar table, see this article.
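In MySQL 8+, a recursive CTE is another way to fill it (a sketch; for long date ranges you may need to raise `cte_max_recursion_depth`, which defaults to 1000):

```sql
-- Generate one row per day from 2015-01-01 up to today.
INSERT INTO calendar (d)
WITH RECURSIVE dates (d) AS (
    SELECT CAST('2015-01-01' AS DATE)
    UNION ALL
    SELECT d + INTERVAL 1 DAY FROM dates WHERE d < CURDATE()
)
SELECT d FROM dates;
```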
However, since a cross join is an expensive operation, and you have large amounts of data, thoroughly test this (or any other) solution before applying it to your production system.
Best Answer
As long as all the data have (roughly) the same attributes, and the only difference is the `type`, storing them in a single table is fine.

To make your query fast, create an index on `type`. Then one of two things can happen:

- there are few enough rows of that `type` that you get an index scan (or a bitmap index scan in PostgreSQL)
- the `type` occurs often enough that PostgreSQL opts for a sequential scan

In both cases you won't be much slower than with many small tables.
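As a sketch (assuming a hypothetical table `event` with a `type` column):

```sql
-- A plain b-tree index on the type column is enough for the planner
-- to choose an index scan when the requested type is rare.
CREATE INDEX event_type_idx ON event (type);
```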
The final decision might depend on the number of tables (having 10000 tables in a database is not much fun) and the type of query you have:
The following can be a static query with parameters:
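Presumably something like this, where `items` and `type` are placeholder names:

```sql
-- One prepared statement, reused for every type;
-- only the parameter value changes.
SELECT * FROM items WHERE type = $1;
```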
while the following requires you to compose an SQL statement for every query:
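With one table per type, the table name itself varies, and identifiers cannot be bound as parameters, so the statement text must be built for each query (hypothetical names again):

```sql
-- A different statement string per type:
SELECT * FROM items_foo;
SELECT * FROM items_bar;
```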
If you want to query several types at once, it is nicer to write
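For example (placeholder names):

```sql
SELECT * FROM items WHERE type IN ('foo', 'bar');
```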
than having to write
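That is, with separate tables the same query becomes (placeholder names):

```sql
SELECT * FROM items_foo
UNION ALL
SELECT * FROM items_bar;
```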