Create two tables with identical superkeys

database-designsurrogate-key

I have an existing table called Realms. This has the superkey rlmID (just a surrogate key). We are planning to add tables that group realms in some context — let's just calll them RealmGroups. This will require adding a realmGroupID column to Realms. No problemo.

Not all Realms will be part of RealmGroups, so Realms.realmGroupID would be nullable. In this case, is it better to have a separate table (RealmsRealmGroups) that has (rlmID,realmGroupID)? Since the superkey of Realms and RealmsRealmGroups would be the same, I see no advantage to creating a separate table, but my DBA is suggesting that we should use a separate one for organizational purposes.

Are there any technical or practical reasons to use a separate table to store information for the same superkey?

Best Answer

A separate table would give you the flexibility of having realms that are in more than one group. If it makes sense to do that in your business context, or it might be something you would want in the future, then you might want to seriously consider that option. Otherwise, having the Realms.realmGroupID as nullable is simpler and easier to work with.

Related Solutions

MySQL Database Design – Separate Tables vs Partitioning One Huge Table

I have an suggestion regarding your second option.

If you need to keep the MyISAM tables with each unique my_column value separated from other MyISAM tables, you may want to look into the MERGE (Mrg_MyISAM) Storage Engine as an alternative to table partitioning.

This will allow your multiple MyISAM tables within the same database that have identical table structures and index layout to be mapped together in such a way that single query hits all MyISAM tables.

Suppose you have a table as follows:

CREATE TABLE tb1
(
    my_column INT NOT NULL
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(30),
    PRIMARY KEY (my_column,id),
    KEY name (my_column,name)
) ENGINE=MyISAM;

and you have 3 such tables (tb1, tb2, tb3) where

my_column for tb1 = 1
my_column for tb2 = 2
my_column for tb3 = 3

You can merge them under a single table mapping as follows:

CREATE TABLE tbmerge LIKE tb1;
ALTER TABLE tbmerge ENGINE=Mrg_MyISAM UNION=(tb1,tb2,tb3);

To perform a search through all the tables, just use tbmerge. For example, suppose you want to see every name from tb2 that starts with 'Jack', you run this query:

SELECT name FROM tbmerge WHERE my_column=2 AND name LIKE 'Jack%';

Given the design of the table, you should always specify the value for my_column. In fact, for every index tb1 has, make sure my_column is always the first column. The reason? A query against tbmerge is always a query against tb1, tb2, tb3 (all underlying tables). Otherwise, this query

SELECT name FROM tbmerge WHERE name LIKE 'Jack%';

will experience horrible performance because it will perform table scans against all underlying tables. Please plan carefully the indexes you will be using, following that simple rule (using my_column as the first column of every index)

There is an additional benefit: You can INSERT into the underlying tables at your convenience, thus working with the

I discussed this using MERGE tables in an earlier post (Jan 4, 2012).

SQL Server – Different Tables with Identical Fields

I think you want to go with your first option. Here's why:

You should design your database with a primary goal of maintaining data integrity. Subtle data corruptions can be harder to debug than source code logic errors. This is because corrupted data may only cause a detectable problem when a series of conditions occur at runtime. Often you can't repeat these reliably.

The best way to focus on data integrity is to create a properly structured database. If your data is going to be edited at any time, then you're best to start with a design that's in Third Normal Form (3NF) at least. In general, you should have a table for each distinct type of thing that you have data about.

Sometimes, you may have two or more different types of things with similar attributes (data items). This puts you in the position of deciding where you make your trade off. Combining the facts about your things into one table means fewer tables. Fewer tables will mean less code as long as the different types of things are and continue to be similar. If you need to start moving in the direction of each type of thing having its own distinct set of attributes, then your code will become more complex and hard to maintain, with lots of branching logic.

It's important to remember that two different tables having the same set of attributes might be coincidental to your system or it might be consequential to your system. Database structure design is not optimized in the same way as code reuse. Cramming logically distinct but structurally similar data into the same table is not a good practice. If you have two or more semantically distinct tables with very similar or even identical structure and your programmer's spidey-senses are tingling too much, then satisfy your urge for code reuse by subclassing in your code, but leave your apples and oranges in their own tables, rather than cramming them into a "fruit" table.

In a case where your two (or more) types of things are different from one another, then having one table for each type of thing is probably an excellent place to start. This will keep your logic simple and your data clean, which makes for a high quality system.

Best Answer

Related Solutions

MySQL Database Design – Separate Tables vs Partitioning One Huge Table

SQL Server – Different Tables with Identical Fields

Related Question