Mysql – How should I design the MySQL database

database-designMySQLnormalization

I recently read about relational database normalization for the first time and I was wondering how best to design my database. If each entry has a url, some text, and a list of probabilities, what is the best way to normalize my database? I was thinking one table with columns id, url, and content, and another with id, probability, and foreign_key_id (referencing the first table), but as an SQL novice I don’t know if this is the best approach, especially since the second table would end up having several times as many rows as the first.

Best Answer

Normalization is only one aspect of the database design process. For starters, I suggest that you do something like ...

[1] Create an ERD (entity relationship diagram) aka logical diagram. Start on paper - identify the entities that your system will be dealing with. Find relevant relationships that connect the entities. Use the correct terminology: entity, attribute, unique identifier, relationship, cardinality, optionality. AVOID saying/thinking: tables, primary/foreign keys. Do NOT mix up instances and entities (each entity represents a multitude of instances). Remove redundant relationships. Use modelling software* once the paper version looks okay. * eg the (freely available) datamodeler: http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html

The ERD stage is probably the hardest one to learn and do. However, if you get it right, you'll have a reliable foundation for the later stages. Make sure that nothing in your ERD violates 1NF or 2NF or 3NF. A lot of confusion can arise when "implementation details" are coming into the game too early. A rather good description of the process can be found at: http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html

[2] Draw a relational/physical model - this is one step closer to the implementation. Entities (of the ERD) become tables, attributes become columns, unique identifiers become primary keys etc. Apply the correct data types (depending on the system you are using). Specify the constraints: primary/unique/foreign/check constraints eg tables on the MANY side of a relationship will need an additional foreign key column (which "references" another pk or unique column). The relational model is like a roadmap for coding queries later on!

Some modelling software is able to generate SQL(DDL) code from a model, or - vice versa - generate a model from code. If you cannot resist the temptation to write code straight away, generate a model for documentation purposes.

[3] Only if the diagrams are looking good, write the DDL code (SQL: mainly create table ... and alter table ... statements). Update the models (ERD and relational) if you make any changes to the initial "layout".

[4] Insert a decent amount of test data into each table (there are online test data generators).

[5] Write the queries, views etc as required.