SQL Server – Best Practices for Table Naming and Database Normalization

best practicesdatabase-designsql server

I'm creating a database of simulation results, and am trying to do so the right way. I've attempted to show relationships between data to make sure nothing redundant is thrown in.

My current structure is like this:

Experiment

Primary key: SimulationID

Tables:

  • measurement
  • sampling_rate
  • first_draft_flow_rate
  • second_draft_flow_rate
  • final_flow_rate

Foreign key relating these tables: cycle_count

Each table contains the primary key, foreign key, and values for the variables.

So for a given experiment, it runs for so many cycles (the number of cycles it runs for varies from simulation to simulation). We log data at each cycle of the simulation.

I've made a table called measurement, sampling_rate, etc, but am not sure what to name the columns. Should the also be called measurement, sampling_rate, etc? Or should I just use Value as the column name?

Here's a spreadsheet with a sample log to demonstrate what I'm working with. All of this data would be filed under a single SimulationID.

spreadsheet

Also, any tips on how best to design a DB to be normal/best practices would be greatly appreciated.

Best Answer

This is not database administration, this is data modeling. Very different disciplines.

I'm guessing your main entity is Simulation and the tables you list describe it. You don't show the structure or content of these tables so the following is based on conjecture.

Measurement looks like it could be a list of measurement types: temperature, flow, particles per unit volume, etc. SamplingRates also looks like a list of valid rates: 1/sec, 10/sec, 100/sec, etc.

Finally there are three table that look like they should be one, FlowRates, that is also a lookup table.

This would mean a Simulation is the recorded results of, say, a temperature reading at a rate of 10 times per second of a 30 ml/sec flow.

Is that accurate? If so, here would be an example:

Measurements
 ID  Name
  1  Temperature
  2  Particles per ml

SamplingRates
 ID  Name  Period
  1     1     sec
  2    10     sec

FlowRates
 ID  Rate  Unit Period
  1    10    ML    sec
  2    20    ML    sec
  2    30    ML    sec

So the example Simulation entry would show a Measurement of 1, SamplingRate of 2 and FlowRate of 3 -- along with the results of the measurement, of course, and probably a timestamp of when the simulation was performed.

It would help a lot if you would give a plain language description of an experiment: "An experiment consists of any number of simulations. A simulation is made up of various readings of...made at a certain frequency based on ..." Don't think in terms of tables and columns. Pretend you're talking to a lab technician.

Update: When designing a table (and this pertains to naming the fields) it is generally a good idea to isolate the entity from all other entities -- that is, the naming should be done as context-free as possible. What that means is if you have a field that represents, say, the name or description of the entity, then by all means call those fields "Name" and "Description". No matter that you have dozens of other fields with identical names in tables scattered around the database.

A table has no context. It is the query that establishes a context.

select  s.Name as NewSite, u1.Name as Owner, u2.Name as Manager
from    Site s
join    Users u1
    on  u1.ID = s.OwnerID
join    Users u1
    on  u2.ID = s.MgrID
where   s.Created > '2015-01-01';

Here are three tables each with the field Name -- it really doesn't matter that one table is used twice. Within each table, Name means "this is the name of the entity represented by this row."

The context established by this query is easily identified. It is looking at the owner and manager of all newly created sites and renames each Name field to suit the context. Different queries can and do use the fields in completely different contexts. As it is the query which sets the context, let the query rename the fields to whatever best suits that context.

Don't try to force a context with a table by naming the fields User_ID or User_Name and so forth. In a query, fields should be prefixed with the table name or alias anyway, so there's never any confusion.

where  User.Name = 'John Smith'

Compare with

where  User.User_Name = 'John Smith'

The extra "User_" adds no useful information. Besides, you're bound to hit something like this:

where ExtremelyLongTableName.ExtremelyLongTableName_SomewhatLongFieldName = ...

I'm exhausted just typing it the one time. Besides, some DBMSs limit the length of object names. Oracle, iirc, only considers the first 32 characters of an object name. I've hit that limit more than once in shops that use the tablename_fieldname convention. At that point you have to use abbreviations which is really messy.

Anyway, "best practices" is a fairly subjective concept. Opinions will vary. Choose what is most comfortable for you.