SQL Server – Best Practices for Table Naming and Database Normalization

best practicesdatabase-designsql server

I'm creating a database of simulation results, and am trying to do so the right way. I've attempted to show relationships between data to make sure nothing redundant is thrown in.

My current structure is like this:

Experiment

Primary key: SimulationID

Tables:

measurement
sampling_rate
first_draft_flow_rate
second_draft_flow_rate
final_flow_rate

Foreign key relating these tables: cycle_count

Each table contains the primary key, foreign key, and values for the variables.

So for a given experiment, it runs for so many cycles (the number of cycles it runs for varies from simulation to simulation). We log data at each cycle of the simulation.

I've made a table called measurement, sampling_rate, etc, but am not sure what to name the columns. Should the also be called measurement, sampling_rate, etc? Or should I just use Value as the column name?

Here's a spreadsheet with a sample log to demonstrate what I'm working with. All of this data would be filed under a single SimulationID.

Also, any tips on how best to design a DB to be normal/best practices would be greatly appreciated.

Best Answer

This is not database administration, this is data modeling. Very different disciplines.

I'm guessing your main entity is Simulation and the tables you list describe it. You don't show the structure or content of these tables so the following is based on conjecture.

Measurement looks like it could be a list of measurement types: temperature, flow, particles per unit volume, etc. SamplingRates also looks like a list of valid rates: 1/sec, 10/sec, 100/sec, etc.

Finally there are three table that look like they should be one, FlowRates, that is also a lookup table.

This would mean a Simulation is the recorded results of, say, a temperature reading at a rate of 10 times per second of a 30 ml/sec flow.

Is that accurate? If so, here would be an example:

Measurements
 ID  Name
  1  Temperature
  2  Particles per ml

SamplingRates
 ID  Name  Period
  1     1     sec
  2    10     sec

FlowRates
 ID  Rate  Unit Period
  1    10    ML    sec
  2    20    ML    sec
  2    30    ML    sec

So the example Simulation entry would show a Measurement of 1, SamplingRate of 2 and FlowRate of 3 -- along with the results of the measurement, of course, and probably a timestamp of when the simulation was performed.

It would help a lot if you would give a plain language description of an experiment: "An experiment consists of any number of simulations. A simulation is made up of various readings of...made at a certain frequency based on ..." Don't think in terms of tables and columns. Pretend you're talking to a lab technician.

Update: When designing a table (and this pertains to naming the fields) it is generally a good idea to isolate the entity from all other entities -- that is, the naming should be done as context-free as possible. What that means is if you have a field that represents, say, the name or description of the entity, then by all means call those fields "Name" and "Description". No matter that you have dozens of other fields with identical names in tables scattered around the database.

A table has no context. It is the query that establishes a context.

select  s.Name as NewSite, u1.Name as Owner, u2.Name as Manager
from    Site s
join    Users u1
    on  u1.ID = s.OwnerID
join    Users u1
    on  u2.ID = s.MgrID
where   s.Created > '2015-01-01';

Here are three tables each with the field Name -- it really doesn't matter that one table is used twice. Within each table, Name means "this is the name of the entity represented by this row."

The context established by this query is easily identified. It is looking at the owner and manager of all newly created sites and renames each Name field to suit the context. Different queries can and do use the fields in completely different contexts. As it is the query which sets the context, let the query rename the fields to whatever best suits that context.

Don't try to force a context with a table by naming the fields User_ID or User_Name and so forth. In a query, fields should be prefixed with the table name or alias anyway, so there's never any confusion.

where  User.Name = 'John Smith'

Compare with

where  User.User_Name = 'John Smith'

The extra "User_" adds no useful information. Besides, you're bound to hit something like this:

where ExtremelyLongTableName.ExtremelyLongTableName_SomewhatLongFieldName = ...

I'm exhausted just typing it the one time. Besides, some DBMSs limit the length of object names. Oracle, iirc, only considers the first 32 characters of an object name. I've hit that limit more than once in shops that use the tablename_fieldname convention. At that point you have to use abbreviations which is really messy.

Anyway, "best practices" is a fairly subjective concept. Opinions will vary. Choose what is most comfortable for you.

Related Solutions

Postgresql – Table structure/schema for spreadsheet-like web app (ex: Google Docs)

I've never attempted something like this, but maybe a schema such as this would work:

Spreadsheet
    spreadsheet_id (unique key)
    name
    num_rows
    num_cols

column_types
    type_id (unique key)
    type_name

spreadsheet_rows
    spreadsheet_row_id (unique key)
    spreadsheet_id (refers to spreadsheet.spreadsheet_id)
    row_seq_num (for on-screen row-ordering)
    row_name

spreadsheet_cols
    spreadsheet_col_id (unique key)
    spreadsheet_id (refers to spreadsheet.spreadsheet_id)
    col_seq_num (for on-screen column-ordering)
    column_type_id (refers to column_types.type_id)
    column_name

spreadsheet_cells
    cell_id (unique key)
    spreadsheet_id (refers to spreadsheet.spreadsheet_id)
    row_id (refers to spreadsheet_rows.spreadsheet_row_id)
    col_id (refers to spreadsheet_cols.spreadsheet_col_id)
    cell_value (holds the actual value!)

Of course you'd only create records in spreadsheet_cells for non-null values...

Sql-server – With MS SQL Server are the generated constraint names predictable

No, the constraint name is completely unpredictable. If you want your names to be consistent, you can name them correctly by applying a predictable / repeatable name manually. I have no idea how you would do this in the code you have, but in T-SQL instead of:

CREATE TABLE dbo.foo(bar INT PRIMARY KEY);

CREATE TABLE dbo.blat(bar INT FOREIGN KEY REFERENCES dbo.foo(bar));

(The above end up with constraints having names like PK__foo__DE90ECFF6CF25EF6 and FK__blat__bar__1B1EE1BE.)

You would say:

CREATE TABLE dbo.foo(bar INT, CONSTRAINT PK_foo PRIMARY KEY (bar));

CREATE TABLE dbo.blat(bar INT, CONSTRAINT fk_foobar FOREIGN KEY(bar) 
  REFERENCES dbo.foo(bar));

Best Answer

Related Solutions

Postgresql – Table structure/schema for spreadsheet-like web app (ex: Google Docs)

Sql-server – With MS SQL Server are the generated constraint names predictable

Related Question