Table “normalization” and tradeoffs

database-designnormalization

It's been a while since my databases course in university, so bear with me as I try to be unambiguous here.

In the domain I'm working in, there are Sources and Destinations. Both have 2D coordinates (an X and Y coordinate). Both Source and Destination have separate data in addition to the coordinates. Finally, there is a one-to-many relationship between Source and Destination.

Idea #1

Use two tables, one for each type of coordinate:

SourceID  |  XCoord  |  YCoord |  Other Source Columns |
--------------------------------------------------------
332       |  33      |  4      |          ...          |
423       |  213     |  342    |                       |
211       |  365     |  12     |                       |

DestinationID  |  Source ID  |  XCoord  |  YCoord |  Other Dest. Columns |
--------------------------------------------------------------------------
455            |  332        |  24      |  4      |          ...         |
892            |  332        |  653     |  32     |                      |
112            |  211        |  5       |  122    |                      |

Idea #2

Use three tables, where Coordinates are stored in their own table, but everything else is the same:

CoordinateID  |  XCoord  |  YCoord |
------------------------------------
1             |  33      |  4      |
2             |  213     |  342    |
3             |  365     |  12     |
4             |  24      |  4      |
5             |  653     |  32     |
6             |  5       |  122    |

SourceID  |  CoordinateID |  Other Source Columns |
---------------------------------------------------
332       |  4            |         ...           |
423       |  5            |                       |
211       |  6            |                       |

DestinationID  |  SourceCoordID |  CoordinateID |  Other Dest. Columns |
------------------------------------------------------------------------
455            |  332           |   1           |          ...         |
892            |  332           |   2           |                      |
112            |  332           |   3           |                      |

The differences I can think of are these:

  • The second one will perform worse: in order to get the (x, y) coordinates of a source or destination, the source or destination table must be joined with the coordinate table.
  • The second one is more maintainable: only one table needs changing if, say, a timestamp needs to be stored with every (x, y) pair.

Are those differences correct?

(I'm happy to remove the next question if it's too open ended or off-topic.) Are there other significant differences?

Best Answer

JOIN operations are not as expensive as you might expect. Many times the benefit of normalization will far outweigh the "costs" of joining the tables, specifically if you have "common" coordinates that are used by more than one destination/source.

I would likely model this using two tables, one for coordinates, and one to describe the points (sources and destinations).

For instance:

CREATE TABLE dbo.Coords 
(
    CoordinateID INT NOT NULL
        CONSTRAINT PK_Coords
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , x INT NOT NULL
    , y INT NOT NULL
);

CREATE TABLE dbo.Points 
(
    PointID INT NOT NULL
        CONSTRAINT PK_Points
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , CoordinateID INT NOT NULL
        CONSTRAINT FK_Points_CoordinateID
        FOREIGN KEY
        REFERENCES dbo.Coords(CoordinateID)
    , RelativePointID INT NULL
        CONSTRAINT FK_Points_RelativePointID
        FOREIGN KEY REFERENCES dbo.Points(PointID)
);

INSERT INTO dbo.Coords (x, y)
VALUES (1, 2)
    , (1, 3)
    , (1, 4);

INSERT INTO dbo.Points (CoordinateID, RelativePointID)
VALUES (1, NULL);

INSERT INTO dbo.Points (CoordinateID, RelativePointID)
VALUES (2, 1);

INSERT INTO dbo.Points (CoordinateID, RelativePointID)
VALUES (3, 2);


SELECT SrcX = src_c.x
    , SrcY = src_c.y
    , DestX = dest_c.x
    , DestY = dest_c.y
FROM dbo.Points src
    INNER JOIN dbo.Coords src_c ON src.CoordinateID = src_c.CoordinateID
    LEFT JOIN dbo.Points dest ON src.RelativePointID = dest.PointID
    INNER JOIN dbo.Coords dest_c ON dest.CoordinateID = dest_c.CoordinateID;

Results:

enter image description here

This design allows any point to serve as the destination for any other source point, effectively allowing you to create a map of directions from one point to any other.