It's been a while since my databases course in university, so bear with me as I try to be unambiguous here.
In the domain I'm working in, there are Sources
and Destinations
. Both have 2D coordinates (an X and Y coordinate). Both Source
and Destination
have separate data in addition to the coordinates. Finally, there is a one-to-many relationship between Source
and Destination
.
Idea #1
Use two tables, one for each type of coordinate:
SourceID | XCoord | YCoord | Other Source Columns |
--------------------------------------------------------
332 | 33 | 4 | ... |
423 | 213 | 342 | |
211 | 365 | 12 | |
DestinationID | Source ID | XCoord | YCoord | Other Dest. Columns |
--------------------------------------------------------------------------
455 | 332 | 24 | 4 | ... |
892 | 332 | 653 | 32 | |
112 | 211 | 5 | 122 | |
Idea #2
Use three tables, where Coordinates
are stored in their own table, but everything else is the same:
CoordinateID | XCoord | YCoord |
------------------------------------
1 | 33 | 4 |
2 | 213 | 342 |
3 | 365 | 12 |
4 | 24 | 4 |
5 | 653 | 32 |
6 | 5 | 122 |
SourceID | CoordinateID | Other Source Columns |
---------------------------------------------------
332 | 4 | ... |
423 | 5 | |
211 | 6 | |
DestinationID | SourceCoordID | CoordinateID | Other Dest. Columns |
------------------------------------------------------------------------
455 | 332 | 1 | ... |
892 | 332 | 2 | |
112 | 332 | 3 | |
The differences I can think of are these:
- The second one will perform worse: in order to get the (x, y) coordinates of a source or destination, the source or destination table must be joined with the coordinate table.
- The second one is more maintainable: only one table needs changing if, say, a timestamp needs to be stored with every (x, y) pair.
Are those differences correct?
(I'm happy to remove the next question if it's too open ended or off-topic.) Are there other significant differences?
Best Answer
JOIN
operations are not as expensive as you might expect. Many times the benefit of normalization will far outweigh the "costs" of joining the tables, specifically if you have "common" coordinates that are used by more than one destination/source.I would likely model this using two tables, one for coordinates, and one to describe the points (sources and destinations).
For instance:
Results:
This design allows any point to serve as the destination for any other source point, effectively allowing you to create a map of directions from one point to any other.