Database Design – Storing Stadium Capacity for Current and Past Events

database-designsql serversql-server-2016

I am trying to design a database as a personal/passion development project to track current and historical football game information. I am having trouble with table designs that will allow me to store data that might change over a period of time. For instance, stadium capacity may grow or shrink based on a renovation of the stadium.

In this simple diagram, I have three tables:

Database Design

But with this design, I will only store the current capacity of the stadium so if Stadium A increases capacity by 5,000 in 2018 – all events before would reflect the current capacity and not the capacity when the event occurred.

My initial thought it that I would need an additional table – maybe StadiumCapacity – that would store the stadium ID, a capacity number, and the years that stadium held that capacity and possible a MostRecent column.

Does anyone have experience with designing a database for sports related information and would this be the most efficient way of tracking this information?

Best Answer

Three options:

Option One

Create a history table storing all data that can change about the stadium. This could also include its name, though its location is probably set. Lets call this "StadiumHistory". Your Schedule table would then use this as its FK. The StadiumHistory table would have a FK to the Stadium table which uniquely identifies the stadium and its location.

Option Two

Ask yourself "what is my eventual goal?". The answer is probably "to create a system which can be queried really quickly, where calculations including aggregation functions of teams/goals/etc can be performed very quickly to enable me to identify likely results or predict future outcomes based on a variety of factors, or to examine trends etc from the past".

If this is your eventual goal, you may want to keep your 3NF database above plus have it regularly export to another database, one which has dimension normalization - a data warehouse. Your stadium then becomes a Dimension table, probably expanded to "Location" with your results a Fact table. This Dimension table would contain values (which would often repeat) for all of the fields you have specified in Stadium. In fact, you could expand the "Location" table then to include weather information for the day, pitch condition, crowd attendance etc.

This approach has many advantages long term, but requires a new database (development) and an appropriate ETL process to load it. My experience tells me, you will eventually be building this if your database records historical information and you want to begin aggregating information for trend analysis. If you find yourself writing history tables for everything, you will find the interfaces for this difficult to maintain in your current database.

Option Three

Forget about anything that changes. Is it really important that you record the capacity on any day? If not, don't worry about the history of recording stadium capacity. Your database then only contains the most up to date stadium information.

This is the simplest option - it means you don't have to change your database design.