SQL Server Database Design – When to Duplicate Data for Querying

database-design, sql-server

I have the following table design.

An Event table, with helper tables to hold the event metadata.

Each event row has an event type column (regular_event, recurring_route, timed_messages, and more).

Each event type has its own tables that define different aspects of the event, including its users. The problem is that users may be linked to an event directly through a many-to-many table (users_regular_events), or only indirectly through a third or fourth table (events -> recurring_routes -> recurring_routes_stations -> recurring_stations -> recurring_stations_users -> users).

Would it make sense to create a users_events table that duplicates the relations between users and certain events? Currently, if I want to show each user their events, I have to join multiple tables. Or do I need to rethink my entire design?

Best Answer

I agree with Aaron Bertrand that premature optimization is not a good idea. Relational database management systems are built for joining data. Don't presume a performance problem; observe it, preferably through thorough load testing before moving to production, and then deal with it if necessary.
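To illustrate how the joins can stay out of application code without duplicating anything, here is a minimal sketch that wraps both access paths in a view. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it can be run anywhere. The table names come from the question, but every column name, the view name user_events, and the sample rows are assumptions made for the example.

```python
import sqlite3

# All column names and sample rows are hypothetical; the question only names the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_type TEXT NOT NULL);

-- Direct path: events <-> users.
CREATE TABLE users_regular_events (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    event_id INTEGER NOT NULL REFERENCES events(event_id),
    PRIMARY KEY (user_id, event_id)
);

-- Indirect path: events -> routes -> stations -> users.
CREATE TABLE recurring_routes (
    route_id INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL REFERENCES events(event_id)
);
CREATE TABLE recurring_stations (station_id INTEGER PRIMARY KEY);
CREATE TABLE recurring_routes_stations (
    route_id   INTEGER NOT NULL REFERENCES recurring_routes(route_id),
    station_id INTEGER NOT NULL REFERENCES recurring_stations(station_id),
    PRIMARY KEY (route_id, station_id)
);
CREATE TABLE recurring_stations_users (
    station_id INTEGER NOT NULL REFERENCES recurring_stations(station_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (station_id, user_id)
);

-- The view hides the joins; UNION removes duplicate (user, event) pairs.
CREATE VIEW user_events AS
    SELECT ure.user_id, ure.event_id
    FROM users_regular_events AS ure
    UNION
    SELECT rsu.user_id, rr.event_id
    FROM recurring_routes AS rr
    JOIN recurring_routes_stations AS rrs ON rrs.route_id = rr.route_id
    JOIN recurring_stations_users AS rsu ON rsu.station_id = rrs.station_id;

INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO events VALUES (10, 'regular_event'), (20, 'recurring_route');
INSERT INTO users_regular_events VALUES (1, 10);
INSERT INTO recurring_routes VALUES (100, 20);
INSERT INTO recurring_stations VALUES (1000), (1001);
INSERT INTO recurring_routes_stations VALUES (100, 1000), (100, 1001);
INSERT INTO recurring_stations_users VALUES (1000, 1), (1001, 1), (1001, 2);
""")

def events_for_user(user_id):
    """Return the sorted event ids visible to one user, read through the view."""
    rows = conn.execute(
        "SELECT event_id FROM user_events WHERE user_id = ? ORDER BY event_id",
        (user_id,),
    ).fetchall()
    return [event_id for (event_id,) in rows]

print(events_for_user(1))  # [10, 20]
print(events_for_user(2))  # [20]
```

The view stores no rows of its own, so there is nothing to keep in sync. Whether it performs well enough on real data is exactly what the load test should show.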

As to your question about when it makes sense to duplicate data for querying, the classic scenario is a data warehouse. Ideally, you want the data to be static, in other words, read-only. This can be the case for historical transactional data, for example. In a data warehouse the data is written as close to once as possible and is read many, many times. If this is your scenario, then denormalization for reporting may be a reasonable design choice.

However, any time you introduce redundancy in your data, you expose yourself to the risk of losing data quality. This is what the normal forms are designed to prevent. Redundancy introduces risk that you need to manage.
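As a small illustration of that risk, the sketch below shows a duplicated users_events table drifting out of sync with its source. It again uses Python's sqlite3 module in place of SQL Server, and the column names and rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical columns and rows; users_events is the redundant copy from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_regular_events (user_id INTEGER, event_id INTEGER);
CREATE TABLE users_events (user_id INTEGER, event_id INTEGER);

INSERT INTO users_regular_events VALUES (1, 10), (2, 10);
INSERT INTO users_events SELECT user_id, event_id FROM users_regular_events;
""")

# A writer removes user 2 from the event in the source table but forgets the copy.
conn.execute("DELETE FROM users_regular_events WHERE user_id = 2")

def stale_rows():
    """Return rows still in the copy that the source table no longer backs."""
    return conn.execute("""
        SELECT user_id, event_id FROM users_events
        EXCEPT
        SELECT user_id, event_id FROM users_regular_events
    """).fetchall()

print(stale_rows())  # [(2, 10)]
```

Every code path that writes to the source tables has to maintain the copy as well, for example through triggers or a scheduled rebuild, and a check like this EXCEPT query is one way to monitor for drift.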

As long as you go in with your eyes open, you will be OK.