Database Design – Identifying Design Flaws in SQL/Relational Databases

database-design

I've been working with a a relational model for a work project over the last few months, and I've gotten to the point where I'm wondering if I've made a mistake in my model's design or implemented things incorrectly. I can't go into specifics about the model because it's for work, but I'll try my best to elaborate.

The database contains data from three separate sources that are time sensitive. I.e., all of the date starts out with a DATE or DATETIME of some form. I've written a process to aggregate that data to unique values over date ranges and then run those unique values through an optimization engine in order to gain insights about an "aggregated optimal view" of the data. The process seems to work fine when there are relatively small amount of data (think ~170,000 records per date range that I select) but doesn't seem to be scaling upwards at all. Some of the queries that I've written are now taking much longer than they used to, and I think it's because of the way that I've designed the system.

Here are some of the "symptoms" of my database:

  • I use indexes in my tables, but haven't defined foreign key relationships
    • I was under the impression that have well defined indexes for each of my JOINs was sufficient
  • There are some queries where I have as many as 4 or 5 nested subqueries
  • Some of my queries JOIN 6 or 7 tables together in order to get data
  • A couple of queries that I have are >300 LOC

I could add a few more, but I think that this gets my idea across.

My question is: when does one know that their database isn't modeled properly?

Is it when your queries stop scaling? When you have to write long queries to get the data that you're looking for?

I can give more information about the stack and machine that I'm using if it helps.

Thanks

Best Answer

Modeling and performance are related but not quite the same thing. Performance and scalability will have a lot to do with what DBMS you are actually using.

Queries which seem to be slow and difficult to execute against MySQL might fly when run against Postgres for instance. This has everything to do with how intelligent the query planner is. Older versions of MySQL for instance only seem to know how to perform one kind of join: Nested-Loop. This will be fast for small amounts of data and simple queries, but can degrade quickly for complex queries.

You may not have a design flaw. You may have simply hit the performance wall with your current DBMS. Which database are you using?

In my opinion, an error in the model exists when you have insert, update, or delete anomalies in the data. Imagine you store a customer name on each invoice and the customer changes their name. To perform this update should not require updating the history of every order the customer has ever made. If it does, you've got a problem with your model.