Database Design – Identifying Design Flaws in SQL/Relational Databases

database-design

I've been working with a relational model for a work project over the last few months, and I've gotten to the point where I'm wondering if I've made a mistake in my model's design or implemented things incorrectly. I can't go into specifics about the model because it's for work, but I'll try my best to elaborate.

The database contains data from three separate sources that are time sensitive. I.e., all of the data starts out with a DATE or DATETIME of some form. I've written a process to aggregate that data to unique values over date ranges and then run those unique values through an optimization engine in order to gain insights about an "aggregated optimal view" of the data. The process seems to work fine when there is a relatively small amount of data (think ~170,000 records per date range that I select), but it doesn't seem to scale upwards at all. Some of the queries that I've written are now taking much longer than they used to, and I think it's because of the way that I've designed the system.

Here are some of the "symptoms" of my database:

- I use indexes in my tables, but haven't defined foreign key relationships
  - I was under the impression that having well-defined indexes for each of my JOINs was sufficient
- There are some queries where I have as many as 4 or 5 nested subqueries
- Some of my queries JOIN 6 or 7 tables together in order to get data
- A couple of queries that I have are >300 LOC

I could add a few more, but I think that this gets my idea across.

My question is: when does one know that their database isn't modeled properly?

Is it when your queries stop scaling? When you have to write long queries to get the data that you're looking for?

I can give more information about the stack and machine that I'm using if it helps.

Thanks

Best Answer

Modeling and performance are related but not quite the same thing. Performance and scalability will have a lot to do with what DBMS you are actually using.

Queries which seem to be slow and difficult to execute against MySQL might fly when run against Postgres for instance. This has everything to do with how intelligent the query planner is. Older versions of MySQL for instance only seem to know how to perform one kind of join: Nested-Loop. This will be fast for small amounts of data and simple queries, but can degrade quickly for complex queries.

You may not have a design flaw. You may have simply hit the performance wall with your current DBMS. Which database are you using?

In my opinion, an error in the model exists when you have insert, update, or delete anomalies in the data. Imagine you store a customer name on each invoice and the customer changes their name. Performing this update should not require updating the history of every order the customer has ever made. If it does, you've got a problem with your model.
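To make the update anomaly concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (invoice_flat, customer, invoice) and the sample data are made up for illustration, and SQLite is only a convenient stand-in for whatever DBMS you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized (hypothetical) design: the customer name is copied onto every invoice.
cur.execute(
    "CREATE TABLE invoice_flat (id INTEGER PRIMARY KEY, customer_name TEXT, total REAL)"
)
cur.executemany(
    "INSERT INTO invoice_flat (customer_name, total) VALUES (?, ?)",
    [("Acme Ltd", 100.0), ("Acme Ltd", 250.0), ("Acme Ltd", 75.0)],
)

# Renaming the customer has to rewrite every historical invoice row.
cur.execute(
    "UPDATE invoice_flat SET customer_name = 'Acme Corp' WHERE customer_name = 'Acme Ltd'"
)
flat_rows_touched = cur.rowcount

# Normalized (hypothetical) design: the name lives in one row, invoices point at it by key.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total REAL)"
)
cur.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme Ltd')")
cur.executemany(
    "INSERT INTO invoice (customer_id, total) VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (1, 75.0)],
)

# The same rename now changes exactly one row.
cur.execute("UPDATE customer SET name = 'Acme Corp' WHERE id = 1")
normalized_rows_touched = cur.rowcount

# Every invoice still resolves to the current name through the join.
names_on_invoices = [
    row[0]
    for row in cur.execute(
        "SELECT c.name FROM invoice i JOIN customer c ON c.id = i.customer_id ORDER BY i.id"
    )
]
print(flat_rows_touched, normalized_rows_touched, names_on_invoices)
```

If a single real-world change forces you to touch a number of rows that grows with your history, as in the first table, that is the kind of modeling problem I mean.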

Related Solutions

Mysql – Am I wrong in table design or wrong in selected index when made the table

It is difficult to say without seeing an execution plan, but I would take a look at your joins to the geography hierarchy. The geography tables (prov/city/district) have compound keys, and all of the necessary foreign keys are already in your peoples table. You are currently joining partly directly and partly through the compound keys. This is a very unconventional approach. You should join on full keys, not on partial compound keys. However, in your case you could probably simplify further. Instead of joining up the hierarchy, why not just join directly from the bottom level to each piece of the hierarchy, as in a star schema?

Try this join clause instead:

FROM peoples E
  JOIN test_prov B ON  E.id_prov = B.id
  JOIN test_city C ON  E.id_city = C.id 
                   and E.id_prov = C.id_prov
  JOIN test_district D ON  E.id_district = D.id 
                       and E.id_city = D.id_city
                       and E.id_prov = D.id_prov
  JOIN test_town A ON  E.id_town = A.id 
                   and E.id_district = A.id_district
                   and E.id_city = A.id_city
                   and E.id_prov = A.id_prov 
WHERE E.stat_valid=1 
  AND E.mark_as_trash=0

Note also that I've taken stat_valid and mark_as_trash out of the joins and put them in a WHERE clause. Don't include non-key columns in your joins; it's bad form. Note too that these columns are not indexed, so you will potentially be causing a table scan with them. I suspect that even if they were indexed they wouldn't be selective enough, and you might end up with a table scan anyway.
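If you want to check whether a filter like this scans the table, ask the database for its plan. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a stand-in: the question is about MySQL, whose EXPLAIN output looks different, and the simplified peoples table, its sample rows, and the index name idx_peoples_flags are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical version of the peoples table with only the flag columns.
conn.execute(
    "CREATE TABLE peoples ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " stat_valid INTEGER,"
    " mark_as_trash INTEGER)"
)
conn.executemany(
    "INSERT INTO peoples (name, stat_valid, mark_as_trash) VALUES (?, ?, ?)",
    [(f"person{i}", i % 2, int(i % 3 == 0)) for i in range(1000)],
)

QUERY = "SELECT * FROM peoples WHERE stat_valid = 1 AND mark_as_trash = 0"


def plan(connection, sql):
    """Return the human-readable detail text of each step in SQLite's plan."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index on the flag columns, the only option is to scan the table.
plan_without_index = plan(conn, QUERY)

# Add a composite index on the two flags and ask for the plan again.
conn.execute("CREATE INDEX idx_peoples_flags ON peoples (stat_valid, mark_as_trash)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

Whether an optimizer actually prefers such an index over a scan depends on the DBMS and its statistics, so on low-selectivity flags like these you should compare the plans on your real data rather than trust this toy example.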

How to model medium.com paragraph comment

You can insert a comment marker in the text itself. This marker is not displayed. As the text is changed, the marker stays embedded in the text and moves around with each edit. If the paragraph is removed, the comment is orphaned and can be garbage collected later. Of course, this has nothing to do with databases; it is pure text processing.
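A rough sketch of the embedded-marker idea, in Python. The marker syntax [[c:<id>]] and the helper names are invented for this example; a real editor would pick something that cannot collide with user text.

```python
import re

# Hypothetical marker syntax: [[c:<comment id>]] stored inline with the text.
MARKER = re.compile(r"\[\[c:(\d+)\]\]")


def display_text(stored):
    """Strip the markers so the reader never sees them."""
    return MARKER.sub("", stored)


def comment_ids(stored):
    """List the comment ids whose markers are still present in the text."""
    return [int(found) for found in MARKER.findall(stored)]


stored = "First paragraph.[[c:7]]\n\nSecond paragraph."

# Editing the paragraph leaves the marker in place, so the comment travels with it.
edited = stored.replace("First paragraph.", "An edited first paragraph.")
shown = display_text(edited)
still_attached = comment_ids(edited)

# Deleting the paragraph deletes the marker too; comment 7 is now an orphan
# that a later cleanup pass could garbage collect.
after_delete = "Second paragraph."
known_comments = {7}
orphaned = known_comments - set(comment_ids(after_delete))

print(repr(shown), still_attached, orphaned)
```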

If you want a DB-centric approach, then the paragraphs are assigned primary keys at creation and the comments reference this primary key. As primary keys, of course, they never change. The order of paragraphs within the text can be persisted as a previous-current-next relation (implemented as a separate table, or as next_id and prev_id fields). You can also throw in versioning (history). I'm not an editable-text storage expert, so I don't know what the state of the art is.
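Here is a minimal sketch of that DB-centric layout, again with Python's sqlite3 module as a stand-in. Only next_id and prev_id come from the description above; the table names, the other columns, and the sample rows are assumptions, and versioning is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Surrogate key per paragraph; order is kept as a doubly linked list.
    CREATE TABLE paragraph (
        id      INTEGER PRIMARY KEY,
        body    TEXT NOT NULL,
        prev_id INTEGER REFERENCES paragraph(id),
        next_id INTEGER REFERENCES paragraph(id)
    );
    -- Comments point at the surrogate key, never at position or content.
    CREATE TABLE comment (
        id           INTEGER PRIMARY KEY,
        paragraph_id INTEGER NOT NULL REFERENCES paragraph(id),
        body         TEXT NOT NULL
    );
    INSERT INTO paragraph (id, body, prev_id, next_id) VALUES
        (1, 'Intro', NULL, 2),
        (2, 'Details', 1, NULL);
    INSERT INTO comment (paragraph_id, body) VALUES (2, 'Needs a source');
    """
)

# Reword paragraph 2 and move it to the front; its key stays the same.
conn.execute("UPDATE paragraph SET body = 'Details, reworded' WHERE id = 2")
conn.execute("UPDATE paragraph SET prev_id = NULL, next_id = 1 WHERE id = 2")
conn.execute("UPDATE paragraph SET prev_id = 2, next_id = NULL WHERE id = 1")


def ordered_ids(connection):
    """Walk the next_id links starting from the paragraph that has no predecessor."""
    row = connection.execute(
        "SELECT id, next_id FROM paragraph WHERE prev_id IS NULL"
    ).fetchone()
    ids = []
    while row is not None:
        ids.append(row[0])
        row = connection.execute(
            "SELECT id, next_id FROM paragraph WHERE id = ?", (row[1],)
        ).fetchone()
    return ids


order = ordered_ids(conn)
commented_body = conn.execute(
    "SELECT p.body FROM comment c JOIN paragraph p ON p.id = c.paragraph_id"
).fetchone()[0]
print(order, commented_body)
```

The comment still resolves to the right paragraph after both the rewording and the move, because nothing it references has changed.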

Your problem, as you state it, is not a problem of text processing but a problem arising from the use of a natural primary key (paragraph position or paragraph hash). Of course, we all know that natural keys are bad because they change. In your example they change very frequently. Using surrogate keys solves the problem of PK volatility. The rest is just a modeling exercise in storing the relations (order of paragraphs in the text, paragraph changes as the text evolves, etc.). I leave as an exercise for you the problem of a paragraph split during an edit (which shard inherits which comments) and the problem of a paragraph merge.
