Database Design – Advice on Performance and Subtypes

database-designperformancesubtypes

  • context :

    • Teachers create problems to their Students
    • A Problem have Numerals(Questions)
    • A Numeral(Question) have Steps to guide the student to the answer
    • Choice represent the type of mathematical abstraction that fits
      better

      • Solve: (unknown variable, starting expression)
      • Simplify: (reference, target)
      • Plot: (slope,b)
  • Base Models and Relations:

    • Problems: one to Many Numerals(Questions)
    • Numerals: zero to Many Steps
    • Steps:
  • Requirements:

    • Numerals and Steps now have a choice which where described above, currently just 3 choices but it's expected to grow. only one and at least one choice
    • Depending on the choice the numeral or step will add new attributes
    • for now choices do not have any relations between them
    • choice in Numeral could be different to the choice in Step and vice versa (see Example)
  • Example: did my best to make a close abstraction to show the idea, I've made a JSON Structure, almost identical to the structure used in my front end. Maybe the example is a little bit dumb but it gives a taste.

  • My Attempts:
    For simplicity , Models are shown with the minimum attributes

    • 1st:
      1st design

    -My concerning with this approach is that the amount of FK in the Numeral and Step model. Since only one choice is possible, i would have choices-1 amount of useless columns per record

    • 2nd:
      2nd design

    -I'm programmatically checking for the choice field in the Numeral or the Step, and then using the proper table to fill the data, this second attempt started as a solution to fix the empty columns in the 1st approach

    -My concerning with this approach is that the amount of tables will grown 2*n, since for each choice i need to create the corresponding table for step or numeral

I'm open to suggestions or ideas about this attempts.

Edit:
Thanks to @J.D for suggesting this idea. i've modified a bit because i need to keep track of the choices and their fields somewhere somewhere. I think this solution is easier to mantain and removes the need to add tables in case of new choices

  • 3rd:
    3rd design

Best Answer

As Colin touched on, it's hard to follow what schema problem you're trying to solve with just generalities. Unfortunately the devil's in the details with this kind of thing, since one detail could drastically change the best approach.

Looking at your database diagrams at a high level the tables A, B, and C in the first diagram look like the same entity in regards to their purpose, but varies in value of their columns, most likely?...same when comparing tables ANumeral, BNumeral, CNumeral to each other, or when comparing AStep, BStep, and CStep to each other, in your second diagram. The question is why couldn't each of these sets of tables be just one table each (or possibly even one table altogether)?...e.g. you likely could have just a single StepAttributes table that Step joins to.

I did notice in your example the only difference between two tables of the same set are the attribute data types. If that's your reasoning for making multiple tables, instead just structure your single attribute table (e.g. StepAttribute) to account for that which can be done two ways.

The first way by having three columns, one being the AttributeName, the next being the AttributeValue (which will be a string type field, like VARCHAR, to accommodate any value), and the third being the AttributeDataType which stores the native data type of the attribute so you know what to appropriately cast it to when you consume it.

The second way is you can have a denormalized attribute table that has a nullable column of each data type with generic column names, such as DateAttributeValue, IntAttributeValue, VarCharAtrributeValue etc, and a column to specify which data type the value is / column it's stored in. Some systems even design such a table like DateAttributeValue1, DateAttributeValue2, DateAttributeValue3, IntAttributeValue1, IntAttributeValue2, etc so they can support storing multiple attributes within the same row.


Thanks for updating your question with more details. So at the least it looks like each sub-table to Numeral vs Step are a bit redundant in your second design, and I understand why you normalized them that way based on your first design. You could actually combine the best of both worlds though and just have a single Plot, Simplify, and Solve table with the foreign key reference field in those tables going back to your Numeral and Step tables by storing both IDs in that column and calling it something like NuneralOrStepId. Then you'd need an additional field on those tables called like IsForNumeral or ParentType (come up with a better name than mine lol) which would distinguish which base table it joins back to. Then you have no waste in fields and reduce your redundancy as well.

It depends on how many of these sub-tables you anticipate having to generate and manage (right now 3 is completely manageable) but if you think that's going to grow into over 10 tables or so, I'd still consider my first suggestion with one generic table of AttributeName / AttributeValue pairs. You can add any additional columns to this structure that make sense as well, such as the ParentType column to distinguish which parent table the Attribute belongs to.