How to design tables – dabbler question

database-designperformance

I can understand that this is a basic question, but it's not about knowledge (I think) it's about practical experience.

I came from another field so, be kind, please.

I have my ORDER
Every ORDER has ITEMS
Every ITEM has his VARIANTS
Every VARIANT has his QUANTITY

When I'm required to store the content of an order, my first instinct is to create an array with items, variants, and quantities and then convert it to a string and finally store it as a field of the ORDERS table.

I can smell there is something wrong.
The ITEMS in the order can be a table, and so can VARIANTS and QUANTITY. And this seems to me as a tidier way to solve the problem.

However, when I have to rebuild an ORDER with the first approach it's very easy and I just have to convert the string to the original 3D array, while with the second I've to look at 3 different tables, and this is not something I feel comfortable with, mainly because it seems like a lot of work to do, and if there are only a few requests, it's easy, but if I have 1000 users at the time I fear that the DBMS can't cope with the workload.

I've tried some basic queries with MySQL and all my fears seem excessive, but I don't know how things work in a real work application.

I'd be glad to know how a pro would solve this problem. In production, would you use strange concatenations or separate tables (and the DB remains always stable and responsive)?

Best Answer

So Akina gave you some MySQL code to create the tables themselves, which is helpful and you should look at to get an idea of the structure. But here is some information to give you context and hopefully ease your concerns by giving you perspective.

While your current approach of storing all the information in one object may seem easier, it has a lot of cons to it as well, which is what the purpose of a relational database management system solves (RDBMS).

Firstly, the process is simpler and more importantly performant, when you need to ask complex questions of your data with a RDBMS as opposed to asking those same questions of your single monolithic object. Examples of those questions are: "How many Orders were sold of Item A with Variant 2 and Item C with Variant 7 in the year 2019?", "What were the 3 least selling Variants and their Items in the last rolling 6 months?", "What Variant and its Item has the average number of Orders in the summer?", "What was the total profit made from Orders on all Variants of Item D plus only Variants 1 and 2 of Item E, last year?".

Any of the above questions would likely invoke you writing code that needs to scan the entire dataset of all instances of your monolithic object, but in a RDBMS, where the data points are decoupled by their relations, you can write code that only seeks out the exact data points needed to answer the above questions (when architected and indexes properly).

Additionally your fears of "1,000 users" concurrently using the RDBMS is actually not something to be afraid of. Actually, you run into more concurrency issues with a single monolithic object as you currently are doing as opposed to using a database system. This is because when you need to write all that data to only one object, the total time that object is locked for is longer than in a database system where the time to write the data to tables is split (and can be parallelized) so that any given table of data is individually locked for a much shorter amount of time, relatively.

The beauty of decoupling the data by it's relations really shines in both performance and management of your data, in a database system. I've worked with semi-big data before myself (tables with 10s of billions of rows), and trying to manage that in monolithic objects is just not sustainable and definitely not performant for multiple reasons. One may even decouple the tables you mentioned above further to improve performance and reduce data redundancy, such as having an Orders table and an OrderLines table, assuming you can sell more than one Item on a single Order.

While you may think reconstructing an original Order is more difficult because the data lives in three tables instead of one object, it's actually very simple:

SELECT TheFieldsYouWant 
FROM Orders 
INNER JOIN Items ON Orders.ItemKeyField = Items.KeyField 
INNER JOIN Variants ON Items.VariantKeyField = Variant.KeyField

Literally four lines of code gets you the original Order objects as you need them (likely less lines of code than manipulating string arrays ;). It's just a matter of getting used to the syntax and relational logic. But again the three takeaway points of benefits of using a database system are:

  1. Flexibility - by decoupling your data points so you can query them however you need in an efficient manner
  2. Performance - by reducing data redundancy, improving concurrency (both write and read concurrency), and with indexing assisting with only seeking the exact data points you need instead of scanning the entire dataset
  3. Simpler code - as demonstrated above, but again just takes getting used to when it's a new concept