Mysql – UPDATED: Relational (or NoSQL) Database Design

database-designMySQLrelational-theoryschema

(Updated this question with more specifics. Decided on a Relational Database, and would love some comments on its design)

This database holds assembly instructions data for shipbuilding, later use in a Augmented Reality Android application. The application asks the server (Node.JS) for the necessary files.

Simply put: Work Packages contain several Parts that must be mounted, each of these parts has the Files necessary needed by the application.

Database Structure

  • Work Packages depend on other Work Packages (WD_Dependencies), in the figure above, S1 depends on 1 and 2;
  • Workers can be assigned a specific Work Package, if not, the return Work Package is a combination of matches between Type and Department.

Update:

(Open in new tab for better view)

Database Design

The database is "divided" between the blueprint of a project, and the specific execution of that blueprint. This also allows for multiple executions of the same blueprint.

Questions:

  1. Is this a good design? Dividing Blueprint and Projects?
  2. Efficient SQL Query to recursive lookup all the dependencies of a Work Package?
  3. Do you see any problems with the relationships or any redundancy?
  4. Any comments or help are welcomed.

I understand this could be a broad/opinion question, but I have no experience with database of this size, and would appreciate the help.

Best Answer

Which of the two options is the best?

The relational design is clearly superior to the hierarchical design used in the NoSQL example. A database schema designed using relational principles does not favor one access path over another. Each table represents a real world entity type and through the use of relational algebra queries of arbitrary complexity can be constructed to answer not only current questions of the data but any future questions you think of that haven't yet been conceived. A hierarchical schema imposes a single structure upon the data - in this case work packages contain other work packages, the worker, and the parts, and the files. If you want to ask questions of the data not aligned with this single view it will be much more difficult if not impossible. Furthermore, the relational model includes a data integrity component to ensure data is consistent with respect to rules declared to it approximating the real world, whereas systems, like a NoSQL system, using a hierarchical or network model do not provide these.

Should a Work Package reference the above Work Packages, or the contrary is better?

The best way to implement this is to create a new table that represents the links between the parent and dependent work packages while leaving the work package table to represent the work packages themselves regardless of dependency. This design does not mix work packages dependent on other work packages with work packages that are not dependent on others. A key advantage of the relational approach is that a single structure with a single set of operators - relational tables and relational algebra - can easily represent a hierarchy or a network. Network DBMS's require two types - a node and a link - with two distinct sets of operators, and thus twice the complexity. A hierarchical DBMS can't represent a network correctly at all. This is a key reason why the NoSQL systems available today can only address a narrow set of use cases. Fabian Pascal's book Practical Issues in Database Management, Chapter 7, gives an excellent overview of data hierarchies using SQL and I highly recommend it.

There will be a substantial of reads, however the writes will be considerable less.

Another benefit of the relational approach is that the performance of the system can be managed to a great extent at the physical level without impacting the logical database schema used by applications to work with the data. With the relational approach you can focus on the correct logical schema first and then in most cases let the DBA implement a correct physical design and infrastructure to best support the read heavy workload. This is not the case in navigational systems where the programmer must specify the access paths directly in programs.