I'm planning to develop a Shopfloor Database System. What are the best database design pattern should i apply to my system. Each of the process have their own form and own field, but most of the field set are similar with each other process, just some of the process may have additional extra field. In which, it was difficult for me to create just one table to keep all of the process data entry.
Below are the sample of my process and the field column that required for each of the process.
Which are the recommend database design should i apply to this? Should I just create one table to keep all those similar column field data and for the rest of the other different column name I just move it to other table with a foreign key?
Best Answer
You have several options:
Create one table with lots of columns
This is what Gerard H. Pille suggested in the comments: create a single table, with all possible columns for all possible processes. Where a process does not have a value for a given column, set the value to NULL.
Pros:
Cons:
One table per process
Here, you would have a different table for each process. These tables would presumably have some columns in common in every table; some that only appear in some of the tables; and some unique to a given table.
Pros:
Cons:
Base process table, with EAV table
Create a base Process table to store the data common to all processes (proc date, time in/out, qty start). Then, create a table to hold all other attributes for each process. This table would have three basic columns:
Pros:
Cons:
WHERE
clause needs to check values of several different attributes stored in EAV tableHybrid approach
Create a base process table with all column common to all processes. Create full or partial process-specific tables, with columns unique to a given process. you could even allow an EAV table for attributes that are always optional, or possibly that are new to the system.
Pros:
Cons:
Other considerations
If you have a large number of fields that are rarely needed for sorting/searching, but just for display in reporting, then storing certain values as JSON (basically, a data structure materialized as a string value) may work well. Some DBMSes have been extended to explicitly handle JSON data, and even to allow it to be indexed.
Also, some databases may allow some sort of table inheritance. I don't know much about this, but if you have a lot of processes that are the same except for one or two unique data fields, this might allow you to set up a series of table types, all built off your base table; this could make maintenance of a "one table per process" solution easier, if a change to the base process table type would essentially flow through all the process-specific tables.
All well and good. However, in practice, the details of your specific application and of your specific data will really drive this.
If most columns are present in most processes, then a single table with a lot of columns may make a lot of sense.
If your processes are frequently in flux, a base table plus an EAV table may be the only sane solution; otherwise, you may find yourself constantly needing to alter your databases structures.
If the number of rows you'll have for certain processes will be much more significant than the rows for other processes, then it may make sense for the more significant processes to have all columns in the base process table, and to use extra tables or an EAV for the processes that will appear less frequently.
Note that different DBMSes will have their own strengths and weaknesses; some may perform better than others for a given set-up.
You also have to take your application into account. If you're already going to have to make a specific form/page for each process (reusing elements where possible, of course), then the idea of one table per process may make very good sense; If, on the other hand, you'll have a central form with the data common to all processes, and will simply list each additional attribute and value at the bottom of the common form, an EAV table would match your structure logically.
Ultimately, what's best for you will come down to the specifics of the problems your application is intended to resolve; the amount of time you expect to spend in maintenance on it; the size of maintenance windows available; the performance needs of your specific situation; and your (or your company's) preferred coding style and approach.
Obviously, we haven't seen all of your processes. If, however, almost all possible columns are used by multiple processes, and if there are no more than two or three non-common columns per process, and if your set of processes is well-defined (so new columns are unlikely to be needed), I would be inclined to use a single
Process
table, and have all the columns in it. Again, note all the caveats in the preceding sentence, and realize that's a personal preference - others might pick a different direction.