Database Design – Best Design for Multiple Forms with Similar and Extra Columns

database-designmysql-5.5

I'm planning to develop a Shopfloor Database System. What are the best database design pattern should i apply to my system. Each of the process have their own form and own field, but most of the field set are similar with each other process, just some of the process may have additional extra field. In which, it was difficult for me to create just one table to keep all of the process data entry.

Below are the sample of my process and the field column that required for each of the process.enter image description here

Which are the recommend database design should i apply to this? Should I just create one table to keep all those similar column field data and for the rest of the other different column name I just move it to other table with a foreign key?

Best Answer

You have several options:

Create one table with lots of columns

This is what Gerard H. Pille suggested in the comments: create a single table, with all possible columns for all possible processes. Where a process does not have a value for a given column, set the value to NULL.

Pros:

  • Relatively easy to query and understand

Cons:

  • Meaning of NULL in a column becomes questionable: Is there no defined value for the column for this process, or does this column not apply to this process.
  • Table may require update with new processes, or changes to existing processes.
  • Many DBMSes have a maximum number of columns per table; if you have enough processes and unique columns per process, you could hit that limit.
  • More difficult to enforce referential integrity at the DB level, especially if certain columns should not be NULL for certain processes, but aren't used in every process.

One table per process

Here, you would have a different table for each process. These tables would presumably have some columns in common in every table; some that only appear in some of the tables; and some unique to a given table.

Pros:

  • Meaning of NULL in any column is clear.
  • Enforcing referential integrity is relatively easy.
  • Less likely to have problems with hitting maximum number of columns.
  • Easy to query within a given process.

Cons:

  • Have to create a new table for every new process.
  • Changes that impact all processes may require the same basic updates to all process tables.
  • Somewhat painful to query data from all processes.
  • Harder to maintain a unique global process ID, without a central table to work off.

Base process table, with EAV table

Create a base Process table to store the data common to all processes (proc date, time in/out, qty start). Then, create a table to hold all other attributes for each process. This table would have three basic columns:

  • the ID of the base process record (a foreign key pointing to the base Process table)
  • The name or ID of the attribute being stored (e.g. mc#, yld, epotek batch #)
  • the actual value for the given attribute

Pros:

  • Only store attributes that belong to the base Process table's process type
  • Changes to base process confined to one table; changes to other processes require no DB structural changes
  • No issue with hitting maximum number of columns.

Cons:

  • requires more disk space to store (in most cases)
  • More difficult to query, especially if WHERE clause needs to check values of several different attributes stored in EAV table
  • More difficult to enforce data integrity at the DB level (to ensure that EAV attributes are set for every base process row of a given process type, you basically need a metadata table that defines which attributes are required and which are optional, and you have to check that after each insert/update of a relevant base Process row is complete, or some similar process). If you do want to allow multiple entries for certain attributes, it becomes even more difficult to ensure your distributed Process row is valid.

Hybrid approach

Create a base process table with all column common to all processes. Create full or partial process-specific tables, with columns unique to a given process. you could even allow an EAV table for attributes that are always optional, or possibly that are new to the system.

Pros:

  • This can help prevent problems with the number of columns in a table.
  • Changes to the base process data happen in one place.
  • Maintaining data integrity should be easier.
  • Most querying is relatively simple.

Cons:

  • Have to create a new table for every new process.
  • Changes that move columns that were previously present in some process-specific tables to the base process table require a fair amount of maintenance work.

Other considerations

If you have a large number of fields that are rarely needed for sorting/searching, but just for display in reporting, then storing certain values as JSON (basically, a data structure materialized as a string value) may work well. Some DBMSes have been extended to explicitly handle JSON data, and even to allow it to be indexed.

Also, some databases may allow some sort of table inheritance. I don't know much about this, but if you have a lot of processes that are the same except for one or two unique data fields, this might allow you to set up a series of table types, all built off your base table; this could make maintenance of a "one table per process" solution easier, if a change to the base process table type would essentially flow through all the process-specific tables.


All well and good. However, in practice, the details of your specific application and of your specific data will really drive this.

If most columns are present in most processes, then a single table with a lot of columns may make a lot of sense.

If your processes are frequently in flux, a base table plus an EAV table may be the only sane solution; otherwise, you may find yourself constantly needing to alter your databases structures.

If the number of rows you'll have for certain processes will be much more significant than the rows for other processes, then it may make sense for the more significant processes to have all columns in the base process table, and to use extra tables or an EAV for the processes that will appear less frequently.

Note that different DBMSes will have their own strengths and weaknesses; some may perform better than others for a given set-up.

You also have to take your application into account. If you're already going to have to make a specific form/page for each process (reusing elements where possible, of course), then the idea of one table per process may make very good sense; If, on the other hand, you'll have a central form with the data common to all processes, and will simply list each additional attribute and value at the bottom of the common form, an EAV table would match your structure logically.

Ultimately, what's best for you will come down to the specifics of the problems your application is intended to resolve; the amount of time you expect to spend in maintenance on it; the size of maintenance windows available; the performance needs of your specific situation; and your (or your company's) preferred coding style and approach.

Obviously, we haven't seen all of your processes. If, however, almost all possible columns are used by multiple processes, and if there are no more than two or three non-common columns per process, and if your set of processes is well-defined (so new columns are unlikely to be needed), I would be inclined to use a single Process table, and have all the columns in it. Again, note all the caveats in the preceding sentence, and realize that's a personal preference - others might pick a different direction.