Database Design – Best Design for Multiple Forms with Similar and Extra Columns

database-designmysql-5.5

I'm planning to develop a Shopfloor Database System. What are the best database design pattern should i apply to my system. Each of the process have their own form and own field, but most of the field set are similar with each other process, just some of the process may have additional extra field. In which, it was difficult for me to create just one table to keep all of the process data entry.

Below are the sample of my process and the field column that required for each of the process.

Which are the recommend database design should i apply to this? Should I just create one table to keep all those similar column field data and for the rest of the other different column name I just move it to other table with a foreign key?

Best Answer

You have several options:

Create one table with lots of columns

This is what Gerard H. Pille suggested in the comments: create a single table, with all possible columns for all possible processes. Where a process does not have a value for a given column, set the value to NULL.

Pros:

Relatively easy to query and understand

Cons:

Meaning of NULL in a column becomes questionable: Is there no defined value for the column for this process, or does this column not apply to this process.
Table may require update with new processes, or changes to existing processes.
Many DBMSes have a maximum number of columns per table; if you have enough processes and unique columns per process, you could hit that limit.
More difficult to enforce referential integrity at the DB level, especially if certain columns should not be NULL for certain processes, but aren't used in every process.

One table per process

Here, you would have a different table for each process. These tables would presumably have some columns in common in every table; some that only appear in some of the tables; and some unique to a given table.

Pros:

Meaning of NULL in any column is clear.
Enforcing referential integrity is relatively easy.
Less likely to have problems with hitting maximum number of columns.
Easy to query within a given process.

Cons:

Have to create a new table for every new process.
Changes that impact all processes may require the same basic updates to all process tables.
Somewhat painful to query data from all processes.
Harder to maintain a unique global process ID, without a central table to work off.

Base process table, with EAV table

Create a base Process table to store the data common to all processes (proc date, time in/out, qty start). Then, create a table to hold all other attributes for each process. This table would have three basic columns:

the ID of the base process record (a foreign key pointing to the base Process table)
The name or ID of the attribute being stored (e.g. mc#, yld, epotek batch #)
the actual value for the given attribute

Pros:

Only store attributes that belong to the base Process table's process type
Changes to base process confined to one table; changes to other processes require no DB structural changes
No issue with hitting maximum number of columns.

Cons:

requires more disk space to store (in most cases)
More difficult to query, especially if WHERE clause needs to check values of several different attributes stored in EAV table
More difficult to enforce data integrity at the DB level (to ensure that EAV attributes are set for every base process row of a given process type, you basically need a metadata table that defines which attributes are required and which are optional, and you have to check that after each insert/update of a relevant base Process row is complete, or some similar process). If you do want to allow multiple entries for certain attributes, it becomes even more difficult to ensure your distributed Process row is valid.

Hybrid approach

Create a base process table with all column common to all processes. Create full or partial process-specific tables, with columns unique to a given process. you could even allow an EAV table for attributes that are always optional, or possibly that are new to the system.

Pros:

This can help prevent problems with the number of columns in a table.
Changes to the base process data happen in one place.
Maintaining data integrity should be easier.
Most querying is relatively simple.

Cons:

Have to create a new table for every new process.
Changes that move columns that were previously present in some process-specific tables to the base process table require a fair amount of maintenance work.

Other considerations

If you have a large number of fields that are rarely needed for sorting/searching, but just for display in reporting, then storing certain values as JSON (basically, a data structure materialized as a string value) may work well. Some DBMSes have been extended to explicitly handle JSON data, and even to allow it to be indexed.

Also, some databases may allow some sort of table inheritance. I don't know much about this, but if you have a lot of processes that are the same except for one or two unique data fields, this might allow you to set up a series of table types, all built off your base table; this could make maintenance of a "one table per process" solution easier, if a change to the base process table type would essentially flow through all the process-specific tables.

All well and good. However, in practice, the details of your specific application and of your specific data will really drive this.

If most columns are present in most processes, then a single table with a lot of columns may make a lot of sense.

If your processes are frequently in flux, a base table plus an EAV table may be the only sane solution; otherwise, you may find yourself constantly needing to alter your databases structures.

If the number of rows you'll have for certain processes will be much more significant than the rows for other processes, then it may make sense for the more significant processes to have all columns in the base process table, and to use extra tables or an EAV for the processes that will appear less frequently.

Note that different DBMSes will have their own strengths and weaknesses; some may perform better than others for a given set-up.

You also have to take your application into account. If you're already going to have to make a specific form/page for each process (reusing elements where possible, of course), then the idea of one table per process may make very good sense; If, on the other hand, you'll have a central form with the data common to all processes, and will simply list each additional attribute and value at the bottom of the common form, an EAV table would match your structure logically.

Ultimately, what's best for you will come down to the specifics of the problems your application is intended to resolve; the amount of time you expect to spend in maintenance on it; the size of maintenance windows available; the performance needs of your specific situation; and your (or your company's) preferred coding style and approach.

Obviously, we haven't seen all of your processes. If, however, almost all possible columns are used by multiple processes, and if there are no more than two or three non-common columns per process, and if your set of processes is well-defined (so new columns are unlikely to be needed), I would be inclined to use a single Process table, and have all the columns in it. Again, note all the caveats in the preceding sentence, and realize that's a personal preference - others might pick a different direction.

Related Solutions

The best database design for this situation

A rule of thumb I use, is that it's bad database design to have to alter a table to accomodate a code change or a new feature if it can be planned for in the future.

That said, I would use a variant of Option 1

Approvals
    Id PK
    ApprovalTypeID FK
    Status


ApprovalComments
    ID PK
    ApprovalId FK
    Comment

ApprovalTypes
    ID PK
    Name

Now when adding new types of objects that need approval, you only need to insert a row into ApprovalTypes instead of altering a table to add a column.

The best database design for multi-language data

The primary disadvantage of your first option is that you must create/maintain a similar table structure for n different languages and your application must switch between them. When a new language is added you will have to add table and modify your application. Your second option is preferable of the two. I will address some of your concerns.

If your security relies on the separation of data into different tables, then your security needs work. Sure, separating them would provide some degree of additional security, but if you invest the time you save by not maintaining multiple tables into verifying/increasing your other security measures you will have far greater benefit.
When you query the data you can sort by language.
A WHERE clause on the country would retrieve data only for a particular country.

I recommend you store the primary details including the English names in simpler tables and then each table that has other language equivalents can have a language specific table containing every other language. Something like this:

Languages
ID
Name

Drugs
   ID
   Price
   Name
   Interactions
   Mechanism
   Uses

DrugLanguages
   DrugID
   LanguageID
   Name
   Interactions
   Mechanism
   Uses

Companies
   ID
   Name
   Address
   Activity

CompanyLanguages
   CompanyID
   LanguageID
   Name
   Address
   Activity

You should know that I have never done multilingual design, so my approach may have limitations that I do not fathom.

Best Answer

Create one table with lots of columns

One table per process

Base process table, with EAV table

Hybrid approach

Other considerations

Related Solutions

The best database design for this situation

The best database design for multi-language data

Related Question