I would go with number 1. Having a separate table for each vendor or each product type will be a nightmare when trying to run reports that need data from multiple product tables. You don't want to have to create a new table each time you add a vendor or product type.
I understand your inclination coming from a spreadsheet, but with the ability to query the records easily, separating the products into multiple tables is not the way to go.
I am inclined to say that there is a downside because products of one category will have many NULLs in attributes meant for other categories, but I'm not sure if this is actually a negative.
To prevent this, attributes should be in a separate table.
For example, a partial table definition may be as follows:
products
-- id
-- name
product_attributes
--product_id
--name
--value
attributes.product_id is a foreign key of products.id.
To further normalize it you could define the attributes in a separate table and have an attribute_id field instead of attribute_name:
attributes
--id
--name
product_attributes
--attribute_id (foreign key of attributes.id)
--value
(edited for spelling)
If you never or rarely need to access more than one type at a time, then consider splitting. However, personally I wouldn't do so. I've been involved in a number of projects where this has been done, and result in a lot of additional complexity with little benefit.
The companies table appears to be a good choice for the common data. Migrating data when a company changes type will increase complexity. To enable this to work well you will need a table to generate ids from, and rules to ensure the same id doesn't appear in more than one sub-table.
I would consider splitting type specific columns into separate tables. This will require different queries by type. Again this may make the application more complex. As there are only a few smallish columns, keeping them in the main table likely has relatively little overhead and simplify the overall design. If you have many or larger fields which are type specific and only of use when the type is known, then consider splitting the sub-type data into another table. It can share the same key as the relationship will be an optional identity relationship.
Another case where I would consider splitting a record is where there is a mix of tombstone data and highly transactional data. I've seen it work well with parts, and inventory data.
Having worked in a company where suppliers were also clients, revisiting how suppliers and clients are differentiated might be a worthwhile exercise. I would consider splitting suppliers into a separate table based on the following criteria.
- Systems dealing with suppliers are separate from those dealing with customers.
- There is little prospect of reusing access routines between the systems.
- Volumetric data indicates that a least one system could get substantially better performance from splitting the tables.
- Security/access requirements for the two type of companies are significantly different.
Reviewing what happens to prospect data when a prospect becomes a client may be a worthwhile exercise. If it is retained, it may have value in determining how clients are being generated. It will also make converting clients back to prospects after keying errors simpler.
Partitioning a table like a customer table is likely to be of little use. Hot clients will likely be cached at the database or o/s block buffer level. If you have the ID it is pretty well a direct look-up.
Transactional data is more suitable for partitioning. Order details can age out in hours depending on order processing times. Customer data will likely take years to age out. I don't see any data in the table which could be used to effectively partition the data base on 'hotness'. I have generated an archiving system for customers and their data. Determining the customers which could be archived was not trivial.
Best Answer
Business rules
Based on (a) some deliberations we had via comments and (b) the content of the question, we have defined the following characteristics of your business environment:
So, among other points, there is a many-to-many (M:N) association or relationship involving the entity types called Supplier and Product. This is a very basic —but, at the same time, quite powerful when modeled accurately— construct usually employed to shape some components of a database conceptual model.
IDEF1X model
Consequently, I created an expository IDEF1X1 diagram that consolidates the previously formulated business rules in a single graphical device, which is shown in Figure 1:
As you can see,
ProductSupplier
is the associative entity that portrays the M:N association that connects theProduct
andOrganization
.Each
ProductSupplier
is uniquely identified by the combination of values of two distinct properties or attributes, i.e.,SupplierId
(a role name I have assigned toOrganization.OrganizationId
in order to express the meaning that such attribute holds when it migrates2 to theProductSupplier
entity type) andProductId
,so they are depicted as a composite PRIMARY KEY. These properties are FOREIGN KEY references to
Organization
andProduct
, respectively.With this layout, I have arranged the properties that pertain to a each entity type in their appropriate place:
The properties that arise regarding the association between
Product
andOrganization
(let us say, the property that denotes the point in time when a givenOrganization
started suppliying a certainProduct
) are included exclusively in the associative entity type.The properties that concern only to the
Organization
entity type are placed in the corresponding box, an so are the properties that belong to theProduct
entity type.Expository logical SQL-DDL structure
I wrote the DDL statements that follow based on the conceptual definitions discussed above, so that you can get a form a better idea about how to design this part of your database:
In this logical structure, as demonstrated:
All these factors help to prevent ambiguities that would have come about if multiple entity types are represented by way of a single table.
Derivable information
Successively, you can derive data from the three different base tables —e.g., by dint of JOINs— in order to take advantage of the logical structure and produce new tables that include information like shown in the pictures you included in the question.
Redundancy
In a database that is designed following the principles of the relational model, duplication of values contained in columns is not only acceptable but expected, while duplicate rows are strictly forbidden.
As for the logical structure expounded above, all the tables are constrained declaratively so as to prevent the INSERTion of duplicate rows, therefore the avoidance of this kind of harmful redundancy is well addressed.
Normalization
Relational normalization —a subject worth mentioning since you added the normalization tag— is a logical-level procedure whose purposes are
to decompose via first normal form the table columns whose types accept non-atomic values so that data manipulation and constriction are much easier to cope with by the data sublanguage of use (e.g., SQL), and
to get rid of undesirable dependencies among the columns of a table by virtue of the successive normal forms to avoid update anomalies that introduce other kinds of dangerous redundancies.
Of course, one has to take into account the meaning carried by the table(s) and column(s) at issue.
I suggest that you take the time to test the tables I present in the previous logical structure so that you can determine if it meets each normal form.
Endotes
1 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was defined as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
2 IDEF1X defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.