How to remove data redundancy in product table if multiple suppliers are there for same product

database-designnormalization

I am designing a database where I have different products and suppliers.
For example, I have a product, "mobile phone", say an Apple 5S. This will be sold through my website from different suppliers like Mapple, Mango.

I can not store the data for every single supplier as it will cause data redundancy. I have been considering the following columns for the products table:

ProductID
SupplierID
ProductName
Value
SKUID
VendorName
and many more…

We can see the Product name is repeated every time which caused data redundancy.

My main concern is how to avoid data redundancy? How to design the Product table?

Added from comments:

There will be a many to many relationship between Products and Suppliers.
A Vendor and a Supplier are the same. It is a small organisation like shop owner who are listed to us.
There will be number of different shop owners and they all may have the same Product. It will increase everyday. Suppose we have 1000 shop owners (Vendors) with us, and each has 100 Products which are the same, then there will be 1000*100 rows for data which is not required.
There is correction in table Vendor and Supplier are the same. About individual or person. They will never be Supplier.

Best Answer

Business rules

Based on (a) some deliberations we had via comments and (b) the content of the question, we have defined the following characteristics of your business environment:

A Product is primarily identified by its Id
A Product is alternately identified by its Name
A Product is alternately identified by its SKU
A Product is offered at a Price
Supplier and Vendor are different names you use to refer to the same particular entity type
An Organization can become a Supplier
An Organization is primarily identified by its Id
An Organization is alternately identified by its Name
A Person cannot become a Supplier
A Product can be offered by zero-one-or-many Organizations playing the Role of Suppliers
An organization, playing the Role of Supplier, supplies one-to-many Products

So, among other points, there is a many-to-many (M:N) association or relationship involving the entity types called Supplier and Product. This is a very basic —but, at the same time, quite powerful when modeled accurately— construct usually employed to shape some components of a database conceptual model.

IDEF1X model

Consequently, I created an expository IDEF1X¹ diagram that consolidates the previously formulated business rules in a single graphical device, which is shown in Figure 1:

Figure 1 - Products and Suppliers Simplified IDEF1X Diagram

As you can see, ProductSupplier is the associative entity that portrays the M:N association that connects the Product and Organization.

Each ProductSupplier is uniquely identified by the combination of values of two distinct properties or attributes, i.e.,

SupplierId (a role name I have assigned to Organization.OrganizationId in order to express the meaning that such attribute holds when it migrates² to the ProductSupplier entity type) and
ProductId,

so they are depicted as a composite PRIMARY KEY. These properties are FOREIGN KEY references to Organization and Product, respectively.

With this layout, I have arranged the properties that pertain to a each entity type in their appropriate place:

The properties that arise regarding the association between Product and Organization (let us say, the property that denotes the point in time when a given Organization started suppliying a certain Product) are included exclusively in the associative entity type.
The properties that concern only to the Organization entity type are placed in the corresponding box, an so are the properties that belong to the Product entity type.

Expository logical SQL-DDL structure

I wrote the DDL statements that follow based on the conceptual definitions discussed above, so that you can get a form a better idea about how to design this part of your database:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on the exact business context characteristics.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions. 

CREATE TABLE Organization (
    OrganizationId  INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    FoundingDate    DATE     NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Organization_PK PRIMARY KEY (OrganizationId),
    CONSTRAINT Organization_AK UNIQUE      (Name) -- ALTERNATE KEY.
);

CREATE TABLE Product (
    ProductId       INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    SKU             CHAR(30) NOT NULL,
    Price           INT      NOT NULL, -- Retains an amount in Cents, but there are other options.
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Product_PK      PRIMARY KEY (ProductId),
    CONSTRAINT Product_AK1     UNIQUE      (Name), -- ALTERNATE KEY.
    CONSTRAINT Product_AK2     UNIQUE      (SKU),  -- ALTERNATE KEY.
    CONSTRAINT PriceIsValid_CK CHECK       (Price > 0)
);

CREATE TABLE ProductSupplier ( -- Represents the conceptual M:N association.
    SupplierId      INT      NOT NULL,
    ProductId       INT      NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT ProductSupplier_PK               PRIMARY KEY (SupplierId, ProductId), -- Composite PRIMARY KEY.
    CONSTRAINT ProductSupplierToOrganization_FK FOREIGN KEY (SupplierId)
        REFERENCES Organization (OrganizationId),
    CONSTRAINT ProductSupplierToProduct_FK      FOREIGN KEY (ProductId)
        REFERENCES Product (ProductId)
);

In this logical structure, as demonstrated:

each table denotes an individual entity type;
each column stands for an property of the respective entity type;
a specific data type is fixed for each column in order to ensure that all the values it contains belong to a particular set (that you will have to adapt to adapt to your organization needs), be it INT, DATETIME, CHAR, etc.; and
multiple constraints are set up (declaratively) in order to guarantee that the assertions in form of rows retained in all the tables comply with the business rules determined in the conceptual model.

All these factors help to prevent ambiguities that would have come about if multiple entity types are represented by way of a single table.

Derivable information

Successively, you can derive data from the three different base tables —e.g., by dint of JOINs— in order to take advantage of the logical structure and produce new tables that include information like shown in the pictures you included in the question.

Redundancy

In a database that is designed following the principles of the relational model, duplication of values contained in columns is not only acceptable but expected, while duplicate rows are strictly forbidden.

As for the logical structure expounded above, all the tables are constrained declaratively so as to prevent the INSERTion of duplicate rows, therefore the avoidance of this kind of harmful redundancy is well addressed.

Normalization

Relational normalization —a subject worth mentioning since you added the normalization tag— is a logical-level procedure whose purposes are

to decompose via first normal form the table columns whose types accept non-atomic values so that data manipulation and constriction are much easier to cope with by the data sublanguage of use (e.g., SQL), and
to get rid of undesirable dependencies among the columns of a table by virtue of the successive normal forms to avoid update anomalies that introduce other kinds of dangerous redundancies.

Of course, one has to take into account the meaning carried by the table(s) and column(s) at issue.

I suggest that you take the time to test the tables I present in the previous logical structure so that you can determine if it meets each normal form.

Endotes

_{¹ Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was defined as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.}

_{² IDEF1X defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.}

Related Solutions

Designing a Products database: One Products table or many separated by vendor

I would go with number 1. Having a separate table for each vendor or each product type will be a nightmare when trying to run reports that need data from multiple product tables. You don't want to have to create a new table each time you add a vendor or product type. I understand your inclination coming from a spreadsheet, but with the ability to query the records easily, separating the products into multiple tables is not the way to go.

I am inclined to say that there is a downside because products of one category will have many NULLs in attributes meant for other categories, but I'm not sure if this is actually a negative.

To prevent this, attributes should be in a separate table.

For example, a partial table definition may be as follows:

products
-- id
-- name

product_attributes 
--product_id
--name
--value

attributes.product_id is a foreign key of products.id.

To further normalize it you could define the attributes in a separate table and have an attribute_id field instead of attribute_name:

    attributes
    --id
    --name

    product_attributes 
    --attribute_id (foreign key of attributes.id)
    --value

(edited for spelling)

Mysql – Splitting a table based on enum value

If you never or rarely need to access more than one type at a time, then consider splitting. However, personally I wouldn't do so. I've been involved in a number of projects where this has been done, and result in a lot of additional complexity with little benefit.

The companies table appears to be a good choice for the common data. Migrating data when a company changes type will increase complexity. To enable this to work well you will need a table to generate ids from, and rules to ensure the same id doesn't appear in more than one sub-table.

I would consider splitting type specific columns into separate tables. This will require different queries by type. Again this may make the application more complex. As there are only a few smallish columns, keeping them in the main table likely has relatively little overhead and simplify the overall design. If you have many or larger fields which are type specific and only of use when the type is known, then consider splitting the sub-type data into another table. It can share the same key as the relationship will be an optional identity relationship.

Another case where I would consider splitting a record is where there is a mix of tombstone data and highly transactional data. I've seen it work well with parts, and inventory data.

Having worked in a company where suppliers were also clients, revisiting how suppliers and clients are differentiated might be a worthwhile exercise. I would consider splitting suppliers into a separate table based on the following criteria.

Systems dealing with suppliers are separate from those dealing with customers.
There is little prospect of reusing access routines between the systems.
Volumetric data indicates that a least one system could get substantially better performance from splitting the tables.
Security/access requirements for the two type of companies are significantly different.

Reviewing what happens to prospect data when a prospect becomes a client may be a worthwhile exercise. If it is retained, it may have value in determining how clients are being generated. It will also make converting clients back to prospects after keying errors simpler.

Partitioning a table like a customer table is likely to be of little use. Hot clients will likely be cached at the database or o/s block buffer level. If you have the ID it is pretty well a direct look-up.

Transactional data is more suitable for partitioning. Order details can age out in hours depending on order processing times. Customer data will likely take years to age out. I don't see any data in the table which could be used to effectively partition the data base on 'hotness'. I have generated an archiving system for customers and their data. Determining the customers which could be archived was not trivial.