How to remove data redundancy in product table if multiple suppliers are there for same product

database-designnormalization

I am designing a database where I have different products and suppliers.
For example, I have a product, "mobile phone", say an Apple 5S. This will be sold through my website from different suppliers like Mapple, Mango.

I can not store the data for every single supplier as it will cause data redundancy. I have been considering the following columns for the products table:

  • ProductID
  • SupplierID
  • ProductName
  • Value
  • SKUID
  • VendorName
  • and many more…

enter image description here

We can see the Product name is repeated every time which caused data redundancy.

My main concern is how to avoid data redundancy? How to design the Product table?

Added from comments:

  • There will be a many to many relationship between Products and Suppliers.
  • A Vendor and a Supplier are the same. It is a small organisation like shop owner who are listed to us.
  • There will be number of different shop owners and they all may have the same Product. It will increase everyday. Suppose we have 1000 shop owners (Vendors) with us, and each has 100 Products which are the same, then there will be 1000*100 rows for data which is not required.
  • There is correction in table Vendor and Supplier are the same. About individual or person. They will never be Supplier.

Best Answer

Business rules

Based on (a) some deliberations we had via comments and (b) the content of the question, we have defined the following characteristics of your business environment:

  • A Product is primarily identified by its Id
  • A Product is alternately identified by its Name
  • A Product is alternately identified by its SKU
  • A Product is offered at a Price
  • Supplier and Vendor are different names you use to refer to the same particular entity type
  • An Organization can become a Supplier
  • An Organization is primarily identified by its Id
  • An Organization is alternately identified by its Name
  • A Person cannot become a Supplier
  • A Product can be offered by zero-one-or-many Organizations playing the Role of Suppliers
  • An organization, playing the Role of Supplier, supplies one-to-many Products

So, among other points, there is a many-to-many (M:N) association or relationship involving the entity types called Supplier and Product. This is a very basic —but, at the same time, quite powerful when modeled accurately— construct usually employed to shape some components of a database conceptual model.

IDEF1X model

Consequently, I created an expository IDEF1X1 diagram that consolidates the previously formulated business rules in a single graphical device, which is shown in Figure 1:

Figure 1 - Products and Suppliers Simplified IDEF1X Diagram

As you can see, ProductSupplier is the associative entity that portrays the M:N association that connects the Product and Organization.

Each ProductSupplier is uniquely identified by the combination of values of two distinct properties or attributes, i.e.,

  • SupplierId (a role name I have assigned to Organization.OrganizationId in order to express the meaning that such attribute holds when it migrates2 to the ProductSupplier entity type) and

  • ProductId,

so they are depicted as a composite PRIMARY KEY. These properties are FOREIGN KEY references to Organization and Product, respectively.

With this layout, I have arranged the properties that pertain to a each entity type in their appropriate place:

  • The properties that arise regarding the association between Product and Organization (let us say, the property that denotes the point in time when a given Organization started suppliying a certain Product) are included exclusively in the associative entity type.

  • The properties that concern only to the Organization entity type are placed in the corresponding box, an so are the properties that belong to the Product entity type.

Expository logical SQL-DDL structure

I wrote the DDL statements that follow based on the conceptual definitions discussed above, so that you can get a form a better idea about how to design this part of your database:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on the exact business context characteristics.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions. 

CREATE TABLE Organization (
    OrganizationId  INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    FoundingDate    DATE     NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Organization_PK PRIMARY KEY (OrganizationId),
    CONSTRAINT Organization_AK UNIQUE      (Name) -- ALTERNATE KEY.
);

CREATE TABLE Product (
    ProductId       INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    SKU             CHAR(30) NOT NULL,
    Price           INT      NOT NULL, -- Retains an amount in Cents, but there are other options.
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Product_PK      PRIMARY KEY (ProductId),
    CONSTRAINT Product_AK1     UNIQUE      (Name), -- ALTERNATE KEY.
    CONSTRAINT Product_AK2     UNIQUE      (SKU),  -- ALTERNATE KEY.
    CONSTRAINT PriceIsValid_CK CHECK       (Price > 0)
);

CREATE TABLE ProductSupplier ( -- Represents the conceptual M:N association.
    SupplierId      INT      NOT NULL,
    ProductId       INT      NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT ProductSupplier_PK               PRIMARY KEY (SupplierId, ProductId), -- Composite PRIMARY KEY.
    CONSTRAINT ProductSupplierToOrganization_FK FOREIGN KEY (SupplierId)
        REFERENCES Organization (OrganizationId),
    CONSTRAINT ProductSupplierToProduct_FK      FOREIGN KEY (ProductId)
        REFERENCES Product (ProductId)
);

In this logical structure, as demonstrated:

  • each table denotes an individual entity type;
  • each column stands for an property of the respective entity type;
  • a specific data type is fixed for each column in order to ensure that all the values it contains belong to a particular set (that you will have to adapt to adapt to your organization needs), be it INT, DATETIME, CHAR, etc.; and
  • multiple constraints are set up (declaratively) in order to guarantee that the assertions in form of rows retained in all the tables comply with the business rules determined in the conceptual model.

All these factors help to prevent ambiguities that would have come about if multiple entity types are represented by way of a single table.

Derivable information

Successively, you can derive data from the three different base tables —e.g., by dint of JOINs— in order to take advantage of the logical structure and produce new tables that include information like shown in the pictures you included in the question.

Redundancy

In a database that is designed following the principles of the relational model, duplication of values contained in columns is not only acceptable but expected, while duplicate rows are strictly forbidden.

As for the logical structure expounded above, all the tables are constrained declaratively so as to prevent the INSERTion of duplicate rows, therefore the avoidance of this kind of harmful redundancy is well addressed.

Normalization

Relational normalization —a subject worth mentioning since you added the tag— is a logical-level procedure whose purposes are

  • to decompose via first normal form the table columns whose types accept non-atomic values so that data manipulation and constriction are much easier to cope with by the data sublanguage of use (e.g., SQL), and

  • to get rid of undesirable dependencies among the columns of a table by virtue of the successive normal forms to avoid update anomalies that introduce other kinds of dangerous redundancies.

Of course, one has to take into account the meaning carried by the table(s) and column(s) at issue.

I suggest that you take the time to test the tables I present in the previous logical structure so that you can determine if it meets each normal form.


Endotes

1 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was defined as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.

2 IDEF1X defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.