Designing a Products database: One Products table or many separated by vendor

database-design

In order to prevent the unwieldiness of managing product data spread across a number of excel spreadsheets, I have set up a Products database.

My data is in spreadsheets because it was (somewhat) easy to export a tab-delimited text upload file for both the Amazon and Google e-commerce platforms.

But I've realized that I can achieve the same simple exporting goal through a relational database, minus the inconsistency headaches and plus the added benefit of an expressive query language.

Now my main question is how I should design my database; and, namely, how many tables my Products database should contain. I see three options here:

  1. 1 table for all products. This seems to me like it would avoid redundancy, and thus feels like the best choice. Take an example: if I decide to add an attribute to the database at a later date, I would not have to do so for each vendor/category (see option 2 and 3). I am inclined to say that there is a downside because products of one category will have many NULLs in attributes meant for other categories, but I'm not sure if this is actually a negative.
  2. 1 table for each vendor and its products. This was my instinctive first choice, but I don't find it to be the most logical division: what's critical in determining the uniqueness of a product's attributes is its category, not which vendor it was purchased from. Moreover, even though one vendor will often specialize in a certain kind of product, there will be category overlap if a vendor sells more than one kind of product.
  3. 1 table for each kind of product (e.g. padlocks, chain, safety equipment). I think this is the most reasonable choice other than number 1, because it is almost a guarantee that one kind of product will require different attributes than another. The obvious con I see to this strategy is the difficulty of making the divisions. I'll take hard hats as an example. Should hard hats have their own table? Certainly not. (Unless of course your business is in hard hats.) So, moving up the category chain, should construction accessories have its own table? Maybe closer. How about safety apparel? That makes sense to me, but then there will be the same problem of the many NULLS as option 1 because hard hats will be lumped in with safety vests and goggles etc. etc.

Hopefully I'm not way off the mark with my suggestions. I didn't want to ask before giving my questions some thought, but I'm new to databases, so I'm sure I didn't make the most informed reasoning. I'm heavily leaning towards option number 1, but I'd love to hear advice on any, or if I missed an obvious strategy entirely.

Best Answer

I would go with number 1. Having a separate table for each vendor or each product type will be a nightmare when trying to run reports that need data from multiple product tables. You don't want to have to create a new table each time you add a vendor or product type. I understand your inclination coming from a spreadsheet, but with the ability to query the records easily, separating the products into multiple tables is not the way to go.

I am inclined to say that there is a downside because products of one category will have many NULLs in attributes meant for other categories, but I'm not sure if this is actually a negative.

To prevent this, attributes should be in a separate table.

For example, a partial table definition may be as follows:

products
-- id
-- name

product_attributes 
--product_id
--name
--value

attributes.product_id is a foreign key of products.id.

To further normalize it you could define the attributes in a separate table and have an attribute_id field instead of attribute_name:

    attributes
    --id
    --name

    product_attributes 
    --attribute_id (foreign key of attributes.id)
    --value

(edited for spelling)