Best Practices to Normalize Unoptimized SKU Table with 27 Columns

MySQLnormalizationschema

What a learning experience this has been.

Started out not knowing a damn about SQL normalization or database techniques in general.
As time has passed I'm beginning to see the pitfalls of trying to maintain a larger and larger database.

Here's how it began:
New job, an eCommerce business who does most of its sales through various channels (amazon, ebay, etc), is struggling with maintain an Inventory System.

None of the options on the market worked for them because of the SKU structure in place on the listings, and the fact that some listings would come bundled with up to 20 SKUs in one listing!

Not really knowing what to do, and just trying to devise a workable system, I created a MySQL database with a table like this called MasterSKU

CREATE TABLE `MasterSKU` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `SKU` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `AltSKU` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `QtySKU` int(11) NOT NULL DEFAULT '1',
 `MultSKU` tinyint(1) NOT NULL DEFAULT '0',
 `SKU_1` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_2` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_3` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_4` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_5` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_6` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_7` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_8` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_9` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_10` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_11` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_12` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_13` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_14` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_15` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_16` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_17` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_18` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_19` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `SKU_20` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `processed` tinyint(1) NOT NULL DEFAULT '0',
 `comments` mediumtext COLLATE utf8_unicode_ci,
 PRIMARY KEY (`id`),
 UNIQUE KEY `SKU` (`SKU`)
) ENGINE=InnoDB AUTO_INCREMENT=8901 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Looks perfectly optimized, right?

My logic when creating was like this.

  • SKU represents the SKU in the listing (not the inventory SKU itself).
  • AltSKU is for (normal) listings that only have one Inventory SKU attached to the listing. Here I input the actual Inventory SKU.
    QtySKU is for listings that have bulk (or more than one) Quantities in the Listing (such as 10 T-Shirts for X SKU, the database needs to know this listing should deduct 10).
    MultSKU is default 0, but if a listing has more than one Inventory SKU attached to it, like a majority of the bundled listings are, I set to 1 and therefore my script knows to look into SKU_1 —- > SKU_20 to find what SKU's are attached to the listing.
  • processed is another default 0, I have a script that runs and inserts and searches for new listings to put in the MasterSKU table. If it's set to 0, I know I haven't attached any SKU's to it and need to ASAP so the Inventory System can start deducting.
  • comments was just my personal comments when doing it, I can delete this no issue and probably should

Before I make this major change, I'd like some optimization tips/strategies to get things right. My main issue at the moment is that since this Table is growing, my Queries are getting slower and slower as it has to search through the 20 SKU_x – to SKU_xx columns and then deduct based off those, making it a pain in the ass and slowing down the server considerably.

Certain for one I can combine SKU_1, SKU_2, SKU_3… all 20 columns up until SKU_20.. let's call it mSKUs.

Ideas that popped in my head… converting the "MultSKU"s – ie, SKU_1 through SKU_20 in json format in a column mSKUs. Or just add a delimiter inbetween SKUs, so this column would contain like Inventory SKU #1|Inventory SKU #2|Inventory SKU #3 and so forth.

I'm leaning towards the second one, but perhaps I should read a book first haha, as my ideas work but in the long-term are not really sustainable. The json alternative seems alright, but then I think about all my PHP scripts that are connecting these databases, and having to decode and re-encode might be more of a headache and I'd assume a slower performance as they are encoded.

That's it. That's my messy messy table.

Any ideas/first thoughts/inclinations on how this table could or should be normalized for performance?

Many thanks!

Best Answer

Create a second table to store the SKUs related to a single MasterSKU.

Something like:

CREATE TABLE `MasterSKU` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `SKU` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `processed` tinyint(1) NOT NULL DEFAULT '0',
 `comments` mediumtext COLLATE utf8_unicode_ci,
 PRIMARY KEY (`id`),
 UNIQUE KEY `SKU` (`SKU`)
) ENGINE=InnoDB AUTO_INCREMENT=8901 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `ChildSKU` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `MasterSKUid` int(11) NOT NULL,
 INDEX MasterSKU_ind (MasterSKUid),
    FOREIGN KEY (`MasterSKUid`)
        REFERENCES MasterSKU(id),
 `SKU` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=8901 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Inserting data looks like:

INSERT INTO `MasterSKU` (`SKU`, `processed`, `comments`)
VALUES ("1", 0, "test master sku");

INSERT INTO `ChildSKU` (`MasterSKUid`, `SKU`)
VALUES (8901, "2");
INSERT INTO `ChildSKU` (`MasterSKUid`, `SKU`)
VALUES (8901, "3");
INSERT INTO `ChildSKU` (`MasterSKUid`, `SKU`)
VALUES (8901, "4");

Querying the tables looks like:

SELECT mk.SKU
  , mk.processed
  , mk.comments
  , ck.SKU AS "Child SKU"
FROM `MasterSKU` mk
  INNER JOIN `ChildSKU` ck ON mk.id = ck.MasterSKUid
ORDER BY mk.id, ck.id;
| SKU | processed |        comments | Child SKU |
|-----|-----------|-----------------|-----------|
|   1 |     false | test master sku |         2 |
|   1 |     false | test master sku |         3 |
|   1 |     false | test master sku |         4 |

As you can see from the above, you can have any number of child SKUs per master SKU. This is one of the founding principals of relational database management systems in that it only allows rows to be inserted into ChildSKU with a valid referenced-row in MasterSKU; referred to as referential integrity.

Additional columns can be added to the ChildSKU table denoting the properties of each child SKU, such as whether it is the AltSKU, or QtySKU, etc, via a SKUtype column. I'll leave that as an exercise for the reader.