How to Model a Relational Database for a Sales Business Domain

database-design

I'm trying to come up with a solution that essentially boils down to visualising sales data, across many depts, in many stores, in many areas. The most significant challenge is a way to conceptualise/store this data in a meaningful way.

Store breakdown:

  • 8 stores per area
  • 3 areas per region
  • 3 regions nationally

Department breakdown:

  • 50 micro departments per store (think “chocolate”, “confectionary”, “headphones”) that belong to:
  • 8 major departments (e.g. “chocolate” and “confectionary” belong to “grocery”, “headphones” and “CDs” belong to “entertainment”) that, in turn, belong to:
  • 2 department divisions (each division would hold 4 of the major depts)

Sales data is generated on a weekly basis and then we aggregate it how we see fit. Each week a new spreadsheet is created and each store has three columns devoted to their store, which breaks down each micro department, the sales this year, the sales last year (and then we look at % gains/losses).

So what I'm trying to do is conceptualise a way to store the data meaningfully (all weeks, all departments, all stores) and then we'll perform some analysis on it later to extract what we need.

My current thinking is:

  • each store instance would have a store ID, name, area & region.
  • each micro department instance would have an ID, name and the major dept it belongs to
  • each week would have a week ID (we use financial weeks) and alternatively a date
  • we then create sales instances with a store ID, department ID, sales TY, $ sales LY and a week #

Is there a better way of representing the data or something that I'm missing?

Best Answer

I would establish 2 hierarchies:

region:
    region_id (PK)
    region_name
    ...

area:
    area_id (PK)
    region_id (FK to region)
    area_name
    ...

store:
    store_id (PK)
    area_id (FK to area)
    store_name
    ...

and

division:
    division_id (PK)
    division_name
    ...

department:
    department_id (PK)
    division_id (FK to division)
    department_name
    ...

subdepartment:
    subdepartment_id (PK)
    department_id (FK to department)
    subdepartment_name
    ...

Since store is tied to area and area to region, you don't need to directly tie store to region. Using ID values instead of embedding the area or region names directly simply allows those things to be renamed (and they will be). Same logical holds true for subdepartment.

I would build the sales_history table with the following columns:

  • store_id
  • subdepartment_id
  • sales_year - You may run your sales year from January through December, but this allows other options, without having to calculate the sales year off the date.
  • week_no - Again, this is something that could probably be calculated off the date you record, but having it explicitly simplifies searching, comparisons, and allows it to remain correct for past years, even if the calculation for a future year is different.
  • sales_date - Probably the first date in the week.
  • sales_amount
  • and, any other fields you need.

This will allow you to compare 2016 weeks 1-13 with 2017 weeks 1-13 at any level (store and department; region and subdepartment; whatever). You can also easily compare 2015, 2016, and 2017 (not as easy if storing this year's and last year's values in the same row).