How to Model a Relational Database for a Sales Business Domain

database-design

I'm trying to come up with a solution that essentially boils down to visualising sales data, across many depts, in many stores, in many areas. The most significant challenge is a way to conceptualise/store this data in a meaningful way.

Store breakdown:

8 stores per area
3 areas per region
3 regions nationally

Department breakdown:

50 micro departments per store (think “chocolate”, “confectionary”, “headphones”) that belong to:
8 major departments (e.g. “chocolate” and “confectionary” belong to “grocery”, “headphones” and “CDs” belong to “entertainment”) that, in turn, belong to:
2 department divisions (each division would hold 4 of the major depts)

Sales data is generated on a weekly basis and then we aggregate it how we see fit. Each week a new spreadsheet is created and each store has three columns devoted to their store, which breaks down each micro department, the sales this year, the sales last year (and then we look at % gains/losses).

So what I'm trying to do is conceptualise a way to store the data meaningfully (all weeks, all departments, all stores) and then we'll perform some analysis on it later to extract what we need.

My current thinking is:

each store instance would have a store ID, name, area & region.
each micro department instance would have an ID, name and the major dept it belongs to
each week would have a week ID (we use financial weeks) and alternatively a date
we then create sales instances with a store ID, department ID, sales TY, $ sales LY and a week #

Is there a better way of representing the data or something that I'm missing?

Best Answer

I would establish 2 hierarchies:

region:
    region_id (PK)
    region_name
    ...

area:
    area_id (PK)
    region_id (FK to region)
    area_name
    ...

store:
    store_id (PK)
    area_id (FK to area)
    store_name
    ...

and

division:
    division_id (PK)
    division_name
    ...

department:
    department_id (PK)
    division_id (FK to division)
    department_name
    ...

subdepartment:
    subdepartment_id (PK)
    department_id (FK to department)
    subdepartment_name
    ...

Since store is tied to area and area to region, you don't need to directly tie store to region. Using ID values instead of embedding the area or region names directly simply allows those things to be renamed (and they will be). Same logical holds true for subdepartment.

I would build the sales_history table with the following columns:

store_id
subdepartment_id
sales_year - You may run your sales year from January through December, but this allows other options, without having to calculate the sales year off the date.
week_no - Again, this is something that could probably be calculated off the date you record, but having it explicitly simplifies searching, comparisons, and allows it to remain correct for past years, even if the calculation for a future year is different.
sales_date - Probably the first date in the week.
sales_amount
and, any other fields you need.

This will allow you to compare 2016 weeks 1-13 with 2017 weeks 1-13 at any level (store and department; region and subdepartment; whatever). You can also easily compare 2015, 2016, and 2017 (not as easy if storing this year's and last year's values in the same row).

Related Solutions

Data Model for Summarizing Student Info in SQL Server

Calculate those counts on the fly.

For most data sets of this nature, I would imagine calculating the count of tardiness events would be relatively cheap. How many rows do you expect there to be in the long run in your students or attendance tables? Probably on the order of tens of thousands at most. Contrast that to calculating an account balance from its transaction history in database for a large bank with billions of transactions. In this case I would consider persisting a summary of these aggregations somewhere.

If you wanted to pre-calculate this data anyway, I would summarize the count of tardiness events per student per year in an indexed view as suggested in the top answer here.

For example:

CREATE VIEW dbo.tardiness_summary
WITH SCHEMABINDING
AS
SELECT
     student_id
   , year
   , COUNT_BIG(*) AS tardy_count
FROM dbo.attendance
GROUP BY 
     student_id
   , year
;

CREATE UNIQUE CLUSTERED INDEX IX_tardiness_summary
ON dbo.tardiness_summary (
     student_id
   , year
;

This is not a flexible approach, however, as this view is now schema-bound to the base tables and thus any modifications to either the view or the table will require rebuilding the view. Indexed views also have many restrictions on how they may be created or queried. Their advantage is that they guarantee the summary table will stay in sync with its sources because the database engine is now doing this work for you.

Sql-server – How would you design a table for a booking system

If the concern is manual maintenance of a large number of rows, you could solve this (potentially) by creating tables for each of your dimensions: weeks, time slots, and venues. The complete set of possible slots for booking would be the cross product of these three dimensions. Actual bookings would be another table with foreign keys pointing to all three of these dimension tables.

With this type of design, instead of maintaining 3 x 25 x 26 records you will maintain 3 + 25 + 26 records. Note that if you segregate day of week and time of day into two tables you can reduce the number of records to be maintained even further (3 + 5 + 5 + 26).

The problem with this approach is when (and if) you have an exception. This design assumes that there are no blackouts in your schedule. For example, what if you close a room for a couple of weeks to be renovated? One way to handle this issue is to create booking records that cover the blackouts. If you have enough exceptions, then managing them may be almost as bad as just using the brute force method.

The question I would seriously consider is whether or not generating the initial list of slots available for booking is really that big a deal. You could easily automate the process to generate your big pile of available slots. This is really just a single query.

Best Answer

Related Solutions

Data Model for Summarizing Student Info in SQL Server

Sql-server – How would you design a table for a booking system

Related Question