What database & design would you use for legos

database-design design-pattern dimensional-modeling nosql relations

I'm trying to conceptualize an interesting problem I am designing for. Using Lego bricks (the plastic construction toys) as an analogy seems to work well. If this is a fairly recognizable problem, I'd appreciate any references to related information.

Given the scenario details below, which database would you choose (RDBMS/SQL, NoSQL, graph, Lucene, or something else), and how might you model the design? Given:

"Lego materials": all of the individual types of Lego pieces that may exist.
- Each individual piece has metadata and various characteristics such as size, color, bottom/top knobs and gaps, constraints, etc.
"Builders catalog": all of the possible building blocks/modules you could create using multiple pieces.
- Metadata here might include descriptions, the manufacturers of the module, all of the pieces used to create the module, the intended purpose, which knobs/gaps can be built on, etc.
"Final creations": essentially all the various Lego sets sold as products.
- The data will include generic terms describing the build, like "castle", "city", "airplane", etc., plus the instructions, which detail every module that must be built to complete the final creation.

The solution needs to address scale and the ability to search across all of our objects and their relations to each other. I feel graph dbs may not be able to scale to this intended purpose. Further, the data will grow with every addition to the objects above.

Questions we'd query:

(lego materials) Show me all the pieces I have… or… find a piece structured like this one (using metadata), but also all the places it has been used to build a module in our builders catalog (below), as well as final creations… or… find me all the pieces I can place on top of this piece.

(builders catalog) Composability questions similar to those above, as they relate to modules connecting to one another and to final creations. Also… find other modules that use Lego pieces similar to this one's.

(final creations) What other creations are a part of this set (like a Batman series)… or what other creations use similar modules, or similar pieces as this creation.

Best Answer

A standard RDBMS could easily handle this problem. You have a defined schema, a simple one at that, which deals with common industry problems such as Inventory and Production.

At the engineering and manufacturing company where I worked, all of our systems (including the third-party ones we used) ran on an RDBMS, generally Microsoft SQL Server. Any mainstream RDBMS would work just as well, such as PostgreSQL, Oracle, or MySQL. So this is a real-world, standard use case.

One tip: besides our raw Items table (which had a unique row for every part number that we sell or consume at some level of an assembly), we had a table that stored a row for every direct Parent-Child relationship between our part numbers. This is immensely useful for querying any and all combinations of things we build, at any level of any final assembly item we sell (or any sub-assembly / module that goes into that final assembly). You can use such a table in a recursive self-join (in most RDBMSs this is done with a recursive CTE) to get the entire hierarchy of a parts list for anything, which in your case means any Module or FinalCreation. This table also had a Quantity column to record how many units of the specific Child part go into the Parent.

As an example, suppose you had a FinalCreation called ABCD, made up of Modules B and C and a single raw LEGO brick called D. And let's pretend Module B is made up of one raw LEGO brick Z and three D bricks. Your Parent-Child table, with the columns (ParentItem, ChildItem, Quantity), would have the rows (ABCD, B, 1), (ABCD, C, 1), and (ABCD, D, 1). It would also have the rows (B, Z, 1) and (B, D, 3). With this table you can recursively build the hierarchy by self-joining the ChildItem column of one level to the ParentItem column of the next. You can even get metrics by grouping and summing on the Quantity column, which in this case would tell you that the FinalCreation ABCD requires a total of 4 raw bricks of item D in inventory (1 used directly plus 3 inside Module B).
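The ABCD example can be sketched as a runnable query. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the table name ParentChild, the CTE name Explosion, and the helper raw_brick_totals are hypothetical names chosen for the sketch. The recursive CTE walks down from the top item and multiplies quantities along each path, then sums them per leaf item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentChild (
    ParentItem TEXT    NOT NULL,
    ChildItem  TEXT    NOT NULL,
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (ParentItem, ChildItem)
);
-- The rows from the ABCD example above.
INSERT INTO ParentChild VALUES
    ('ABCD', 'B', 1), ('ABCD', 'C', 1), ('ABCD', 'D', 1),
    ('B', 'Z', 1), ('B', 'D', 3);
""")

# Anchor member: direct children of the requested item.
# Recursive member: children of anything already found, with the
# quantity multiplied by the quantity of the path that led there.
EXPLODE = """
WITH RECURSIVE Explosion(Item, TotalQty) AS (
    SELECT ChildItem, Quantity
      FROM ParentChild
     WHERE ParentItem = ?
    UNION ALL
    SELECT pc.ChildItem, e.TotalQty * pc.Quantity
      FROM ParentChild AS pc
      JOIN Explosion   AS e ON pc.ParentItem = e.Item
)
SELECT Item, SUM(TotalQty)
  FROM Explosion
 WHERE Item NOT IN (SELECT ParentItem FROM ParentChild)  -- leaf items only
 GROUP BY Item
 ORDER BY Item;
"""

def raw_brick_totals(top_item):
    """Return {leaf item: total quantity needed} for one top-level item."""
    return dict(conn.execute(EXPLODE, (top_item,)).fetchall())

totals = raw_brick_totals("ABCD")
print(totals)
```

Module C shows up as a leaf here only because the example lists no components for it. In SQL Server the same shape would be written with a plain WITH (no RECURSIVE keyword), as in most T-SQL recursive CTEs.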

Finally, to clarify for future readers: although an RDBMS is a good fit for the problem you're trying to solve, almost any type of database system will handle scaling (in terms of amount of data versus performance) just the same. Scaling is a hardware and architecture problem, not a database system problem. Some database systems support different types of scaling in different ways, which may or may not suit one's needs, but it's never a question of which one handles performance better.

Related Solutions

Sql-server – How would you design a table for a booking system

If the concern is manual maintenance of a large number of rows, you could solve this (potentially) by creating tables for each of your dimensions: weeks, time slots, and venues. The complete set of possible slots for booking would be the cross product of these three dimensions. Actual bookings would be another table with foreign keys pointing to all three of these dimension tables.

With this type of design, instead of maintaining 3 x 25 x 26 records you will maintain 3 + 25 + 26 records. Note that if you segregate day of week and time of day into two tables you can reduce the number of records to be maintained even further (3 + 5 + 5 + 26).

The problem with this approach is when (and if) you have an exception. This design assumes that there are no blackouts in your schedule. For example, what if you close a room for a couple of weeks to be renovated? One way to handle this issue is to create booking records that cover the blackouts. If you have enough exceptions, then managing them may be almost as bad as just using the brute force method.

The question I would seriously consider is whether or not generating the initial list of slots available for booking is really that big a deal. You could easily automate the process to generate your big pile of available slots. This is really just a single query.
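A minimal sketch of that single query, assuming Python's built-in sqlite3 module and hypothetical table names (Weeks, TimeSlots, Venues, Bookings). The mapping of the counts is also an assumption: 26 weeks, 25 time slots, and 3 venues. Available slots are the cross product of the three dimension tables minus whatever already appears in Bookings, and a blackout is represented by booking rows that cover the closed period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Weeks     (WeekId  INTEGER PRIMARY KEY);
CREATE TABLE TimeSlots (SlotId  INTEGER PRIMARY KEY);
CREATE TABLE Venues    (VenueId INTEGER PRIMARY KEY);
CREATE TABLE Bookings (
    WeekId  INTEGER NOT NULL REFERENCES Weeks(WeekId),
    SlotId  INTEGER NOT NULL REFERENCES TimeSlots(SlotId),
    VenueId INTEGER NOT NULL REFERENCES Venues(VenueId),
    PRIMARY KEY (WeekId, SlotId, VenueId)
);
""")

# Maintain 26 + 25 + 3 dimension rows instead of 26 * 25 * 3 slot rows.
conn.executemany("INSERT INTO Weeks VALUES (?)", [(i,) for i in range(1, 27)])
conn.executemany("INSERT INTO TimeSlots VALUES (?)", [(i,) for i in range(1, 26)])
conn.executemany("INSERT INTO Venues VALUES (?)", [(i,) for i in range(1, 4)])

# Every possible slot, minus the ones that already have a booking row.
AVAILABLE = """
SELECT w.WeekId, t.SlotId, v.VenueId
  FROM Weeks          AS w
 CROSS JOIN TimeSlots AS t
 CROSS JOIN Venues    AS v
 WHERE NOT EXISTS (SELECT 1
                     FROM Bookings AS b
                    WHERE b.WeekId  = w.WeekId
                      AND b.SlotId  = t.SlotId
                      AND b.VenueId = v.VenueId);
"""

def available_slots():
    return conn.execute(AVAILABLE).fetchall()

total_before = len(available_slots())

# One ordinary booking.
conn.execute("INSERT INTO Bookings VALUES (1, 1, 1)")
# A blackout: venue 2 is closed for weeks 5 and 6, modeled as booking rows.
conn.executemany(
    "INSERT INTO Bookings VALUES (?, ?, 2)",
    [(week, slot) for week in (5, 6) for slot in range(1, 26)],
)

total_after = len(available_slots())
print(total_before, total_after)
```

If the generated list needs to be stored rather than computed on demand, the same SELECT could feed an INSERT into a slots table.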

Database design for tracking referral dollars and referred dollars

It's a Multi-Level Marketing system! Jeff Moden has written a pair of articles here and here on efficient implementation of Hierarchical Reporting against a SQL Server database. There are a number of ways to store the hierarchical information, but the two main ones are Adjacency List (each child has a parent foreign key) and Nested Sets (each parent stores details of its child hierarchy). Adjacency List is more intuitive and faster to update, while Nested Sets provides faster reporting.
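To make the two representations concrete, here is a small sketch in plain Python. It is not Jeff Moden's algorithm: it is a simple push-stack (depth-first) walk, and the function names to_nested_sets and descendants and the sample tree are hypothetical. The input is an Adjacency List (node mapped to its parent), and the output gives each node a left and right bound so that a whole downline can be found with a range comparison instead of recursion.

```python
from collections import defaultdict

def to_nested_sets(parent_of):
    """Convert {node: parent} (parent is None for the single root)
    into {node: (left, right)} nested-set bounds."""
    children = defaultdict(list)
    root = None
    for node, parent in parent_of.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    bounds = {}
    left_of = {}
    counter = 1
    stack = [(root, False)]          # (node, all children already visited?)
    while stack:
        node, done = stack.pop()
        if done:
            # Leaving the node: assign its right bound.
            bounds[node] = (left_of[node], counter)
        else:
            # Entering the node: assign its left bound, then queue children.
            left_of[node] = counter
            stack.append((node, True))
            for child in sorted(children[node], reverse=True):
                stack.append((child, False))
        counter += 1
    return bounds

def descendants(bounds, node):
    """Everything strictly inside the node's (left, right) interval."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)

# Hypothetical referral tree: 1 referred 2 and 3; 2 referred 4 and 5.
adjacency = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
bounds = to_nested_sets(adjacency)
print(bounds)
print(descendants(bounds, 2))
```

Adding a node to the adjacency dictionary is a single assignment, while the nested-set bounds have to be recomputed, which mirrors the update-versus-reporting trade-off described above.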

Jeff has explained this topic far better than I can, and developed efficient SQL Server algorithms for converting a large Adjacency List tree into a Nested Set representation,

CREATE PROCEDURE dbo.RebuildNestedSets AS
/****************************************************************************
 Purpose:
 Rebuilds a "Hierarchy" table that contains the original Adjacency List,
 the Nested Sets version of the same hierarchy, and several other useful 
 columns of data some of which need not be included in the final table.

 Usage:
 EXEC dbo.RebuildNestedSets

 Programmer's Notes:
 1. As currently written, the code reads from a table called dbo.Employee.
 2. The Employee table must contain well indexed EmployeeID (child) and
    ManagerID (parent) columns.
 3. The Employee table must be a "well formed" Adjacency List. That is, the
    EmployeeID column must be unique and there must be a foreign key on the
    ManagerID column that points to the EmployeeID column. The table must not
    contain any "cycles" (an EmployeeID in its own upline). The Root Node
    must have a NULL for ManagerID.
 4. The final table, named dbo.Hierarchy, will be created in the same 
    database as where this stored procedure is present.  IT DOES DROP THE 
    TABLE CALLED DBO.HIERARCHY SO BE CAREFUL THAT IT DOESN'T DROP A TABLE 
    NEAR AND DEAR TO YOUR HEART.
 5. This code currently has no ROLLBACK capabilities so make sure that you
    have met all of the requirements (and, perhaps, more) cited in #3 above.

 Dependencies:
 1. This stored procedure requires that the following special purpose HTally
    table be present in the same database from which it runs.

--===== Create the HTally table to be used for splitting SortPath
 SELECT TOP 1000 --(4 * 1000 = VARBINARY(4000) in length)
        N = ISNULL(CAST(
                (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)*4+1
            AS INT),0)
   INTO dbo.HTally
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
--===== Add the quintessential PK for performance.
  ALTER TABLE dbo.HTally
    ADD CONSTRAINT PK_HTally 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;

 Revision History:
 Rev 00 - Circa 2009  - Jeff Moden 
        - Initial concept and creation.
 Rev 01 - PASS 2010   - Jeff Moden 
        - Rewritten for presentation at PASS 2010.
 Rev 02 - 06 Oct 2012 - Jeff Moden
        - Code redacted to include a more efficient, higher performance
          method of splitting the SortPath using a custom HTally Table.
****************************************************************************/
--===========================================================================
--      Presets
--===========================================================================
--===== Suppress the auto-display of rowcounts to prevent from returning
     -- false errors if called from a GUI or other application.


SET NOCOUNT ON;

--===== Start a duration timer
DECLARE @StartTime DATETIME,
        @Duration  CHAR(12);
 SELECT @StartTime = GETDATE();

--===========================================================================
--      1.  Read ALL the nodes in a given level as indicated by the parent/
--          child relationship in the Adjacency List.
--      2.  As we read the nodes in a given level, mark each node with the 
--          current level number.
--      3.  As we read the nodes in a given level, convert the EmployeeID to
--          a Binary(4) and concatenate it with the parents in the previous
--          level's binary string of EmployeeID's.  This will build the 
--          SortPath.
--      4.  Number the rows according to the Sort Path.  This will number the
--          rows in the same order that the push-stack method would number 
--          them.
--===========================================================================
--===== Conditionally drop the final table to make reruns easier in SSMS.
     IF OBJECT_ID('FK_Hierarchy_Hierarchy') IS NOT NULL
        ALTER TABLE dbo.Hierarchy
         DROP CONSTRAINT FK_Hierarchy_Hierarchy;

     IF OBJECT_ID('dbo.Hierarchy','U') IS NOT NULL
         DROP TABLE dbo.Hierarchy;

RAISERROR('Building the initial table and SortPath...',0,1) WITH NOWAIT;
--===== Build the new table on-the-fly including some place holders
   WITH cteBuildPath AS 
( --=== This is the "anchor" part of the recursive CTE.
     -- The only thing it does is load the Root Node.
 SELECT anchor.EmployeeID, 
        anchor.ManagerID, 
        HLevel   = 1,
        SortPath =  CAST(
                        CAST(anchor.EmployeeID AS BINARY(4)) 
                    AS VARBINARY(4000)) --Up to 1000 levels deep.
   FROM dbo.Employee AS anchor
  WHERE ManagerID IS NULL --Only the Root Node has a NULL ManagerID
  UNION ALL 
 --==== This is the "recursive" part of the CTE that adds 1 for each level
     -- and concatenates each level of EmployeeID's to the SortPath column.  
 SELECT recur.EmployeeID, 
        recur.ManagerID, 
        HLevel   =  cte.HLevel + 1,
        SortPath =  CAST( --This does the concatenation to build SortPath
                        cte.SortPath + CAST(Recur.EmployeeID AS BINARY(4))
                    AS VARBINARY(4000))
   FROM dbo.Employee      AS recur WITH (TABLOCK)
  INNER JOIN cteBuildPath AS cte 
          ON cte.EmployeeID = recur.ManagerID
) --=== This final INSERT/SELECT creates the Node # in the same order as a
     -- push-stack would. It also creates the final table with some
     -- "reserved" columns on the fly. We'll leave the SortPath column in
     -- place because we're still going to need it later.
     -- The ISNULLs make NOT NULL columns
 SELECT EmployeeID = ISNULL(sorted.EmployeeID,0),
        sorted.ManagerID,
        HLevel     = ISNULL(sorted.HLevel,0),
        LeftBower  = ISNULL(CAST(0 AS INT),0), --Place holder
        RightBower = ISNULL(CAST(0 AS INT),0), --Place holder
        NodeNumber = ROW_NUMBER() OVER (ORDER BY sorted.SortPath),
        NodeCount  = ISNULL(CAST(0 AS INT),0), --Place holder
        SortPath   = ISNULL(sorted.SortPath,sorted.SortPath)
   INTO dbo.Hierarchy
   FROM cteBuildPath AS sorted
 OPTION (MAXRECURSION 100) --Change this IF necessary
;
RAISERROR('There are %u rows in dbo.Hierarchy',0,1,@@ROWCOUNT) WITH NOWAIT;

--===== Display the cumulative duration
 SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

--===========================================================================
--      Using the information created in the table above, create the
--      NodeCount column and the LeftBower and RightBower columns to create
--      the Nested Sets hierarchical structure.
--===========================================================================
RAISERROR('Building the Nested Sets...',0,1) WITH NOWAIT;

--===== Declare a working variable to hold the result of the calculation
     -- of the LeftBower so that it may be easily used to create the
     -- RightBower in a single scan of the final table.
DECLARE @LeftBower INT
;
--===== Create the Nested Sets from the information available in the table
     -- and in the following CTE. This uses the proprietary form of UPDATE
     -- available in SQL Server for extra performance.
   WITH cteCountDownlines AS
( --=== Count each occurrence of EmployeeID in the sort path
 SELECT EmployeeID = CAST(SUBSTRING(h.SortPath,t.N,4) AS INT), 
        NodeCount  = COUNT(*) --Includes current node
   FROM dbo.Hierarchy h, 
        dbo.HTally t
  WHERE t.N BETWEEN 1 AND DATALENGTH(SortPath)
  GROUP BY SUBSTRING(h.SortPath,t.N,4)
) --=== Update the NodeCount and calculate both Bowers
 UPDATE h
    SET @LeftBower   = LeftBower = 2 * NodeNumber - HLevel,
        h.NodeCount  = downline.NodeCount,
        h.RightBower = (downline.NodeCount - 1) * 2 + @LeftBower + 1
   FROM dbo.Hierarchy h
   JOIN cteCountDownlines downline
     ON h.EmployeeID = downline.EmployeeID
;
RAISERROR('%u rows have been updated to Nested Sets',0,1,@@ROWCOUNT)
WITH NOWAIT;

RAISERROR('If the rowcounts don''t match, there may be orphans.'
,0,1,@@ROWCOUNT)WITH NOWAIT;

--===== Display the cumulative duration
 SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

--===========================================================================
--      Prepare the table for high performance reads by adding indexes.
--===========================================================================
RAISERROR('Building the indexes...',0,1) WITH NOWAIT;

--===== Direct support for the Nested Sets
  ALTER TABLE dbo.Hierarchy 
    ADD CONSTRAINT PK_Hierarchy
        PRIMARY KEY CLUSTERED (LeftBower, RightBower) WITH FILLFACTOR = 100
;
 CREATE UNIQUE INDEX AK_Hierarchy 
     ON dbo.Hierarchy (EmployeeID) WITH FILLFACTOR = 100
;
  ALTER TABLE dbo.Hierarchy
    ADD CONSTRAINT FK_Hierarchy_Hierarchy FOREIGN KEY
        (ManagerID) REFERENCES dbo.Hierarchy (EmployeeID) 
     ON UPDATE NO ACTION 
     ON DELETE NO ACTION
;
--===== Display the cumulative duration
 SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

--===========================================================================
--      Exit
--===========================================================================
RAISERROR('===============================================',0,1) WITH NOWAIT;
RAISERROR('RUN COMPLETE',0,1) WITH NOWAIT;
RAISERROR('===============================================',0,1) WITH NOWAIT;

and then to report hierarchical subtotals (as you require here) from the Nested Set representation:

--===== Start a "Timer" to see how long this all takes.
DECLARE @StartTime DATETIME;
 SELECT @StartTime = GETDATE();

--===========================================================================
--      1.  Read ALL the nodes in a given level as indicated by the parent/
--          child relationship in the Adjacency List.
--      2.  As we read the nodes in a given level, mark each node with the 
--          current level number.
--      3.  As we read the nodes in a given level, convert the EmployeeID to
--          a Binary(4) and concatenate it with the parents in the previous
--          level’s binary string of EmployeeID’s.  This will build the 
--          SortPath.
--===========================================================================
--===== Conditionally drop the work table to make reruns easier in SSMS.
     IF OBJECT_ID('dbo.Hierarchy','U') IS NOT NULL
         DROP TABLE dbo.Hierarchy;

--===== Build the new table on-the-fly including some place holders
   WITH cteBuildPath AS 
( --=== This is the "anchor" part of the recursive CTE.
     -- The only thing it does is load the Root Node.
 SELECT anchor.EmployeeID, 
        anchor.ManagerID, 
        HLevel   = 1,
        SortPath =  CAST(
                        CAST(anchor.EmployeeID AS BINARY(4)) 
                    AS VARBINARY(4000)) --Up to 1000 levels deep.
   FROM dbo.Employee AS anchor
  WHERE ManagerID IS NULL --Only the Root Node has a NULL ManagerID
  UNION ALL 
 --==== This is the "recursive" part of the CTE that adds 1 for each level
     -- and concatenates each level of EmployeeID's to the SortPath column.  
 SELECT recur.EmployeeID, 
        recur.ManagerID, 
        HLevel   =  cte.HLevel + 1,
        SortPath =  CAST( --This does the concatenation to build SortPath
                        cte.SortPath + CAST(Recur.EmployeeID AS BINARY(4))
                    AS VARBINARY(4000))
   FROM dbo.Employee      AS recur WITH (TABLOCK)
  INNER JOIN cteBuildPath AS cte 
          ON cte.EmployeeID = recur.ManagerID
) --=== This final INSERT/SELECT creates an interim working table to hold the
     -- original Adjacency List, the hierarchal level of each node, and the
     -- SortPath which is the binary representation of each node's upline.
     -- The ISNULLs make NOT NULL columns
 SELECT EmployeeID = ISNULL(sorted.EmployeeID,0),
        sorted.ManagerID,
        Sales      = ISNULL(CAST(0 AS BIGINT),0), --Place Holder
        HLevel     = ISNULL(sorted.HLevel,0),
        SortPath   = ISNULL(sorted.SortPath,sorted.SortPath)
   INTO dbo.Hierarchy
   FROM cteBuildPath AS sorted
 OPTION (MAXRECURSION 100) --Change this IF necessary
;
--===== You'll be tempted to add the following index because it seems so
     -- logical a thing to do for performance, but DON'T do it! It will
     -- actually slow the rest of the code down by a factor of 2!!!!
 --ALTER TABLE dbo.Hierarchy
 --  ADD CONSTRAINT PK_Hierarchy PRIMARY KEY CLUSTERED (EmployeeID)
--;
--===== Populate the Hierarchy table with current Sales data.
 UPDATE h 
    SET h.Sales = s.Sales
   FROM dbo.Hierarchy h
  INNER JOIN dbo.CurrentMonthlySales s
     ON h.EmployeeID = s.EmployeeID
;
--===== Conditionally drop the final table to make reruns easier in SSMS.
     IF OBJECT_ID('dbo.PreAggregatedHierarchy','U') IS NOT NULL
        DROP TABLE dbo.PreAggregatedHierarchy
;
--===== Now, build "Everything" into the PreAggregatedHierarchy table.
WITH
cteSplit AS
(--==== Splits the path into elements (including Sales and HLevel) 
     -- so that we can aggregate them by EmployeeID and HLevel.
     -- Can't aggregate here without including the SortPath so we don't.
 SELECT EmployeeID = CAST(SUBSTRING(h.SortPath,t.N,4) AS INT),
        h.HLevel, h.Sales
   FROM dbo.HTally         AS t
  CROSS JOIN dbo.Hierarchy AS h
  WHERE t.N BETWEEN 1 AND DATALENGTH(SortPath)
),
cteAggregate AS
(--==== Creates the aggregates and introduces the "Relative Level" column.
     -- NodeCount = Count of nodes in downline for each EmployeeID by Level
     -- Sales = Total Sales in downline for each EmployeeID by Level.
 SELECT EmployeeID,
        HLevel,
        RLevel    = ROW_NUMBER() OVER (PARTITION BY EmployeeID 
                                           ORDER BY EmployeeID, HLevel),
        NodeCount = COUNT(*),
        Sales     = SUM(CAST(Sales AS MONEY))
   FROM cteSplit
  GROUP BY EmployeeID, HLevel
)
--===== Adds a "Rollup" to create all the subtotals that we need.
     -- We couldn't do this in the previous step because we didn't know what
     -- the "Relative Level" was for each row, yet.
     -- The HAVING eliminates unnecessary subtotals that are created.
 SELECT EmployeeID = ISNULL(a.EmployeeID,0), --Convert NULL total lines to 0
        HLevel     = MIN(a.HLevel), --Just so we don't have to GROUP BY
        RLevel     = ISNULL(CAST(a.RLevel AS TINYINT),0),
        NodeCount  = SUM(a.NodeCount), --Just so we don't have to GROUP BY
        Sales      = SUM(a.Sales) --Just so we don't have to GROUP BY
   INTO dbo.PreAggregatedHierarchy
   FROM cteAggregate a
  GROUP BY EmployeeID, RLevel WITH ROLLUP
 HAVING EmployeeID > 0 --Eliminates the NULL total lines for cleaner output
;
--===== Add the Clustered Index as a Primary Key
  ALTER TABLE dbo.PreAggregatedHierarchy
    ADD CONSTRAINT PK_PreAggregatedHierarchy 
        PRIMARY KEY CLUSTERED (EmployeeID, RLevel) WITH FILLFACTOR = 100
;
--===== Display how long it all took
  PRINT 'Duration: ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' (hh:mi:ss:mmm)';
