Database Design – Databases for Representing Ordered Sets

database-designdatabase-recommendation

In a SQL database, the only way to represent an arbitrarily ordered set is to give every record an "Order" and every time you update or move an item around in this ordering you have to update or somehow maintain the entire list of ranks with a nightly job or something like that.

For example, I can represent the ordered set [C, B, D, A] in this way in a SQL database:

ID  Name   Order
1   A      4
2   B      2
3   C      1
4   D      3

If I want to move an item to a different position in the set, or prepend a new item, I may have to update a lot of items. In general there is a lot of maintenance overhead with this approach.

Querying the data once it is in the database is not an issue for SQL, the issue is the significant maintenance overhead of reordering the set. There is no simple operation in SQL to move an item to a new position in the set. The ordering is arbitrary and user-defined.

I realize that this can be accomplished using SQL, it's just very clunky to perform certain operations like prepending items, or moving an item to a new position. Even this example operation of reversing the order of the set requires a pretty lengthy, complex, query. The type of database I'm looking for might support such an operation natively, or at least more elegantly.

So, if I am designing an application (like Trello, for example) that very heavily involves ordered sets, it seems SQL is not the ideal database technology for me. Are there any databases whose syntax support ordered sets in a more natural way?

These are some CQL3 queries from the Cassandra documentation that seem close to what I'm looking for. This prepends an item to an ordered set.

UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';

This one will set the value of the item at position 2 in the set. I suspect I could use this to easily perform arbitrary swaps/reorders.

UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';

Unfortunately the documentation also states

And while we may (or may not) relax that rule a bit in the future,
this still means that collections are not meant to be excessively
large. They are not a replacement for a proper modelisation into
tables.

which seems to suggest that ordered sets are not (yet) first class citizens in CQL3.

Best Answer

Most of the major relational DBMSs support structured types in the form of XML or JSON. These are order-preserving. Typically the corresponding programming language (T-SQL, PL/SQL) will have built-in features to manipulate these types much as SQL manipulates columns and rows.

Some relational stores also support ARRAY data types (one example). An item will retain its before and after relationship to those items adjacent to it irrespective of what happens in other parts of the array. Unlike, say, JSON the array itself cannot contain complex types so holding an array of surrogate IDs and pulling the remaining data on demand may be necessary.

If you choose to adopt structured types why not go the whole hog and use a DBMS designed around them. This is the realm of NoSQL. There are any number of products, each with advantages and drawbacks.

Finally I'd mention graph stores. Their schtick is to focus on the connectedness of items. This is appropriate for ordered lists as the defining characteristic is how one item follows another. So the items could be modelled as graph nodes with edges, specifically "follows" edges, linking nodes in the desired sequence.

Having worked with each of these to a greater or lesser extent it is my opinion that none of them is significantly less work in application programming terms, including the straight-up SQL approach which you dislike.

Business rules

In order to start delineating the relevant conceptual schema, some of the most important business rules determined so far (restricting the analysis to the three disclosed Categories only, to keep things as brief as possible) can be formulated as follows:

A User owns zero-one-or-many Items.
An Item is owned by exactly-one User at a specific instant.
An Item may be owned by one-to-many Users at distinct points in time.
An Item is classified by exactly-one Category.
An Item is, at all times,
- either a Car
- or a Boat
- or a Plane.

Illustrative IDEF1X diagram

Figure 1 displays an IDEF1X¹ diagram that I created to group the previous formulations along with other business rules that appear pertinent:

Figure 1 - Item and Categories Supertype-Subtype Structure

Supertype

On the one hand, Item, the supertype, presents the properties^† or attributes that are common to all the Categories, i.e.,

CategoryCode —specified as a FOREIGN KEY (FK) that references Category.CategoryCode and functions as a subtype discriminator, i.e., it indicates the exact Category of subtype with which a given Item must be connected—,
OwnerId —distinguished as a FK that points to User.UserId, but I assigned it a role name² in order to reflect its special implications more accurately—,
Foo,
Bar,
Baz and
CreatedDateTime.

Subtypes

On the other hand, the properties^‡ that pertain to every particular Category, i.e.,

Qux and Corge;
Grault, Garply and Plugh;
Xyzzy, Thud, Wibble and Flob;

are shown in the corresponding subtype box.

Identifiers

Then, the Item.ItemId PRIMARY KEY (PK) has migrated³ to the subtypes with different role names, i.e.,

CarId,
BoatId and
PlaneId.

Mutually exclusive associations

As depicted, there is an association or relationship with a one-to-one (1:1) cardinality ratio between (a) each supertype occurrence and (b) its complementary subtype instance.

The exclusive subtype symbol portrays the fact that the subtypes are mutually exclusive, i.e., a concrete Item occurrence can be supplemented by a single subtype instance only: either one Car, or one Plane, or one Boat (never by zero or less, nor by two or more).

_{^{†, ‡} I employed classic placeholder names to entitle some of the entity type properties, as their actual denominations were not supplied in the question.}

Expository logical-level layout

Consequently, in order to discuss an expository logical design, I derived the following SQL-DDL statements based on the IDEF1X diagram displayed and described above:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define the 
-- most convenient INDEX strategies based on the exact 
-- data manipulation tendencies of your business context.

-- As one would expect, you are free to utilize 
-- your preferred (or required) naming conventions. 

CREATE TABLE UserProfile (
    UserId          INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    BirthDate       DATE     NOT NULL,
    GenderCode      CHAR(3)  NOT NULL,
    Username        CHAR(20) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT UserProfile_PK  PRIMARY KEY (UserId),
    CONSTRAINT UserProfile_AK1 UNIQUE ( -- Composite ALTERNATE KEY.
        FirstName,
        LastName,
        GenderCode,
        BirthDate
    ),
    CONSTRAINT UserProfile_AK2 UNIQUE (Username) -- ALTERNATE KEY.
);

CREATE TABLE Category (
    CategoryCode     CHAR(1)  NOT NULL, -- Meant to contain meaningful, short and stable values, e.g.; 'C' for 'Car'; 'B' for 'Boat'; 'P' for 'Plane'.
    Name             CHAR(30) NOT NULL,
    --
    CONSTRAINT Category_PK PRIMARY KEY (CategoryCode),
    CONSTRAINT Category_AK UNIQUE      (Name) -- ALTERNATE KEY.
);

CREATE TABLE Item ( -- Stands for the supertype.
    ItemId           INT      NOT NULL,
    OwnerId          INT      NOT NULL,
    CategoryCode     CHAR(1)  NOT NULL, -- Denotes the subtype discriminator.
    Foo              CHAR(30) NOT NULL,
    Bar              CHAR(40) NOT NULL,
    Baz              CHAR(55) NOT NULL,  
    CreatedDateTime  DATETIME NOT NULL,
    --
    CONSTRAINT Item_PK             PRIMARY KEY (ItemId),
    CONSTRAINT Item_to_Category_FK FOREIGN KEY (CategoryCode)
        REFERENCES Category    (CategoryCode),
    CONSTRAINT Item_to_User_FK     FOREIGN KEY (OwnerId)
        REFERENCES UserProfile (UserId)  
);

CREATE TABLE Car ( -- Represents one of the subtypes.
    CarId INT          NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Qux   DATE         NOT NULL,
    Corge DECIMAL(5,2) NOT NULL,   
    --
    CONSTRAINT Car_PK         PRIMARY KEY (CarId),
    CONSTRAINT Car_to_Item_FK FOREIGN KEY (CarId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidQux_CK    CHECK       (Qux >= '1990-01-01')   
);

CREATE TABLE Boat ( -- Stands for one of the subtypes.
    BoatId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Grault SMALLINT NOT NULL,
    Garply DATETIME NOT NULL,   
    Plugh  CHAR(63) NOT NULL, 
    --
    CONSTRAINT Boat_PK         PRIMARY KEY (BoatId),
    CONSTRAINT Boat_to_Item_FK FOREIGN KEY (BoatId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidGrault_CK  CHECK       (Grault <= 10000)  
);

CREATE TABLE Plane ( -- Denotes one of the subtypes.
    PlaneId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
    Xyzzy   BIGINT   NOT NULL,
    Thud    TEXT     NOT NULL,  
    Wibble  CHAR(20) NOT NULL, 
    Flob    BIT(1)   NOT NULL,   
    --
    CONSTRAINT Plane_PK         PRIMARY KEY (PlaneId),
    CONSTRAINT Plane_to_Item_PK FOREIGN KEY (PlaneId)
        REFERENCES Item (ItemId),
    CONSTRAINT ValidXyzzy_CK    CHECK       (Xyzzy <= 3258594758)
);

^{This has been tested in this db<>fiddle running on MySQL 8.0.}

As demonstrated, the superentity type and each of the subentity types are represented by the corresponding base table.

The columns CarId, BoatId and PlaneId, constrained as the PKs of the appropriate tables, help in representing the conceptual-level one-to-one association by way of FK constraints^§ that point to the ItemId column, which is constrained as the PK of the Item table. This signifies that, in an actual “pair”, both the supertype and the subtype rows are identified by the same PK value; thus, it is more than opportune to mention that

(a) attaching an extra column to hold system-controlled surrogate values^‖ to (b) the tables standing for the subtypes is (c) entirely superfluous.

_{^§ In order to prevent problems and errors concerning (particularly FOREIGN) KEY constraint definitions —situation you referred to in comments—, it is very important to take into account the existence-dependency that takes place among the different tables at hand, as exemplified in the declaration order of the tables in the expository DDL structure, which I supplied in this db<>fiddle too.}

_{^‖ E.g., appending an additional column with the AUTO_INCREMENT property to a table of a database built on MySQL.}

Integrity and consistency considerations

It is critical to point out that, in your business environment, you have to (1) ensure that each “supertype” row is at all times complemented by its corresponding “subtype” counterpart, and, in turn, (2) guarantee that said “subtype” row is compatible with the value contained in the “discriminator” column of the “supertype” row.

It would be very elegant to enforce such circumstances in a declarative manner but, unfortunately, none of the major SQL platforms has provided the proper mechanisms to do so, as far as I know. Therefore, resorting to procedural code within ACID TRANSACTIONS it is quite convenient so that these conditions are always met in your database. Other option would be employing TRIGGERS, but they tend to make things untidy, so to speak.

Declaring useful views

Having a logical design like the one explained above, it would be very practical to create one or more views, i.e., derived tables that comprise columns that belong to two or more of the relevant base tables. In this way, you can, e.g., SELECT directly FROM those views without having to write all the JOINs every time you have to retrieve “combined” information.

Sample data

In this respect, let us say that the base tables are “populated” with the sample data shown below:

--
INSERT INTO UserProfile 
    (UserId, FirstName, LastName, BirthDate, GenderCode, Username, CreatedDateTime)
VALUES
    (1, 'Edgar', 'Codd', '1923-08-19', 'M', 'ted.codd', CURDATE()),
    (2, 'Michelangelo', 'Buonarroti', '1475-03-06', 'M', 'michelangelo', CURDATE()),
    (3, 'Diego', 'Velázquez', '1599-06-06', 'M', 'd.velazquez', CURDATE());

INSERT INTO Category 
    (CategoryCode, Name)
VALUES
    ('C', 'Car'), ('B', 'Boat'), ('P', 'Plane');

-- 1. ‘Full’ Car INSERTion

-- 1.1 
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (1, 1, 'C', 'Motorway', 'Tire', 'Chauffeur', CURDATE());
 
 -- 1.2
INSERT INTO Car
    (CarId, Qux, Corge)
VALUES
    (1, '1999-06-11',  999.99);

-- 2. ‘Full’ Boat INSERTion

-- 2.1
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (2, 2, 'B', 'Ocean', 'Anchor', 'Sailor', CURDATE());

-- 2.2
INSERT INTO Boat
    (BoatId, Grault, Garply, Plugh)
VALUES
    (2, 10000, '2016-03-09 07:32:04.000', 'So far so good.');

-- 3 ‘Full’ Plane INSERTion

-- 3.1   
INSERT INTO Item
    (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
VALUES
    (3, 3, 'P', 'Sky', 'Wing', 'Aviator', CURDATE());

-- 3.2

INSERT INTO Plane
    (PlaneId, Xyzzy, Thud, Wibble, Flob)
VALUES
    (3, 3258594758, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sollicitudin pharetra sem id elementum. Sed tempor hendrerit orci. Ut scelerisque pretium diam, eu sodales ante sagittis ut. Phasellus id nunc commodo, sagittis urna vitae, auctor ex. Duis elit tellus, pharetra sed ipsum sit amet, bibendum dapibus mauris. Morbi condimentum laoreet justo, quis auctor leo rutrum eu. Sed id nibh non leo sodales pulvinar. Nam ornare ipsum nunc, eget molestie nulla ultrices vel. Curabitur fermentum nisl quis lorem aliquam pretium aliquam at mauris. In vestibulum, tellus et pharetra sollicitudin, mi lacus consectetur dolor, id volutpat nulla eros a mauris. ', 'Here we go!', TRUE);

--

Then, an advantageous view is one that gathers columns from Item, Car and UserProfile:

--

CREATE VIEW CarAndOwner AS
    SELECT C.CarId,
           I.Foo,
           I.Bar,
           I.Baz,
           C.Qux,
           C.Corge,           
           U.FirstName AS OwnerFirstName,
           U.LastName  AS OwnerLastName
        FROM Item I
        JOIN Car C
          ON C.CarId = I.ItemId
        JOIN UserProfile U
          ON U.UserId = I.OwnerId;

--

Naturally, a similar approach can be followed so that you can as well SELECT the “full” Boat and Plane information straight FROM one single table (a derived one, in these cases).

After that —if you do not mind about the presence of NULL marks in result sets— with the following VIEW definition, you can, e.g., “collect” columns from the tables Item, Car, Boat, Plane and UserProfile:

--

CREATE VIEW FullItemAndOwner AS
    SELECT I.ItemId,
           I.Foo, -- Common to all Categories.
           I.Bar, -- Common to all Categories.
           I.Baz, -- Common to all Categories.
          IC.Name      AS Category,
           C.Qux,    -- Applies to Cars only.
           C.Corge,  -- Applies to Cars only.
           --
           B.Grault, -- Applies to Boats only.
           B.Garply, -- Applies to Boats only.
           B.Plugh,  -- Applies to Boats only.
           --
           P.Xyzzy,  -- Applies to Planes only.
           P.Thud,   -- Applies to Planes only.
           P.Wibble, -- Applies to Planes only.
           P.Flob,   -- Applies to Planes only.
           U.FirstName AS OwnerFirstName,
           U.LastName  AS OwnerLastName
        FROM Item I
        JOIN Category IC
          ON I.CategoryCode = IC.CategoryCode
   LEFT JOIN Car C
          ON C.CarId = I.ItemId
   LEFT JOIN Boat B
          ON B.BoatId = I.ItemId
   LEFT JOIN Plane P
          ON P.PlaneId = I.ItemId               
        JOIN UserProfile U
          ON U.UserId = I.OwnerId;

--

The code of the views here shown is only illustrative. Of course, doing some testing exercises and modifications might help to accelerate the (physical) execution of the queries at hand. In addition, you might need to remove or add columns to said views as the business needs dictate.

The sample data and all the view definitions are incorporated into this db<>fiddle so that they can be observed “in action”.

Data manipulation: Application program(s) code and column aliases

The usage of application program(s) code (if that is what you mean by “server-side specific code”) and column aliases are other significant points that you brought up in the next comments:

I did manage to workaround [a JOIN] issue with server-side specific code, but I really don't want to do that -AND- adding aliases to all columns might be "stressing".

Very well explained, thank you very much. However, as I suspected, I'll have to manipulate the result set when listing all data because of the similarities with some columns, since I don't want to use several aliases to keep the statement cleaner.

It is opportune to indicate that while using application program code is a very fitting resource to handle the presentation or graphical features —i.e., the external level of representation of a computerized information system— of data sets, it is paramount that you avoid carrying out data retrieval on a row-by-row basis to prevent execution speed issues. The objective should be to “fetch” the pertinent data sets in toto by means of the robust data manipulation instruments provided by the (precisely) set engine of the SQL platform so that you can optimize the behaviour of your system.

Furthermore, utilizing aliases to rename one or more columns within a certain scope may appear stressing but, personally, I see such resource as a very powerful tool that helps to (i) contextualize and (ii) disambiguate the meaning and intention ascribed to the concerning columns; hence, this is an aspect that should be thoroughly pondered with respect to the manipulation of the data of interest.

Similar scenarios

You might as well find of help this series of posts and this group of posts which contain my take on two other cases that include supertype-subtype associations with mutually exclusive subtypes.

I have also proposed a solution for a business environment involving a supertype-subtype cluster where the subtypes are not mutually exclusive in this (newer) answer.

Endnotes

_{¹ Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the U.S. National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the theoretical works authored by the sole originator of the relational model, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.}

_{² In IDEF1X, a role name is a distinctive label assigned to a FK property (or attribute) in order to express the meaning that it holds within the scope of its respective entity type.}

_{³ The IDEF1X standard defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.}

Best Answer

Related Solutions

Schema design for storing item combination with order

MySQL – How to Model an Entity Type with Different Sets of Attributes