Should Multiple Tables Be Created for Different Entity States in MySQL?

database-designMySQL

I have a tasks table in my database and, in the business domain of interest, tasks can have multiple states: “open”, “begun”, “in review” and “completed”.

Despite having “open” and “begun” in the same table, identified using flags, I decided for some reason early on to have “in review” and “completed” tasks in their own separate tables. While this seemed like a good idea to start off with, making it easy to query for specific things, when I thought about implementing a comment feature for tasks I realised that the task_id value would be changing across the three tables.

Have I gone about this all wrong?

Current considerations

Some solutions I thought up would be to redo the organisation and put everything in one table, using flags to differentiate purely between the states (this would lead to a fair amount of reworking); to create some kind of UUID that is transferable across the tables – though I think there might issues with performance if this is done; and finally, to set the IDs across the tables to be no-longer auto incremented and to simply “inherit” the original value from the first task table (which auto increments).

What's the done thing in this situation? I'm not the most versed in database design and I'm kind of making this up as I go along.

Any help would be greatly appreciated.

Responses to further information requests made by @MDCCL via comments

So, the state (or status) of a task can change over time, right? How many times can the same concrete task present the same state, let us say, “open”? Will it vary depending on the kind of state presented by the relevant task?

Yes, a task.state could theoretically change up to four times. I won't go into specifics of the logic but the longest chain would be “begun” → “completed” → “review” → “completed”. And each time, currently, they're being put into another table.

The exception is whether it's “begun” or not, a user may abandon and then re-undertake the task infinitely – in theory at least. Which is at least helped by the fact that the “begun” status is a flag.

Ok, but a certain task will present, at all times, exactly one particular “present” state, right? For example, when a task row is INSERTed, it should be accompannied by a certain state value, which I assume would be “open”. Later, the state of such a task will evolve to only one of the other possible options, that is, either “begun” or “in review” or “completed”.

Yea, that's exactly right. A task will always be in a stage, and can't be in multiple ones.

Once a particular task is deemed as “completed”, can it change to another state?

Yes, it can be sent back to “review” if certain parameters are reached. And from “review” it can either go back to “begun” or back to “completed”.

Best Answer

The way I see it, the fact that (a) a concrete Task can present distinct States at different points in time entails (b) the creation of a table that is going to contain a time series, as I will detail below.

Business rules

In order to define the structure and constraints of a relational database, it is paramount to first identify and formulate the corresponding business domain rules (i.e., delineate the conceptual schema) with precision, taking into account the relevant entity types, properties and the corresponding interconnections. In this way, some of the formulations that are especially significant in the scenario under consideration are:

  • A Task presents one-to-many StateSpecifications
  • A Task cannot present more than one StateSpecification at the same Instant
  • A StateSpecification is considered Current
    • from the particular Instant in which it is entered
    • until the exact Instant when the successive StateSpecification is entered
  • A State defines zero-one-or-many StateSpecifications
  • A Task receives zero-one-or-many Comments

Consequently, based on the aforementioned formulations, I created the IDEF1X diagram shown in Figure 1:

Figure 1 - Tasks States and Comments IDEF1X Diagram

As you can see, Task and State are depicted as individual entity types, each with its own group of properties (or attributes) and its own (direct) associations (or relationships), which are expressed via the corresponding verb phrases, lines, cardinalities and FOREIGN KEY (FK) marks.

The entity type denominated StateSpecification, associated with the two brought up before, is a central aspect in the solution I will explain below.

Expository logical design and sample data

I shaped an expository logical design, based on the IDEF1X diagram above, by means of the DDL structure that follows:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define the most
-- convenient physical implementation settings; e.g.,
-- a good indexing strategy based on query tendencies.

-- As one would expect, you are free to use your 
-- preferred or required naming conventions. 

CREATE TABLE Task (
    TaskNumber      INT      NOT NULL,
    Description     CHAR(90) NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Task_PK  PRIMARY KEY (TaskNumber)
);    

CREATE TABLE State (
    StateCode   CHAR(1)  NOT NULL,
    Name        CHAR(30) NOT NULL,
    Description CHAR(60) NOT NULL,
    --
    CONSTRAINT State_PK  PRIMARY KEY (StateCode),
    CONSTRAINT State_AK1 UNIQUE      (Name),       -- ALTERNATE KEY.
    CONSTRAINT State_AK2 UNIQUE      (Description) -- ALTERNATE KEY.
); 

CREATE TABLE StateSpecification (
    TaskNumber        INT      NOT NULL,
    SpecifiedDateTime DATETIME NOT NULL,
    StateCode         CHAR(1)  NOT NULL,
    --
    CONSTRAINT StateSpecification_PK         PRIMARY KEY (TaskNumber, SpecifiedDateTime), -- Composite PRIMARY KEY.
    CONSTRAINT StateSpecification_to_Task_FK FOREIGN KEY (TaskNumber)
        REFERENCES Task  (TaskNumber),
    CONSTRAINT StateSpecification_to_State   FOREIGN KEY (StateCode)
        REFERENCES State (StateCode)    
);

CREATE TABLE TaskComment (
    TaskNumber      INT      NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    Body            TEXT     NOT NULL,
    --
    CONSTRAINT TaskComment_PK      PRIMARY KEY (TaskNumber, CreatedDateTime),
    CONSTRAINT TaskComment_to_Task FOREIGN KEY (TaskNumber)
        REFERENCES Task (TaskNumber)  
);

--
--

The State table

The State table previously illustrated fulfills a look-up role, and is meant to contain five rows (as per the five State samples you provided):

 +-—————————-+-—————————-+-———————————————-+
 | StateCode | Name      | Description     |
 +-—————————-+-—————————-+-———————————————-+
 | O         | Open      | Indicates that… |
 +-----------+-----------+-----------------+
 | B         | Begun     | Indicates that… |
 +-----------+-----------+-----------------+
 | R         | In review | Indicates that… |
 +-----------+-----------+-----------------+
 | C         | Completed | Indicates that… |
 +-----------+-----------+-----------------+

Note the definition of a PRIMARY KEY (PK) constraint on a column retaining values that are, at the same time, meaninfgul (regarding interpretation by end users and technicians) and small (in terms of bytes at the physical implementation level), which makes it both readable and fast regarding, e.g., data retrieval.

If new business domain States arise, you can of course INSERT the corresponding rows INTO the State table.

The State Specification table

Now, suposse that Task no. 1750 presents the following State History in the form of (unordered) rows held in the StateSpecification table:

 +-——————————-+-———————————————————————-+-—————————-+
 | TaskNumber | SpecifiedDateTime       | StateCode |
 +-——————————-+-———————————————————————-+-—————————-+
 | 1750       | 2016-12-01 16:58:12.000 | O         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-02 09:12:05.000 | B         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-04 10:57:01.000 | R         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-07 07:33:08.000 | C         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-08 12:12:09.000 | R         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-08 19:46:01.000 | B         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-09 06:24:07.000 | R         |
 +------------+-------------------------+-----------+
 | 1750       | 2016-12-11 07:24:07.000 | C         |
 +------------+-------------------------+-----------+

As demonstrated, each value in the SpecifiedDateTime column indicates the precise point in time in which a given Task (via the TaskNumber) started presenting a particular State (by virtue of StateCode).

Seeing that the StateCode column keeps values that have a quite clear intention, it helps to save the usage of a JOIN clause. It is paramount to make clear that this does not imply that JOINs are something detrimental (they are, in fact, fundamental and powerful instruments in any relational database) but, whenever suitable, reducing the consumption of the valuable resources of the system is decidedly practical.

The Comment table

The Comment table, as exemplified above, can easily be related to the Task table via a FK constraint definition, because the PK value of a Task will remain the same, no matter if its State undergoes zero, one or more modifications.

I assume that, in the business context of relevance, a Comment (a) will always depend on the existence of a Task occurrence and (b) will always belong to the same indidivual Task occurrence; therefore, I defined the Comment table with a composite PK that incorporates the PK column of Task and the exact DateTime in which a particular Comment was created. Apart from that, you might like to evaluate the attachment of a column for system-controled surrogates (e.g., a column with the AUTO_INCREMENT attribute in MySQL) to this table, it may or may not offer a better execution performance (probably not), so carrying out some thorough testing sessions would clarify the case (a non-data addition, as is the case of columns holding system-controled surrogates, should always be well assessed and justified).

Data derivation code samples

There are some important data points that will very likely be required with respect to the scenario in question, but they should be obtained as derived (or calculated) values.

For example, if you need to obtain the CurrentState for Task no. 1750, you can derive it with the next SELECT statement:

SELECT T.TaskNumber,
       SS.StateCode AS CurrentStateCode,
       SS.SpecifiedDateTime
     FROM Task T
     JOIN StateSpecification SS
       ON T.TaskNumber = SS.TaskNumber 
    WHERE T.TaskNumber = 1750 -- You can provide a parameter instead of a fixed value.
      AND SS.SpecifiedDateTime = (
                                    SELECT MAX(SpecifiedDateTime)
                                          FROM StateSpecification InnerSS
                                         WHERE T.TaskNumber = InnerSS.TaskNumber
                                 );

When you face the necessity to get the CurrentState for all Tasks, you can declare a statement like the one that follows:

SELECT T.TaskNumber,
       SS.StateCode AS CurrentStateCode,
       SS.SpecifiedDateTime
     FROM Task T
     JOIN StateSpecification SS
       ON T.TaskNumber = SS.TaskNumber
    WHERE SS.SpecifiedDateTime = (
                                    SELECT MAX(SpecifiedDateTime)
                                          FROM StateSpecification InnerSS
                                         WHERE T.TaskNumber = InnerSS.TaskNumber
                                 );

If you have to derive the StartDate and EndDate of all the StateSpecifications that correspond to Task no. 1750, you may utilize:

SELECT T.TaskNumber,
       T.Description,
       SS.StateCode,
       SS.SpecifiedDateTime AS StartDateTime,
       (
          SELECT MIN(SpecifiedDateTime)
                FROM StateSpecification InnerSS
               WHERE T.TaskNumber = InnerSS.TaskNumber
                 AND InnerSS.SpecifiedDateTime > SS.SpecifiedDateTime
       ) AS EndDateTime
    FROM Task T
    JOIN StateSpecification SS
      ON T.TaskNumber = SS.TaskNumber
   WHERE T.TaskNumber = 1750 -- You can provide a parameter instead of a fixed value.
ORDER BY StartDateTime DESC;

In case you must obtain the StartDate and EndDate of all the StateSpecifications, i.e., the whole period in which they were current:

SELECT T.TaskNumber,
       T.Description,
       SS.StateCode,
       SS.SpecifiedDateTime AS StartDateTime,
       (
          SELECT MIN(SpecifiedDateTime)
                FROM StateSpecification InnerSS
               WHERE T.TaskNumber = InnerSS.TaskNumber
                 AND InnerSS.SpecifiedDateTime > SS.SpecifiedDateTime
       ) AS EndDateTime
    FROM Task T
    JOIN StateSpecification SS
      ON T.TaskNumber = SS.TaskNumber
ORDER BY StartDateTime DESC;

VIEW definitions

Naturally, one or more of the SELECT statements displayed above can be set up as VIEWs, so that you can arrive at, e.g., the pertinent information directly FROM one single table (a derived one, yes, but it would still be a table).

Demos

I created live demos of the code here deliberated in this SQL Fiddle (running on MySQL 5.6) and in this db<>fiddle (running on Microsoft SQL Server 2014, if someone is interested).

Responses to comments

I think I shall have to go about performing this modification, and think the system will benefit from the additional functionality of the StateSpecification table. It goes to show that good database design can accommodate all our needs without making things messy.

I could not agree more with your second assertion. A database that is designed properly (conceptually, logically and physically), usually becomes a key element in a software development project. If a relational database reflects the business context of interest with accuracy, it can —to be brief— ease its maintenance substantially and offer vast versatility. That is why it is very useful to analyze carefully the structure of the relevant things, to observe them in terms of what they are in the real world, and then represent them with precision in a database implementation.

The data, in and of itself, is a highly valuable organizational asset and, a consequence, it should be administered as such. A reliable manner to achieve said objective is employing technical means that are supported on sound theory, and in the field of data management there is nothing more sound than the relational model. This is in agreement with the Dr. E. F. Codd’s 1981 Turing Award Lecture, entitled Relational Database: A Practical Foundation for Productivity.

Also, just to clarify, the StateSpecification table has to be updated manually on every change of state, correct?

Basically yes, the StateSpecification table has to be brought up to date each time that a Task presents a State change. This procedure should be performed by way of an INSERT operation that introduces a new StateSpecification row that should include a FK value that points to the Task row that applies.

It is opportune to stress that when a Task row is INSERTed, it will have to present a StateSpecification indicating the State deemed as ‘Open’ (by dint of an ‘O’ value retained in the StateCode column), hence you should make use of well defined ACID TRANSACTIONS so that both the Task row and its StateSpecification counterpart are treated as a single unit. In this manner, the two respective INSERT operations can either succeed or fail as a whole, maintaining the involved data consistent with the identified business rules.

Comparable scenario

You might find of help

as it includes a case that has similar features.


Endnotes

Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) the early theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.