I have a tasks
table in my database and, in the business domain of interest, tasks can have multiple states: “open”, “begun”, “in review” and “completed”.
Despite having “open” and “begun” in the same table, identified using flags, I decided for some reason early on to have “in review” and “completed” tasks in their own separate tables. While this seemed like a good idea to start off with, making it easy to query for specific things, when I thought about implementing a comment feature for tasks I realised that the task_id
value would be changing across the three tables.
Have I gone about this all wrong?
Current considerations
Some solutions I thought up would be to redo the organisation and put everything in one table, using flags to differentiate purely between the states (this would lead to a fair amount of reworking); to create some kind of UUID
that is transferable across the tables – though I think there might issues with performance if this is done; and finally, to set the IDs
across the tables to be no-longer auto incremented and to simply “inherit” the original value from the first task
table (which auto increments).
What's the done thing in this situation? I'm not the most versed in database design and I'm kind of making this up as I go along.
Any help would be greatly appreciated.
Responses to further information requests made by @MDCCL via comments
So, the state (or status) of a task can change over time, right? How many times can the same concrete task present the same state, let us say, “open”? Will it vary depending on the kind of state presented by the relevant task?
Yes, a task.state could theoretically change up to four times. I won't go into specifics of the logic but the longest chain would be “begun” → “completed” → “review” → “completed”
. And each time, currently, they're being put into another table.
The exception is whether it's “begun” or not, a user may abandon and then re-undertake the task infinitely – in theory at least. Which is at least helped by the fact that the “begun” status is a flag.
Ok, but a certain task will present, at all times, exactly one particular “present” state, right? For example, when a
task
row is INSERTed, it should be accompannied by a certainstate
value, which I assume would be “open”. Later, the state of such a task will evolve to only one of the other possible options, that is, either “begun” or “in review” or “completed”.
Yea, that's exactly right. A task will always be in a stage, and can't be in multiple ones.
Once a particular task is deemed as “completed”, can it change to another state?
Yes, it can be sent back to “review” if certain parameters are reached. And from “review” it can either go back to “begun” or back to “completed”.
Best Answer
The way I see it, the fact that (a) a concrete Task can present distinct States at different points in time entails (b) the creation of a table that is going to contain a time series, as I will detail below.
Business rules
In order to define the structure and constraints of a relational database, it is paramount to first identify and formulate the corresponding business domain rules (i.e., delineate the conceptual schema) with precision, taking into account the relevant entity types, properties and the corresponding interconnections. In this way, some of the formulations that are especially significant in the scenario under consideration are:
Consequently, based on the aforementioned formulations, I created the IDEF1X† diagram shown in Figure 1:
As you can see,
Task
andState
are depicted as individual entity types, each with its own group of properties (or attributes) and its own (direct) associations (or relationships), which are expressed via the corresponding verb phrases, lines, cardinalities and FOREIGN KEY (FK) marks.The entity type denominated
StateSpecification
, associated with the two brought up before, is a central aspect in the solution I will explain below.Expository logical design and sample data
I shaped an expository logical design, based on the IDEF1X diagram above, by means of the DDL structure that follows:
The State table
The
State
table previously illustrated fulfills a look-up role, and is meant to contain five rows (as per the five State samples you provided):Note the definition of a PRIMARY KEY (PK) constraint on a column retaining values that are, at the same time, meaninfgul (regarding interpretation by end users and technicians) and small (in terms of bytes at the physical implementation level), which makes it both readable and fast regarding, e.g., data retrieval.
If new business domain States arise, you can of course INSERT the corresponding rows INTO the
State
table.The State Specification table
Now, suposse that Task no. 1750 presents the following State History in the form of (unordered) rows held in the
StateSpecification
table:As demonstrated, each value in the
SpecifiedDateTime
column indicates the precise point in time in which a given Task (via theTaskNumber
) started presenting a particular State (by virtue ofStateCode
).Seeing that the
StateCode
column keeps values that have a quite clear intention, it helps to save the usage of a JOIN clause. It is paramount to make clear that this does not imply that JOINs are something detrimental (they are, in fact, fundamental and powerful instruments in any relational database) but, whenever suitable, reducing the consumption of the valuable resources of the system is decidedly practical.The Comment table
The
Comment
table, as exemplified above, can easily be related to theTask
table via a FK constraint definition, because the PK value of aTask
will remain the same, no matter if its State undergoes zero, one or more modifications.I assume that, in the business context of relevance, a Comment (a) will always depend on the existence of a Task occurrence and (b) will always belong to the same indidivual Task occurrence; therefore, I defined the
Comment
table with a composite PK that incorporates the PK column ofTask
and the exactDateTime
in which a particularComment
was created. Apart from that, you might like to evaluate the attachment of a column for system-controled surrogates (e.g., a column with the AUTO_INCREMENT attribute in MySQL) to this table, it may or may not offer a better execution performance (probably not), so carrying out some thorough testing sessions would clarify the case (a non-data addition, as is the case of columns holding system-controled surrogates, should always be well assessed and justified).Data derivation code samples
There are some important data points that will very likely be required with respect to the scenario in question, but they should be obtained as derived (or calculated) values.
For example, if you need to obtain the
CurrentState
for Task no. 1750, you can derive it with the next SELECT statement:When you face the necessity to get the
CurrentState
for all Tasks, you can declare a statement like the one that follows:If you have to derive the
StartDate
andEndDate
of all theStateSpecifications
that correspond to Task no. 1750, you may utilize:In case you must obtain the
StartDate
andEndDate
of all theStateSpecifications
, i.e., the whole period in which they were current:VIEW definitions
Naturally, one or more of the SELECT statements displayed above can be set up as VIEWs, so that you can arrive at, e.g., the pertinent information directly FROM one single table (a derived one, yes, but it would still be a table).
Demos
I created live demos of the code here deliberated in this SQL Fiddle (running on MySQL 5.6) and in this db<>fiddle (running on Microsoft SQL Server 2014, if someone is interested).
Responses to comments
I could not agree more with your second assertion. A database that is designed properly (conceptually, logically and physically), usually becomes a key element in a software development project. If a relational database reflects the business context of interest with accuracy, it can —to be brief— ease its maintenance substantially and offer vast versatility. That is why it is very useful to analyze carefully the structure of the relevant things, to observe them in terms of what they are in the real world, and then represent them with precision in a database implementation.
The data, in and of itself, is a highly valuable organizational asset and, a consequence, it should be administered as such. A reliable manner to achieve said objective is employing technical means that are supported on sound theory, and in the field of data management there is nothing more sound than the relational model. This is in agreement with the Dr. E. F. Codd’s 1981 Turing Award Lecture, entitled Relational Database: A Practical Foundation for Productivity.
Basically yes, the
StateSpecification
table has to be brought up to date each time that a Task presents a State change. This procedure should be performed by way of an INSERT operation that introduces a newStateSpecification
row that should include a FK value that points to theTask
row that applies.It is opportune to stress that when a
Task
row is INSERTed, it will have to present aStateSpecification
indicating the State deemed as ‘Open’ (by dint of an ‘O’ value retained in theStateCode
column), hence you should make use of well defined ACID TRANSACTIONS so that both theTask
row and itsStateSpecification
counterpart are treated as a single unit. In this manner, the two respective INSERT operations can either succeed or fail as a whole, maintaining the involved data consistent with the identified business rules.Comparable scenario
You might find of help
as it includes a case that has similar features.
Endnotes
† Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) the early theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.