I know this has been asked here and here, but I have the same idea with a different possible implementation and I need some help.
Initially I had my blogstories
table with this structure:
| Column | Type | Description |
|-----------|-------------|------------------------------------------------|
| uid | varchar(15) | 15 characters unique generated id |
| title | varchar(60) | story title |
| content | longtext | story content |
| author | varchar(10) | id of the user that originally wrote the story |
| timestamp | int | integer generated with microtime() |
After I decided I wanted to implement some versioning system for every story on the blog, the first thing that came to my mind was creating a different table to hold edits; after that, I thought I could modify the existing table to hold versions instead of edits. This is the structure that came to my mind:
| Column | Type | Description |
|------------ |------------- |------------------------------------------------ |
| story_id | varchar(15) | 15 characters unique generated id |
| version_id | varchar(5) | 5 characters unique generated id |
| editor_id | varchar(10) | id of the user that commited |
| author_id | varchar(10) | id of the user that originally wrote the story |
| timestamp | int | integer generated with microtime() |
| title | varchar(60) | current story title |
| content | longtext | current story text |
| coverimg | varchar(20) | cover image name |
The reasons why I came here:
- The
uid
field of the initial table was UNIQUE in the table. Now, thestory_id
is not unique anymore. How should I deal with that? (I thought I could addressstory_id = x
and then find the latest version, but that seems very resource consuming, so please give your advice) author_id
field value is repeating in each every row of the table. Where and how should I keep it?
Edit
The unique codes generation process is in the CreateUniqueCode
function:
trait UIDFactory {
public function CryptoRand(int $min, int $max): int {
$range = $max - $min;
if ($range < 1) return $min;
$log = ceil(log($range, 2));
$bytes = (int) ($log / 8) + 1;
$bits = (int) $log + 1;
$filter = (int) (1 << $bits) - 1;
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
$rnd = $rnd & $filter;
} while ($rnd >= $range);
return $min + $rnd;
}
public function CreateUID(int $length): string {
$token = "";
$codeAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$codeAlphabet.= "abcdefghijklmnopqrstuvwxyz";
$codeAlphabet.= "0123456789";
$max = strlen($codeAlphabet) - 1;
for ($i=0; $i < $length; $i++) {
$token .= $codeAlphabet[$this->CryptoRand(0, $max)];
}
return $token;
}
}
The code is written in Hack, and was originally written in PHP by @Scott in his answer.
The fields author_id
and editor_id
can be different, because there are users with enough permissions to edit anyone's stories.
Best Answer
Analyzing the scenario —which presents characteristics associated with the subject known as temporal databases— from a conceptual perspective, one can determine that: (a) a “present” Blog Story Version and (b) a “past” Blog Story Version, although very resembling, are entities of different types.
In addition to that, when working at the logical level of abstraction, facts (represented by rows) of distinct kinds must be retained in distinct tables. In the case under consideration, even when quite similar, (i) facts about “present” Versions are different from (ii) facts about “past” Versions.
Therefore I recommend managing the situation by means of two tables:
one dedicated exclusively for the “current” or “present” Versions of the Blog Stories, and
one that is separate, but also linked with the other, for all the “previous” or “past” Versions;
each with (1) a slightly distinct number of columns and (2) a different group of constraints.
Back to the conceptual layer, I consider that —in your business environment— Author and Editor are notions that can be delineated as Roles that can be played by a User, and these important aspects depend on data derivation (via logical-level manipulation operations) and interpretation (carried out by the Blog Stories readers and writers, at the external level of the computerized information system, with the assistance of one or more application programs).
I will detail all these factors and other relevant points as follows.
Business rules
According to my understanding of your requirements, the following business rules formulations (put together in terms of the relevant entity types and their kinds of interrelationships) are specially helpful in establishing the corresponding conceptual schema:
Expository IDEF1X diagram
Consequently, in order to expound my suggestion by virtue of a graphical device, I have created a sample IDEF1Xa diagram that is derived from the business rules formulated above and other features that seem pertinent. It is shown in Figure 1:
Why are BlogStory and BlogStoryVersion conceptualized as two different entity types?
Because:
A BlogStoryVersion instance (i.e., a “past” one) always holds a value for an UpdatedDateTime property, while a BlogStory occurrence (i.e., a “present” one) never holds it.
Besides, the entities of those types are uniquely identified by the values of two distinct sets of properties: BlogStoryNumber (in the case of the BlogStory occurrences), and BlogStoryNumber plus CreatedDateTime (in the case of the BlogStoryVersion instances).
a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is based on the early theoretical material authored by the sole originator of the relational model, i.e., Dr. E. F. Codd; on the Entity-Relationship view of data, developed by Dr. P. P. Chen; and also on the Logical Database Design Technique, created by Robert G. Brown.
Illustrative logical SQL-DDL layout
Then, based on the conceptual analysis previously presented, I declared the logical-level design below:
Tested in this SQL Fiddle that runs on MySQL 5.6.
The
BlogStory
tableAs you can see in the demo design, I have defined the
BlogStory
PRIMARY KEY (PK for brevity) column with the INT datatype. In this regard, you may like to fix a built-in automatic process that generates and assigns a numeric value for such a column in every row insertion. If you do not mind leaving gaps occasionally in this set of values, then you can employ the AUTO_INCREMENT attribute, commonly used in MySQL environments.When entering all your individual
BlogStory.CreatedDateTime
data points, you can utilize the NOW() function, which returns the Date and Time values that are current in the database server at the exact INSERT operation instant. To me, this practice is decidedly more suitable and less prone to errors than the use of external routines.Provided that, as discussed in (now-removed) comments, you want to avoid the possibility of maintaining
BlogStory.Title
duplicate values, you have to set up a UNIQUE constraint for this column. Due to the fact that a given Title may be shared by several (or even all of the) “past” BlogStoryVersions, then a UNIQUE constraint should not be established for theBlogStoryVersion.Title
column.I included the
BlogStory.IsActive
column of type BIT(1) (though a TINYINT may as well be used) in case you need to provide “soft” or “logical” DELETE functionality.Details about the
BlogStoryVersion
tableOn the other hand, the PK of the
BlogStoryVersion
table is composed of (a)BlogStoryNumber
and (b) a column namedCreatedDateTime
that, of course, marks the precise instant in which aBlogStory
row underwent an INSERT.BlogStoryVersion.BlogStoryNumber
, besides being part of the PK, is also constrained as a FOREIGN KEY (FK) that referencesBlogStory.BlogStoryNumber
, a configuration that enforces referential integrity between the rows of these two tables. In this respect, implementing an automatic generation of aBlogStoryVersion.BlogStoryNumber
is not necessary because, being set as a FK, the values INSERTed into this column must be “drawn from” the ones already enclosed in the relatedBlogStory.BlogStoryNumber
counterpart.The
BlogStoryVersion.UpdatedDateTime
column should retain, as expected, the point in time when aBlogStory
row was modified and, as a consequence, added to theBlogStoryVersion
table. Hence, you can use the NOW() function in this situation too.The Interval comprehended between
BlogStoryVersion.CreatedDateTime
andBlogStoryVersion.UpdatedDateTime
expresses the entire Period during which aBlogStory
row was “present” or “current”.Considerations for a
Version
columnIt can be useful to think of
BlogStoryVersion.CreatedDateTime
as the column that holds the value that represents a particular “past” Version of a BlogStory. I deem this much more beneficial than aVersionId
orVersionCode
, since it is user-friendlier in the sense that people tend to be more familiar with time concepts. For instance, the blog authors or readers could refer to a BlogStoryVersion in a fashion similar to the following:1750
that was Created on26 August 2015
at9:30
”.The Author and Editor Roles: Data derivation and interpretation
With this approach, you can easily distinguish who holds the “original”
AuthorId
of a concrete BlogStory SELECTing the “earliest” Version of a certainBlogStoryId
FROM theBlogStoryVersion
table by virtue of applying the MIN() function toBlogStoryVersion.CreatedDateTime
.In this way, each
BlogStoryVersion.AuthorId
value contained in all the “later” or “succeeding” Versions rows indicate, naturally, the Author identifier of the respective Version at hand, but one can also say that such a value is, at the same time, denoting the Role played by the involved User as Editor of the “original” Version of a BlogStory.Yes, a given
AuthorId
value may be shared by multipleBlogStoryVersion
rows, but this is actually a piece of information that tells something very significant about each Version, so the repetition of said datum is not a problem.The format of DATETIME columns
As for the DATETIME data type, yes, you are right, “MySQL retrieves and displays DATETIME values in '
YYYY-MM-DD HH:MM:SS
' format”, but you can confidently enter the pertinent data in this manner, and when you have to perform a query you just have to make use of the built-in DATE and TIME functions in order to, among other things, show the concerning values in the appropriate format for your users. Or you could certainly carry out this kind of data formatting via your application programms(s) code.Implications of
BlogStory
UPDATE operationsEvery time that a
BlogStory
row suffers an UPDATE, you must ensure that the corresponding values that were “present” until the modification took place are then INSERTed into theBlogStoryVersion
table. Thus, I highly suggest fulfilling these operations within a single ACID TRANSACTION to guarantee that they are treated as an indivisible Unit of Work. You may as well employ TRIGGERS, but they tend to make things untidy, so to speak.Introducing a
VersionId
orVersionCode
columnIf you opt (because of business circumstances or personal preference) to incorporate a
BlogStory.VersionId
orBlogStory.VersionCode
column to distinguish the BlogStoryVersions, you should ponder the following possibilities:A
VersionCode
could be required to be UNIQUE in (i) the wholeBlogStory
table and also in (ii)BlogStoryVersion
.Therefore, you have to implement a carefully tested and totally reliable method in order to generate and assign each
Code
value.Maybe, the
VersionCode
values could be repeated in differentBlogStory
rows, but never duplicated along with the sameBlogStoryNumber
. E.g., you could have:3
- Version83o7c5c
and, simultaneously,86
- Version83o7c5c
and958
- Version83o7c5c
.The later possibility opens another alternative:
Keeping a
VersionNumber
for theBlogStories
, so there could be:23
- Versions1, 2, 3…
;650
- Versions1, 2, 3…
;2254
- Versions1, 2, 3…
;Holding “original” and “subsequent” versions in a single table
Although maintaining all the BlogStoryVersions in the same individual base table is possible, I suggest not to do it because you would be mixing two distinct (conceptual) types of facts, which thus has undesirable side-effects on
But, on condition that you choose to follow that course of action, you can still take advantage of many of the ideas detailed above, e.g.:
BlogStoryNumber
) and a DATETIME column (CreatedDateTime
);Seeing that, by proceeding with such an approach, a
BlogStoryNumber
value will be duplicated as soon as “newer” Versions are added, an option that and that you could evaluate (which is very alike to those mentioned in the previous section) is establishing aBlogStory
PK composed of the columnsBlogStoryNumber
andVersionCode
, in this manner you would be able to uniquely identify each Version of a BlogStory. And you can try with a combination ofBlogStoryNumber
andVersionNumber
too.Similar scenario
You may find my answer to this question of help, since I as well propose enabling temporal capabilities in the concerning database to deal with a comparable scenario.