I would need to track a products price changes so that I can query the db for a product price at a given date. The information is used in a system that calculates historical audits so it must return the correct price for the correct product based on the date of purchase.
I would prefer to use postgres in building the db.
I need with the design of the database but any and all best practice suggestions are also welcome.
Best Answer
If I understand the scenario appropriately, you should define a table that retains a Price time series; therefore, I agree, this has a lot to do with the temporal aspect of the database you are working with.
Business rules
Let us start analyzing the situation from the conceptual level. So, if, in your business domain,
then that means that
The IDEF1X diagram shown in Figure 1, although highly simplified, depicts such a scenario:
Expository logical layout
And the following SQL-DDL logical-level design, based on said IDEF1X diagram, illustrates a feasible approach that you can adapt to your own exact needs:
The
Price
table has a composite PRIMARY KEY made up of two columns, i.e.,ProductNumber
(constrained, in turn, as a FOREIGN KEY that makes a reference toProduct.ProductNumber
) andStartDate
(pointing out the particular Date in which a certain Product was purchased at a specific Price).In case that Products are purchased at different Prices during the same Day, instead of the
StartDate
column, you may include one labeled asStartDateTime
that keeps the Instant when a given Product was purchased at an exact Price. The PRIMARY KEY would then have to be declared as(ProductNumber, StartDateTime)
.As demonstrated, the aforementioned table is an ordinary one, because you can declare SELECT, INSERT, UPDATE and DELETE operations to manipulate its data directly, hence it (a) permits avoiding the installation of additional components and (b) can be used in all the major SQL platforms with some few adjustments, if necessitated.
Data manipulation samples
To exemplify some manipulation operations that appear useful, let us say that you have INSERTed the following data in the
Product
andPrice
tables, respectively:Since the
Price.EndDate
is a derivable data point, then you must to obtain it via, precisely, a derived table that can be created as a view in order to produce the “full” time series, as exemplified below:Then the following operation that SELECTs directly from that view
supplies the next result set:
Now, let us assume that you are interested in getting the whole
Price
data for theProduct
primarily identified byProductNumber
1750 onDate
2 June 2017. Seeing that aPrice
assertion (or row) is current or effective during the entire Interval that runs from (i) itsStartDate
to (ii) itsEndDate
, then this DML operationyields the result set that follows
which addresses said requirement.
As shown, the
PriceWithEndDate
view plays a paramount role in obtaining most of the derivable data, and can be SELECTed FROM in a fairly ordinary way.Taking into account that your platform of preference is PostgreSQL, this content from the official documentation site contains information about “materialized” views, which can help to optimize the execution speed by means of physical level mechanisms, if said aspect becomes problematic. Other SQL database management systems (DBMSs) offer physical instruments that are very alike, although different terminology may be applied, e.g., “indexed” views in Microsoft SQL Server.
You can see the discussed DDL and DML code samples in action in this db<>fiddle and in this SQL Fiddle.
Related resources
In this Q & A we discuss a business context that includes the changes of Product Prices but has a more extense scope, so you may find it of interest.
These Stack Overflow posts cover very relevant points regarding the type of a column that holds a currency datum in PostgreSQL.
Responses to comments
The method I propose above addresses a business domain of the characteristics previously described, consequently applying your suggestion about declaring the
EndDate
as a column —which is different from a “field”— of the base table namedPrice
would imply that the logical structure of the database would not be reflecting the conceptual schema correctly, and a conceptual schema must be defined and reflected with precision, including the differentiation of (1) base information from (2) derivable information.Apart from that, such a course of action would introduce duplication, since the
EndDate
could then be obtained by virtue of (a) a derivable table and also by virtue of (b) the base table namedPrice
, with the therefore duplicatedEndDate
column. While that is a possibility, if a practitioner decides to follow said approach, he or she should decidedly warn the database users about the inconveniences and inefficiencies it involves. One of those inconveniences and inefficiencies is, e.g., the urgent need to develop a mechanism that ensures, at all times, that eachPrice.EndDate
value is equal to that of thePrice.StartDate
column of the immediately successive row for thePrice.ProductNumber
value at hand.In contrast, the work to produce the derived data in question as I put forward is, honestly, not special at all, and is required to (i) guarantee the correct correspondence between the logical and conceptual levels of abstraction of the database and (ii) ensure data integrity, both aspects that as noted before are decidedly of great importance.
If the efficiency aspect you are talking about is related to the execution speed of some data manipulation operations, then it must be managed at the appropriate place, i.e., at the physical level, via, e.g., an advantageous indexing strategy, based on (1) the particular query tendencies and (2) the particular physical mechanisms provided by the DBMS of use. Otherwise, sacrificing the appropriate conceptual-logical mapping and compromising the integrity of the data involved easily turns a robust system (i.e., a valuable organizational asset) into a non-reliable resource.
Discontinuous or disjunct time series
On the other hand, there are circumstances where retaining the
EndDate
of each row in a time series table is not only more conenient and efficient but demanded, although that depends entirely on business-environment-specific requirements of course. One example of that kind of circumstances comes about whenI have represented said scenario in the IDEF1X diagram displayed in Figure 2.
In that case, yes, the hypothetical
Price
table must be declared in a manner that is similar to this:And, yes, that logical DDL design simplifies administration at the physical level, because you can put up an indexing strategy that encompasses the
EndDate
column (which, as shown, is declared in a base table) in relatively easier configurations.Then, a SELECT operation like the one below
may be used to derive the whole
Price
data for theProduct
primarily identified byProductNumber
1750 onDate
2 June 2017.