Postgresql – How to check if there are no time gaps in PostgreSQL

constraintdatabase-designintervalpostgresql

Schema:

CREATE TABLE "expenses_commissionrule" (
    "id" serial NOT NULL PRIMARY KEY, 
    "country" varchar(2) NOT NULL, 
    "created" timestamp with time zone NOT NULL, 
    "modified" timestamp with time zone NOT NULL, 
    "activity" tsrange NOT NULL
); 

Description:

I'd like to create an application to manage commission calculations. Every rule has activity period. Each country has independent set of rules.

Constraints:

First. To avoid ambiguity these activity periods should not overlap. I did it with the following constraint:

ALTER TABLE expenses_commissionrule 
ADD CONSTRAINT non_overlapping_activity 
EXCLUDE USING GIST (country WITH =, activity WITH &&);

Second. In any point of time there should be only one commission rule, so, there should be no gaps between intervals in the table. In other words sum of all intervals should be -INF:+INF.

Question:
How can I add the second constraint?
Use case:
We have a rule with infinite period. I want to switch to a new rule, which should start from the next month. In this case I'd like to set current rule end period to the end of the current month and add a new rule in a single action.

Update:
In the future I'd like to add the following behaviour:

  1. An ability to specify a rule only for specific user (via nullable foreign key "user_id");
  2. Every user could have his own rule for each country. If user has no rule – "global" rules will be used for commission calculations, so it is like a fallback;
  3. These "user" rules could have any period of activity – (-inf: inf) and concrete intervals as well. It is different from "global" rules – gaps are OK here.
  4. As "may be" feature – extra varchar field, which will have a type of situation, when particular rule could be applied to have more flexibility with the calculation process. This "partial" rules might have different intervals as well, gaps are OK.

In other words all previous constraints should be applied only to rows where user_id IS NULL. How can this be accomplished?

PostgreSQL version: 9.6.2

Best Answer

Chapter 1: Linked List

One way to have this (no gaps) type of constraint enforced in the database is to split activity into the starting and ending parts and then use UNIQUE and FOREIGN KEY constraints to emulate a linked list.

1a.

  • Every activity_start should reference the previous activity_end:
    (country, activity_start) REFERENCES (country, activity_end).
  • Two periods cannot have the same activity_start and country or activity end and country:
    UNIQUE constraint on (country, activity_start).
    and UNIQUE (country, activity_end).
  • We actually don't need both of them, only the second one, for the foreign key to be defined. The exclusion constraint doesn't allow two periods to have the same start or end.
  • We should not allow multiple rows with (-Infinity, +Infinity) or series like: -Infinity -> DateA -> Infinity -> DateB -> +Infinity. This is achieved with the two partial indexes.

Code:

CREATE TABLE expenses_commissionrule (
    id serial NOT NULL PRIMARY KEY,
    country varchar(2) NOT NULL,
    created timestamp with time zone NOT NULL,
    modified timestamp with time zone NOT NULL,
    activity tsrange NOT NULL,
    activity_start timestamp,
    activity_end timestamp,
    CONSTRAINT non_overlapping_activity
        EXCLUDE USING GIST (country WITH =, activity WITH &&),
    CONSTRAINT country_activity_end_uq
        UNIQUE (country, activity_end),
    CONSTRAINT activity_start_end_fk
        FOREIGN KEY (country, activity_start)
        REFERENCES expenses_commissionrule (country, activity_end),
    CONSTRAINT activity_ck
        CHECK (activity IS NOT DISTINCT FROM 
               tsrange(activity_start, activity_end, '[)') )
) ;


CREATE UNIQUE INDEX country_start_ufx
    ON expenses_commissionrule
        (country)
    WHERE (activity_start IS NULL) ;

CREATE UNIQUE INDEX country_end_ufx
    ON expenses_commissionrule
        (country)
    WHERE (activity_end IS NULL) ;

We can then try to insert (valid) data:

WITH ins
    (country, activity_start, activity_end)
  AS
    ( VALUES
          ('IT',  null::timestamp,  null::timestamp),

          ('FR',  null,             '2000-01-01'),
          ('FR',  '2000-01-01',     null),

          ('GR',  null,             '2000-01-01'),
          ('GR',  '2000-01-01',     '2012-01-01'),
          ('GR',  '2012-01-01',     '2017-06-01'),
          ('GR',  '2017-06-01',     null)
    )
INSERT INTO expenses_commissionrule
    (country, created, modified, activity, activity_start, activity_end)
SELECT
    country, now(), now(),
    tsrange(activity_start, activity_end, '[)'),
    activity_start, activity_end
FROM ins ;

Works fine:

> INSERT 0 7

And try with invalid data:

--
( VALUES
      ('US',  null::timestamp,  '2000-01-01'::timestamp)
)
--

-- Fails:

> ERROR:  insert or update on table "expenses_commissionrule" violates 
      foreign key constraint "activity_start_end_fk"
> DETAIL:  Key (country, activity_end)=(US, 2000-01-01 00:00:00) is not
      present in table "expenses_commissionrule".

Another try:

( VALUES
      ('UK',  null::timestamp,  '2000-01-01'::timestamp),
      ('UK',  '2000-01-01',     '2000-01-01')
)

-- Fails:

> ERROR:  duplicate key value violates unique constraint 
      "country_activity_end_uq"
> DETAIL:  Key (country, activity_end)=(UK, 2000-01-01 00:00:00) 
      already exists.

1b.

After all these, we can spot that activity is not really needed in the table as we have start and end and we can compute it. Thus it can be removed:

CREATE TABLE expenses_commissionrule (
    id serial NOT NULL PRIMARY KEY,
    country varchar(2) NOT NULL,
    created timestamp with time zone NOT NULL,
    modified timestamp with time zone NOT NULL,
    activity_start timestamp,
    activity_end timestamp,
    CONSTRAINT non_overlapping_activity
        EXCLUDE USING GIST 
            (country WITH =, 
             tsrange(activity_start, activity_end, '[)') WITH &&),
    CONSTRAINT country_activity_end_uq
        UNIQUE (country, activity_end),
    CONSTRAINT activity_start_end_fk
        FOREIGN KEY (country, activity_start)
        REFERENCES expenses_commissionrule (country, activity_end)
) ;

-- plus the two filtered indexes. We do need those.

1c.

And then we realize that - due to the foreign key we added - we don't really need the exclusion constraint any more. We can have the same effect by enforcing that activity_end is after activity_start.

CREATE TABLE expenses_commissionrule (
    id serial NOT NULL PRIMARY KEY,
    country varchar(2) NOT NULL,
    created timestamp with time zone NOT NULL,
    modified timestamp with time zone NOT NULL,
    activity_start timestamp,
    activity_end timestamp,
    CONSTRAINT non_overlapping_activity
        CHECK (activity_start < activity_end),
    CONSTRAINT country_activity_end_uq
        UNIQUE (country, activity_end),
    CONSTRAINT activity_start_end_fk
        FOREIGN KEY (country, activity_start)
        REFERENCES expenses_commissionrule (country, activity_end)
) ;


-- plus the two filtered indexes. We do need those.

Chapter 2: No List

After the all the tedious effort, let try something simpler. No explanation this time, lets read the code first and explain later:

CREATE TABLE ec_rule (
    id serial NOT NULL PRIMARY KEY,
    country varchar(2) NOT NULL,
    created timestamp with time zone NOT NULL,
    modified timestamp with time zone NOT NULL,
    activity_end timestamp,
    CONSTRAINT country_activity_end_uq
        UNIQUE (country, activity_end)
) ;

CREATE UNIQUE INDEX country_end_ufx
    ON ec_rule (country)
    WHERE (activity_end IS NULL) ;

OK, what happened here? This is much simpler, it can't possibly be used the same way as the previous designs! It can't possibly be equivalent. Or maybe it can?

Lets see what happens here:

  • activity_start is missing completely. How are we supposed to find the periods if they are not stored in the table?

The answer is that activity start is indeed stored, just not in the same row as activity end. That's what the foreign key was all about, to ensure that every end had a matching start. So we can easily find the start of every period using LAG():

CREATE VIEW expenses_commissionrule AS
SELECT 
    id,
    country,
    created,
    modified,
    LAG(activity_end) OVER (PARTITION BY country
                            ORDER BY activity_end)
        AS activity_start,
    activity_end
FROM
    ec_rule ;

Chapter 3: Spot the difference

While all the above design manage to enforce the "no gaps" rule, they all fail to enforce the second rule: "the sum of all intervals should be -INF : +INF."

Can this be amended? It seems to be easier after all once we have achieved no gaps.

Well, yes and no. And no. It isn't easier. It's similar to enforcing that a table has at least one row. That seems easy, too, but it's actually very hard, if not impossible with DDL alone.

For the specific problem though, the "yes and no" means that the rule:

  • can be enforced with design 1 (although it makes it even more complicated).

  • but not with design 2 (at least I can't see a way).

For design 1, we'd need to add a second foreign key (from activity end to the next start) and a unique constraint (country, activity_start). This essentially would convert our list to a doubly linked list. And for the foreign keys to be satisfied, the list would have to either be infinite (which isn't possible) or have ends both left and right, which means two rows with nulls, one for left and one for right (foreign keys are satisfied if one of the columns is NULL).

For design 2, you'd have to ensure - outside of DDL - that for every country there is one row where activity_end is NULL (that row is the rule for the period up to +Infinity).