PostgreSQL Uniqueness Constraint – Uniqueness Constraint with Date Range

database-designexclusion-constraintpostgresqlpostgresql-9.4

Consider a prices table with these columns:

id         integer primary key
product_id integer -- foreign key
start_date date not null
end_date   date not null
quantity   integer
price      numeric

I'd like the database to enforce the rule that a product can only have one price at a specific quantity in a date range (via where <date> BETWEEN start_date AND end_date).

Is this kind of range-based constraint doable?

Best Answer

Yes, you can use an EXCLUDE constraint, which is a generalization of UNIQUE constraints:

ALTER TABLE prices 
  ADD CONSTRAINT unique_price_per_product_quantity_daterange
    EXCLUDE  USING gist
    ( product_id WITH =, 
      quantity WITH =, 
      daterange(start_date, end_date, '[]') WITH &&   -- this is the crucial
    );

The constraint can be interpreted as saying:

Don't allow two rows that have same product_id, same quantity and overlapping (&&) date ranges.

The '[]' is for the wanted all-inclusive date range (the default is [) for range types).

See the documentation on constraints on range types. You probably also need to add the extension by running (once, for each database where you want this installed):

CREATE EXTENSION btree_gist;

Related Solutions

Ms-access – Database Design Regarding a Relationship for Patient Visit

The first thing that catches my eye is what appear to be separate tables for each exam. This could get rather difficult to maintain in the future: if the structure of exams changes you need to update n tables. Also, any aggregate queries that cover all exams for 1 patient will need to join n tables.

I'd suggest a structure like this:

patient_exams
-------------
  patient_exam_id (PK)
  visit_id (FK)
  patient_id (FK)
  study_id (Fk)
  exam_seq_num 
  (other fields)

Use exam_seq_num to track which exam number the record is for a patient. You can use a composite key containing patient_id, visit_id, study_id, exam_seq_num to ensure that you don't get exams for a patient with duplicate sequence numbers. You'll still need a bit of code to create the correct sequence number, maybe an on-insert trigger.

UPDATE:

Ok, so now it's clear that the exame tables are actually for different types of exams. You could have something like

base_exam
---------
  id (PK)
  patient_id (FK)
  exam_date
  (other stuff)

Lumbar_exam_details
-------------------
  lumbar_exam_id (PK)
  base_exam_id (FK to base_exam.id)
  (other specific fields)

blood_exam_details
------------------
  blood_exam_id (pk)
  base_exam_id (FK to base_exam.id)
  (other specific fields)

All of your exam detail tables reference the base_exam table, which stores common fields for all exams (such as the date of the exam, the patient who was examined, etc...).

If you really want to have a "display name" for exam types, I would do that in a view that overlays the specific exam table. For example, the query for lumbar_exam_view might look like:

SELECT *, "Lumbar Exam" AS DISPLAY_NAME
FROM LUMBAR_EXAM_DETAILS

Use this view in any queries/reports on lumbar_exam_details you will have access to display_name anywhere that you want the user-friendly string.

If you need the display name to be stored as actual data, you can add an exam_type_id field to base_exam and then have it point to an exam_type table:

exam_type
---------
  id
  display_name

Data:

exam_types
ID  | display_name
------------------
1   | Lumbar Exam
2   | Blood test

Now your base exam records have an ID that points them to the correct user-friendly string. Note that this does not ensure that the exam detail record is of the correct type (i.e. it is possible, for example, to have a base_exam record that is referenced by lumbar_exam_details, but the base_exame record erroneously references the display name "Blood Test") - it only works on the display name.

PostgreSQL – How to Delete Duplicate Records Efficiently

Core feature is the window function lag().
Also pay special attention to avoid deadlocks and race conditions with concurrent deletes and inserts (which can affect which rows to delete!):

CREATE OR REPLACE FUNCTION remove_vendor_price_dupes(_vendor int)
  RETURNS integer AS
$func$
DECLARE
   del_ct int;
BEGIN
   -- this may or may not be necessary:
   -- lock rows to avoid race conditions with concurrent deletes
   PERFORM 1
   FROM   vendor_prices
   WHERE  vendor = _vendor
   ORDER  BY sku, effective_date, id  -- guarantee row locks in consistent order
   FOR    UPDATE;

   -- delete redundant prices
   DELETE FROM vendor_prices v
   USING (
      SELECT id
           , price = lag(price) OVER w  -- same as last row
             AND (lead(id) OVER w) IS NOT NULL AS del  -- not last row
      FROM   vendor_prices
      WHERE  vendor = _vendor
      WINDOW w AS (PARTITION BY sku ORDER BY effective_date, id)
      ) d
   WHERE v.id = d.id
   AND   d.del;

   GET DIAGNOSTICS del_ct = ROW_COUNT;  -- optional:
   RETURN del_ct;  -- return number of deleted rows
END
$func$  LANGUAGE plpgsql;

Call:

SELECT remove_vendor_price_dupes(1);

Notes

The current version of the 9.3 major release is 9.3.6. The project recommends that ...

all users run the latest available minor release for whatever major version is in use.
A multicolumn index on (vendor, sku, effective_date, id) would be perfect for this - in this particular order. But Postgres can combine indexes rather efficiently, too.
It might pay to add the otherwise irrelevant price as last item ot the index to get index-only scans out of this. You'll have to test.
Since you have concurrent deletes it may be a good idea to run a separate delete per vendor to reduce the potential for race conditions and deadlocks. Since there are only a few vendors, this seems like a reasonable partitioning. (Many tiny calls would be comparatively slow.)
I am running a separate SELECT (PERFORM in plpgsql, since we do not use the result) because the row locking clause FOR UPDATE cannot be used together with window functions. Don't let the keyword mislead you, this is not just for updates. I am locking all rows for the given vendor, since the result depends on all rows. Concurrent reads are not impaired, only concurrent writes have to wait until we are done. That's another reason why deleting rows for one vendor at a time in a separate transaction should be best.
sku is unique per product, so we can PARTITION BY it.
ORDER BY effective_date, id: your first version of the question included code for duplicate rows, so I added id to ORDER BY as additional tie breaker. This way it works for duplicates on (sku, effective_date) as well.
To preserve the last row for each set: AND (lead(id) OVER w) IS NOT NULL. Reusing the same window for lead() is cheap - independent of the added explicit WINDOW clause - that's just syntax shorthand for convenience.
I am locking rows in the same order: ORDER BY sku, effective_date, id. Make sure that concurrent DELETEs operate in the same order to avoid deadlocks. If all other transactions delete no more than a single row within the same transaction, there cannot be deadlocks and you don't need the row locking at all.
If concurrent INSERTs could lead to a different result (make different rows obsolete), you have to lock the whole table in EXCLUSIVE mode instead to avoid race conditions:
```
LOCK TABLE vendor_prices IN EXCLUSIVE MODE;
```
Do that only if it's necessary. It blocks all concurrent write access.
I am returning the number of rows deleted, but that's totally optional. You might as well return nothing and declare the function as RETURNS void.

Best Answer

Related Solutions

Ms-access – Database Design Regarding a Relationship for Patient Visit

PostgreSQL – How to Delete Duplicate Records Efficiently

Notes

Related Question