Enforce uniqueness across multiple rows

oracle

I'm trying to enforce groups of activities do not have duplicates without using triggers or materialized views if possible.

DDL to create table:

CREATE TABLE groups (group_id NUMBER, activity_id number) ; 
INSERT INTO groups (group_id, activity_id) values (1,1) ;
INSERT INTO groups (group_id, activity_id) values (1,2) ; 
INSERT INTO groups (group_id, activity_id) values (2,2) ; 
INSERT INTO groups (group_id, activity_id) values (2,3) ; 

The next query should return no duplicates if all groups are made of unique activities. I'd like to enforce that this query never returns any duplicates:

SELECT LISTAGG(ACTIVITY_ID, '|') WITHIN GROUP(ORDER BY ACTIVITY_ID) FROM GROUPS GROUP BY GROUP_ID ; 

My current solution is to insert the result of the LISTAGG query into a table with a unique constraint and use that to control duplicates with my application code handling any constraint violations by searching for the existing row (and thereby getting the group ID to reuse the existing one)

Best Answer

TL;DR

What you want has been defined by ANSI SQL-92 as an ASSERTION. None of the major RDBMS support ASSERTION ... yet.

Expanding what Paul(tm) mentioned in the comments - you must perform your DML for each "GROUP + List of ACTIVITIES" within a single Action (DML or Procedure call).

Assuming you need multi-user capability, you can't do what you want with a single table. You will need to use multiple tables. You can then hide the logic (including locking) for the processing against the multiple tables by using VIEWs and Transactional APIs (Packages)

DO NOT ALLOW USERS/APPLICATIONS TO PERFROM DML AGAINST THESE UNDERLYING TABLES!!

Example Underlying Tables

The APIs acquire a Row Lock against the group_activity#lock table. This allow multiple users to manipulate different GROUP_IDs simultaneously. Yes, group_activity#lock could hold a Nested Table of Activities. But, from experience, using a UDT in a table gets messy when you try to change the UDT.

The activities_hash column could be the LISTAGG() result or even the SHA256 version of the result. The function GROUP_ACTIVITY_API.HASH() controls the calculation of this value. The APIs enforce proper locking and calculating of the Hash value. As such, you do not let user/application to perform DML operations against this table directly.

The table group_activity#activities holds the list of activities that belong to each GROUP_ID. Because of the required locking against group_activity#lock, do not allow direct DML activity against this table.

create table group_activity#lock (
  "GROUP_ID"  int --GROUP and GROUP_ID are being highlighted in SQL*Dev
  ,activities_hash  varchar2(4000)
  ,constraint ga#lock_pk primary key ("GROUP_ID")
  ,constraint ga#lock_fk1 foreign key ("GROUP_ID") references "GROUP_ID" ("GROUP_ID")
  ,constraint ga#lock_uq1 unique (activities_hash)
);
comment on table group_activity#lock is 'This table is used for locking and enforcing UNIQUE constraint';
comment on column group_activity#lock.activities_hash is 'Calculated Hash key for all ACTIVITY_IDs based on GROUP_ACTIVITY_API.HASH(). That function generates a Unique hash for each Unique set of ACTIVITY_IDs.';

create table group_activity#activities (
  "GROUP_ID" int
  ,activity_id int
  ,constraint tg#activities_pk primary key ("GROUP_ID",activity_id)
  ,constraint tg#activities_fk1 foreign key ("GROUP_ID") references group_activity#lock ("GROUP_ID")
  ,constraint tg#activities_fk2 foreign key (activity_id) references activities (activity_id)
);

comment on table group_activity#activities is 'This table contains the individual elements of the Nest Table.';

API Set 1 - Use 2 VIEWs

With this set of interfaces, the Users/Applications get access to two VIEWS.

One view looks exactly like what you have already. You will use this view for linking against other tables. Users/Applications will have only the SELECT Privilage against this VIEW. THEY DO NOT GET ACCESS TO THE UNDERLYING TABLES!

The other view contains a GROUP_ID and a Nested Table. This VIEW is used to perform your DML activities. The actual processing is handled via INSTEAD OF trigger (not listed here). A variation of this VIEW could use a VARCHAR2 (or XMLType) that shows the LISTAGG() result.

Adding constraints to the VIEWs, in RELY DISABLE NOVALIDATE state, might help the CBO create better plans. Your Millage May Vary.

CREATE or replace TYPE activity_element_t AS OBJECT ( activity_id int );
/
CREATE or replace TYPE activity_list_nt AS TABLE OF activity_element_t;
/

CREATE or replace
VIEW group_activities ( "GROUP_ID", ACTIVITY_id ) as
select a."GROUP_ID", b.activity_id
from group_activity#lock a
  join group_activity#activities b on a."GROUP_ID" = b."GROUP_ID";

comment on table group_activities is 'use this VIEW to perform LINKs.';

CREATE or replace
VIEW group_activities_dml ("GROUP_ID", ACTIVITY_LIST ) as
select a."GROUP_ID", cast(multiset( select b.activity_id
                                    from group_activity#activities b
                                    where a."GROUP_ID"=b."GROUP_ID")
                          as activity_list_nt ) activity_list
from group_activity#lock a;
comment on table group_activities_dml is 'use this VIEW to perform DML using a Nested Table element.  GROUP_ID cannot be modified via this method.';

API Set 2 - use a GTT

Not every application framework can handle a Nested Table. Plus - I'm sure a programmer will eventually FUBAR the construction of the VARCHAR2.

Instead, of performing a DML against a VIEW with a Nested Table/VARCHAR2, this method allows a framework to:

  1. Perform a DML against a GLOBAL TEMPORARY TABLE.
  2. Call a Procedure to process the data inside the GTT.

I would recommend this method for things like APEX's IG row processing.

You should still create at least the read only VIEW so that it can be used by applications to link to other tables.