Postgresql – Ignore previous rows in the UNION based on key

postgresqlunion

I have a table with task delivery times, which is based on the task department (only), task category (only), or product task and category. These rules can be applied directly to a customer, a group of customers or to all customers.

Example:

  • Customer 01 + Departament: Design + Category: Create Logo = 24 hours
  • Customer 01 + Departament: Design + Any Category = 48 hours All
  • Customers + Departament: Design + Any Category = 72 hours All
  • Customers + Any Category + Category: Create Logo = 48 hours

This table has the structure below:

|rule_id|customer_id|departament_id|category_id|deadline|

SQL to create tables:

CREATE TABLE public.test_customer (
    id INTEGER NOT NULL,
    name varchar(50) NOT NULL
);
CREATE TABLE public.test_departament (
    id INTEGER NOT NULL,
    name varchar(50) NOT NULL
);
CREATE TABLE public.test_category (
    id INTEGER NOT NULL,
    name varchar(50) NOT NULL
);
CREATE TABLE public.test_task (
    id INTEGER NOT NULL,
    customer_id INTEGER,
    departament_id INTEGER,
    category_id INTEGER,
    description varchar(50) NOT NULL
);
CREATE TABLE public.test_time_rule (
    id INTEGER NOT NULL,
    customer_id INTEGER,
    departament_id INTEGER,
    category_id INTEGER,
    time_deadline INTEGER
);

INSERT INTO public.test_customer (id, name) VALUES(1, 'Customer with Rule');
INSERT INTO public.test_customer (id, name) VALUES(2, 'Customer without Rule');

INSERT INTO public.test_departament (id, name) VALUES(1, 'Front-End');
INSERT INTO public.test_departament (id, name) VALUES(2, 'Back-End');
INSERT INTO public.test_departament (id, name) VALUES(3, 'DBA');

INSERT INTO public.test_departament (id, name) VALUES(1, 'Design');
INSERT INTO public.test_departament (id, name) VALUES(2, 'Create HTML Pages');
INSERT INTO public.test_departament (id, name) VALUES(3, 'Create Tables');
INSERT INTO public.test_departament (id, name) VALUES(4, 'Fix Bugs');

INSERT INTO public.test_task (id, customer_id, departament_id, category_id, description) VALUES(1, 1, 1, 1, 'Create New Form Design');
INSERT INTO public.test_task (id, customer_id, departament_id, category_id, description) VALUES(2, 1, 2, 4, 'Fix Bug on Customer Table');
INSERT INTO public.test_task (id, customer_id, departament_id, category_id, description) VALUES(2, 1, 3, 3, 'Create City Table');
INSERT INTO public.test_task (id, customer_id, departament_id, category_id, description) VALUES(3, 2, 1, 1, 'Create New Form Design');
INSERT INTO public.test_task (id, customer_id, departament_id, category_id, description) VALUES(4, 2, 3, 3, 'Create City Table');

INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(1, NULL, NULL, NULL, 20); --All Customers
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(2, NULL, 1, NULL, 30); --All Customers + Departament
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(3, NULL, NULL, 1, 40); --All Customers + Category
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(4, NULL, 3, 3, 50); --All Customers + Departament + Category

INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(5, 1, NULL, NULL, 20); --Customers With Rule
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(6, 1, 1, NULL, 30); --Customers + Departament
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(7, 1, NULL, 1, 40); --Customers + Category
INSERT INTO public.test_time_rule (id, customer_id, departament_id, category_id, time_deadline) VALUES(8, 1, 3, 3, 50); --Customers + Departament + Category

I am trying to create a SELECT that lists all the tasks and show which rule applies to it.
For this I am using application levels of the rule, which are:

1 – Rule with customer + department + category (defined customer ID)

2 – Rule with customer group + department + category (defined customer group ID)

3 – General rule (all customer) + department + category (without defined customer and customer group ID)

4 – Rule with customer + department

I have summarized the rules, but they reach up to 16 rules, and i created a view for each rule with the tasks that match its parameters.

Based on these rules I created a SELECT with several UNION ALL (performance reasons), but as the row can be repeated due to a previous select, I ended up adding NOT EXISTS conditions to prevent this problem.

Example:

 SELECT
    1 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NOT NULL
    AND ttr.departament_id IS NOT NULL
    AND ttr.category_id IS NOT NULL
    AND tt.customer_id = ttr.customer_id
    AND tt.departament_id = ttr.departament_id
    AND tt.category_id = ttr.category_id
UNION
SELECT
    2 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NULL
    AND ttr.departament_id IS NOT NULL
    AND ttr.category_id IS NOT NULL
    AND tt.departament_id = ttr.departament_id
    AND tt.category_id = ttr.category_id
UNION
SELECT
    3 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NOT NULL
    AND ttr.departament_id IS NOT NULL
    AND ttr.category_id IS NULL
    AND tt.customer_id = ttr.customer_id
    AND tt.departament_id = ttr.departament_id
UNION
SELECT
    4 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NULL
    AND ttr.departament_id IS NOT NULL
    AND ttr.category_id IS NULL
    AND tt.departament_id = ttr.departament_id
UNION
SELECT
    5 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NOT NULL
    AND ttr.departament_id IS NULL
    AND ttr.category_id IS NOT NULL
    AND tt.customer_id = ttr.customer_id
    AND tt.category_id = ttr.category_id
UNION
SELECT
    6 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NULL
    AND ttr.departament_id IS NULL
    AND ttr.category_id IS NOT NULL
    AND tt.category_id = ttr.category_id
UNION
SELECT
    7 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NOT NULL
    AND ttr.departament_id IS NULL
    AND ttr.category_id IS NULL
    AND tt.customer_id = ttr.customer_id
UNION
SELECT
    8 as level,
    ttr.id as rule_id,
    ttr.customer_id as rule_customer_id,
    tt.customer_id as task_customer_id,
    ttr.departament_id as rule_departament_id,
    tt.departament_id as task_departament_id,
    ttr.category_id as rule_category_id,
    tt.category_id as task_category_id,
    tt.id as task_id,
    tt.description as task_description,
    ttr.time_deadline
FROM
    test_time_rule ttr,
    test_task tt
WHERE
    ttr.customer_id IS NULL
    AND ttr.departament_id IS NULL
    AND ttr.category_id IS NULL
ORDER BY
    level;

As you may have noticed at each level that it enters, all the previous views are added, which generates a large and slow SQL.

My common sense tells me that this is not the best way to make this consultation.

I considered something that ignores a row based on the task id already displayed in a previous union.
Is there a better way to create this SELECT?

Note: I cannot add the rule ID used in the task, as this rule can be changed, deleted or even added in the future to a rule that best matches the task.

Best Answer

If you want task_id to be unique, how about something like:

SELECT DISTINCT ON (task_id)
       *
FROM (SELECT /* first query */
      UNION ALL
      SELECT /* second query */
      ...) AS q
ORDER BY task_id, /* other criteria */

That will return the first row for each task_id in the specified ordering.