MySQL – How to Impose LIMIT on UNION

mariadbMySQL

We are working with some legacy code that is using a sub-optimally normalized DB. The problem that we are trying to solve is best described with an illustrative example.
hi_priority and lo_priority are two tables that have the same schema, as defined below:

CREATE TABLE hi_priority
(
  id INT UNSIGNED AUTO_INCREMENT,
  name VARCHAR(16),
  col1 VARCHAR(16),
  col2 INT,

  PRIMARY KEY(id)
);



CREATE TABLE lo_priority
(
  id INT UNSIGNED AUTO_INCREMENT,
  name VARCHAR(16),
  col1 VARCHAR(16),
  col2 INT,

  PRIMARY KEY(id)
);

In each table there is a 1:1 mapping between the name and col2.

The idea is to potentially SELECT name, col1, col2 from both tables but limit the total number of distinct names (let's say 7 for our illustrative example), subject to the following constraints:

  • Preference is to be given to entries from hi_priority table over entries from lo_priority tables.
  • For example, if there are 15 distinct names in hi_priority table,
    we will select all rows with 7 names from hi_priority table,
    ordered by col2.
  • If there are 5 distinct names in hi_priority
    table, we select all rows with those 5 names and on top of that we
    select all rows with 2 distinct names from lo_priority table,
    ordered by col2
  • If there are some entries in both the
    hi_priority table and the lo_priority table, we select entries
    only from the hi_priority table

Let us assume that the tables are populated as follows:

INSERT INTO hi_priority(name, col1, col2) VALUES('john', 'kl7y5tis4yh4', 1), ('john', 'tiid6iywq02k', 1), 
('john', 'detawgsxz615', 1), ('amy', 'i6u4hhc1trjk', 2), ('amy', 'wdpt0t5vtmbt', 2),
('amy', '87z5wgdfztwl', 2), ('amy', 'oj98jjdpb5yv', 2), ('steve', 'hllaazonflf0', 3),
('steve', '0h22y66kq3ow', 3), ('steve', 'o04ksti7di60', 3), ('steve', '3qlylbzqfr51', 3),
('steve', 'wmisshywtb12', 3), ('bob', 'ku8wpf7d6ta9', 3), ('bob', '6t7mn60g0g18', 3),
('bob', 'qv0s7ho3jku6', 3), ('bob', '5qgz7dznzwvn', 3), ('bob', 'mm76nhimm6fu', 3),
('bob', 'k7nlpksc55t2', 3), ('oliver', 'gvf9kjewpj7h', 4), ('oliver', '24w5s30w5te6', 4);


INSERT INTO lo_priority(name, col1, col2) VALUES('doug', 'j205tzrsfmax', 11), ('doug', '0w7rcazbh6es', 11),
('doug', '6xswf8frsjaw', 11), ('john', 'iw7d14vtysz2', 2), ('john', '6lg667dygaz1', 2),
('john', '83uk5dcobpu5', 2), ('john', 'tl8cpzatv0n9', 2), ('mike', '2dsarwozpmci', 3),
('mike', 'hinn6w03wdib', 3), ('mike', '4sxbgyacmjob', 3), ('mike', 'm4q13ln9gctj', 3),
('mike', 'pnip9c8cejo9', 3), ('steve', 'faff9p9v96x4', 4), ('steve', 'd5mxqpd3k8zi', 4),
('martha', 'bxggn5t6d2xn', 8), ('martha', 't05mi47i4n6l', 8), ('martha', 'p30wmw2o6nty', 8),
('martha', 'wip6efajt9yv', 8);

The expected output is:  
+--------+------------+------+
| name   | col1       | col2 |
+--------+------------+------+
| john   | kl7y5tis4y |    1 |
| john   | tiid6iywq0 |    1 |
| john   | detawgsxz6 |    1 |
| amy    | oj98jjdpb5 |    2 |
| amy    | 87z5wgdfzt |    2 |
| amy    | wdpt0t5vtm |    2 |
| amy    | i6u4hhc1tr |    2 |
| bob    | k7nlpksc55 |    3 |
| bob    | mm76nhimm6 |    3 |
| bob    | 5qgz7dznzw |    3 |
| bob    | qv0s7ho3jk |    3 |
| bob    | 6t7mn60g0g |    3 |
| bob    | ku8wpf7d6t |    3 |
| steve  | wmisshywtb |    3 |
| steve  | 3qlylbzqfr |    3 |
| steve  | o04ksti7di |    3 |
| steve  | 0h22y66kq3 |    3 |
| steve  | hllaazonfl |    3 |
| mike   | 4sxbgyacmj |    3 |
| mike   | pnip9c8cej |    3 |
| mike   | m4q13ln9gc |    3 |
| mike   | hinn6w03wd |    3 |
| mike   | 2dsarwozpm |    3 |
| oliver | gvf9kjewpj |    4 |
| oliver | 24w5s30w5t |    4 |
| martha | bxggn5t6d2 |    8 |
| martha | t05mi47i4n |    8 |
| martha | p30wmw2o6n |    8 |
| martha | wip6efajt9 |    8 |
+--------+------------+------+
  • Amy, Bob and Oliver are only present in hi_priority table and will be selected.
  • Mike and Martha are only present in lo_priority table and will be selected.
  • John and Steve are present in both hi_priority table and lo_priority table but will be selected only from hi_priority table

Best Answer

WITH
cte_hi AS ( SELECT DISTINCT name, hi_priority.col2, 1 priority 
            FROM hi_priority
            ORDER BY col2, name LIMIT 7 ),
cte_lo AS ( SELECT DISTINCT lo_priority.name, lo_priority.col2, 2 priority 
            FROM lo_priority
            LEFT JOIN cte_hi USING (name)
            WHERE cte_hi.name IS NULL
            ORDER BY col2, name LIMIT 7 ),
cte_total AS ( SELECT * FROM cte_hi
               UNION ALL
               SELECT * FROM cte_lo
               ORDER BY priority, col2 LIMIT 7 )
SELECT hi_priority.*
FROM cte_total
JOIN hi_priority USING (name)
WHERE priority = 1
UNION ALL
SELECT lo_priority.*
FROM cte_total
JOIN lo_priority USING (name)
WHERE priority = 2
ORDER BY col2, name, id

fiddle