We are working with some legacy code that is using a sub-optimally normalized DB. The problem that we are trying to solve is best described with an illustrative example.
hi_priority
and lo_priority
are two tables that have the same schema, as defined below:
CREATE TABLE hi_priority
(
id INT UNSIGNED AUTO_INCREMENT,
name VARCHAR(16),
col1 VARCHAR(16),
col2 INT,
PRIMARY KEY(id)
);
CREATE TABLE lo_priority
(
id INT UNSIGNED AUTO_INCREMENT,
name VARCHAR(16),
col1 VARCHAR(16),
col2 INT,
PRIMARY KEY(id)
);
In each table there is a 1:1 mapping between the name
and col2
.
The idea is to potentially SELECT name, col1, col2
from both tables but limit the total number of distinct names (let's say 7 for our illustrative example), subject to the following constraints:
- Preference is to be given to entries from
hi_priority
table over entries fromlo_priority
tables. - For example, if there are 15 distinct names in
hi_priority
table,
we will select all rows with 7 names fromhi_priority
table,
ordered bycol2
. - If there are 5 distinct names in
hi_priority
table, we select all rows with those 5 names and on top of that we
select all rows with 2 distinct names fromlo_priority
table,
ordered bycol2
- If there are some entries in both the
hi_priority
table and thelo_priority
table, we select entries
only from thehi_priority
table
Let us assume that the tables are populated as follows:
INSERT INTO hi_priority(name, col1, col2) VALUES('john', 'kl7y5tis4yh4', 1), ('john', 'tiid6iywq02k', 1),
('john', 'detawgsxz615', 1), ('amy', 'i6u4hhc1trjk', 2), ('amy', 'wdpt0t5vtmbt', 2),
('amy', '87z5wgdfztwl', 2), ('amy', 'oj98jjdpb5yv', 2), ('steve', 'hllaazonflf0', 3),
('steve', '0h22y66kq3ow', 3), ('steve', 'o04ksti7di60', 3), ('steve', '3qlylbzqfr51', 3),
('steve', 'wmisshywtb12', 3), ('bob', 'ku8wpf7d6ta9', 3), ('bob', '6t7mn60g0g18', 3),
('bob', 'qv0s7ho3jku6', 3), ('bob', '5qgz7dznzwvn', 3), ('bob', 'mm76nhimm6fu', 3),
('bob', 'k7nlpksc55t2', 3), ('oliver', 'gvf9kjewpj7h', 4), ('oliver', '24w5s30w5te6', 4);
INSERT INTO lo_priority(name, col1, col2) VALUES('doug', 'j205tzrsfmax', 11), ('doug', '0w7rcazbh6es', 11),
('doug', '6xswf8frsjaw', 11), ('john', 'iw7d14vtysz2', 2), ('john', '6lg667dygaz1', 2),
('john', '83uk5dcobpu5', 2), ('john', 'tl8cpzatv0n9', 2), ('mike', '2dsarwozpmci', 3),
('mike', 'hinn6w03wdib', 3), ('mike', '4sxbgyacmjob', 3), ('mike', 'm4q13ln9gctj', 3),
('mike', 'pnip9c8cejo9', 3), ('steve', 'faff9p9v96x4', 4), ('steve', 'd5mxqpd3k8zi', 4),
('martha', 'bxggn5t6d2xn', 8), ('martha', 't05mi47i4n6l', 8), ('martha', 'p30wmw2o6nty', 8),
('martha', 'wip6efajt9yv', 8);
The expected output is:
+--------+------------+------+
| name | col1 | col2 |
+--------+------------+------+
| john | kl7y5tis4y | 1 |
| john | tiid6iywq0 | 1 |
| john | detawgsxz6 | 1 |
| amy | oj98jjdpb5 | 2 |
| amy | 87z5wgdfzt | 2 |
| amy | wdpt0t5vtm | 2 |
| amy | i6u4hhc1tr | 2 |
| bob | k7nlpksc55 | 3 |
| bob | mm76nhimm6 | 3 |
| bob | 5qgz7dznzw | 3 |
| bob | qv0s7ho3jk | 3 |
| bob | 6t7mn60g0g | 3 |
| bob | ku8wpf7d6t | 3 |
| steve | wmisshywtb | 3 |
| steve | 3qlylbzqfr | 3 |
| steve | o04ksti7di | 3 |
| steve | 0h22y66kq3 | 3 |
| steve | hllaazonfl | 3 |
| mike | 4sxbgyacmj | 3 |
| mike | pnip9c8cej | 3 |
| mike | m4q13ln9gc | 3 |
| mike | hinn6w03wd | 3 |
| mike | 2dsarwozpm | 3 |
| oliver | gvf9kjewpj | 4 |
| oliver | 24w5s30w5t | 4 |
| martha | bxggn5t6d2 | 8 |
| martha | t05mi47i4n | 8 |
| martha | p30wmw2o6n | 8 |
| martha | wip6efajt9 | 8 |
+--------+------------+------+
- Amy, Bob and Oliver are only present in
hi_priority
table and will be selected. - Mike and Martha are only present in
lo_priority
table and will be selected. - John and Steve are present in both
hi_priority
table andlo_priority
table but will be selected only fromhi_priority
table
Best Answer
fiddle