MySQL Select Union; Filter by Shared Relation

innodbMySQLmysql-5.7union

Currently, I have 3 tables (see structure below), speakers, venues, and events. I'm trying to do a relevance search on the names of the speakers and venues. However, on the Search Results page, if the speaker and venue share the same next upcoming event, I'll be showing it on the page along with their opposite (Speaker|Venue) model.

As a result, I end up with duplicate items on my search page. For instance, if they search for 'Adam Wathan', they might see something like:

+------------------------------+  
| **Adam Wathan**              |    
| ? Wathan Theatre             |  
| ? Jan. 9th @ 5:15 PM        |  
+------------------------------+  
+------------------------------+  
| **Wathan Theatre**           | <- Same as above, but opposite primary model.   
| ? Adam Wathan               |  
| ? Jan. 9th @ 5:15 PM        |  
+------------------------------+  
+------------------------------+  
| **Mike Wathan**              | <- Match should be displayed even with no upcoming events.   
| ? See past events.          |  
+------------------------------+

Note I am intentionally including matches that don't have upcoming events too.

Note This query is just giving me a list of models to fetch in subsequent queries. It is not responsible for obtaining all of the information displayed in the illustration above.

In the instance where both the matching speaker and the venue share the same next upcoming event, it should only return the most relevant of the two (or default to speaker if equal).

Yes, I could filter them in the application logic, but that would mess up the pagination as there wouldn't be the expected X entries on the page and the total count would be incorrect if it counts the related duplicates.

Finally, I'd also like for it to return the ID of the next upcoming event (upcoming_event) if one exists, but I keep running into this ONLY_FULL_GROUP_BY error; which I understand the purpose of and would like to keep it enabled, but I'm not used to working around it yet.

Databases are not my strong suit and I've spent 4 days getting this as far as I have. It works well enough and I wouldn't be horribly upset if I can't add these improvements… but I would love to get this working how I had imagined it. Unfortunately, despite hours of research and reading, I'm afraid I've accomplished all I can do without help on this specific issue. I'm also open to alternative methods if it renders the same results, especially if it'll be more performant.

PS: Sorry about the title, really not sure how to describe this question simply. Please edit if you can form a better title after reading the question.

Online Example

DB-Fiddle

Database Structure

# Data structure
CREATE TABLE `speakers`  (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp(0) NULL DEFAULT NULL,
  `updated_at` timestamp(0) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci ROW_FORMAT = Dynamic;

CREATE TABLE `venues`  (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp(0) NULL DEFAULT NULL,
  `updated_at` timestamp(0) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci ROW_FORMAT = Dynamic;

CREATE TABLE `events`  (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `venue_id` int(10) UNSIGNED NOT NULL,
  `speaker_id` int(10) UNSIGNED NOT NULL,
  `starts_at` timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0) ON UPDATE CURRENT_TIMESTAMP(0),
  `ends_at` timestamp(0) NULL DEFAULT NULL,
  `created_at` timestamp(0) NULL DEFAULT NULL,
  `updated_at` timestamp(0) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci ROW_FORMAT = Dynamic;

# Sample Data
INSERT INTO `speakers` VALUES (1, 'Evan You', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (2, 'Freek Van Der Herten', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (3, 'Matt Stauffer', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (4, 'Adam Wathan', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (5, 'Wes Bos', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (6, 'Taylor Otwell', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (7, 'Steve Schoger', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (8, 'Rizqi Djamaluddin', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (9, 'Katerina Trajchevska', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `speakers` VALUES (10, 'Adam Culp', '2019-01-04 13:12:16', '2019-01-04 13:12:16');

INSERT INTO `venues` VALUES (1, 'Otwell Plaza', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `venues` VALUES (2, 'Laravel Park', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `venues` VALUES (3, 'Wathan Theatre', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `venues` VALUES (4, 'Way University', '2019-01-04 13:12:16', '2019-01-04 13:12:16');
INSERT INTO `venues` VALUES (5, 'Fidao Group', '2019-01-04 13:12:16', '2019-01-04 13:12:16');

INSERT INTO `events` VALUES (1, 1, 2, '2019-02-25 12:15:00', '2019-02-25 18:00:00', '2019-01-04 13:12:22', '2019-01-04 13:12:22');
INSERT INTO `events` VALUES (2, 4, 3, '2019-01-09 05:15:00', NULL, '2019-01-04 13:12:22', '2019-01-04 13:12:22');
INSERT INTO `events` VALUES (3, 4, 5, '2018-01-09 05:15:00', NULL, '2019-01-04 13:12:22', '2019-01-04 13:12:22');
INSERT INTO `events` VALUES (4, 4, 5, '2019-02-09 05:15:00', NULL, '2019-01-04 13:12:22', '2019-01-04 13:12:22');

Database Query

(
    SELECT
        `venues`.`id`,
        `venues`.`name`,
        'venues' AS type,
        ( CASE WHEN `name` LIKE "adam wathan Stau" THEN 500 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "adam" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "wathan" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "Stau" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN min( `events`.`starts_at` >= '2019-01-5 12:00:00' ) THEN 50 ELSE 0 END ) AS `relevance` 
    FROM `venues`
        LEFT JOIN `events` ON `events`.`venue_id` = `venues`.`id` 
    GROUP BY `venues`.`id` 
    HAVING `relevance` > 50 
) UNION (
    SELECT
        `speakers`.`id`,
        `speakers`.`name`,
        'speakers' AS type,
        ( CASE WHEN `name` LIKE "adam wathan Stau" THEN 500 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "adam" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "wathan" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN `name` LIKE "Stau" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau%" THEN 40 ELSE 0 END ) +

        ( CASE WHEN min( `events`.`starts_at` >= '2019-01-5 12:00:00' ) THEN 50 ELSE 0 END ) AS `relevance` 
    FROM `speakers`
        LEFT JOIN `events` ON `events`.`speaker_id` = `speakers`.`id` 
    GROUP BY `speakers`.`id` 
    HAVING `relevance` > 50 
) 
ORDER BY
    `relevance` DESC,
    `name` DESC 
LIMIT 20 OFFSET 0

Best Answer

For MySQL 5.7 - as tagged…:

Still telling from

Results 1 (4, Adam Wathan, speakers) and 2 (3, Wathan Theatre, venues) are both from the Event with an ID of 2

the INSERT for event #2 should probably read

INSERT INTO `events` VALUES (2, 3, 4, '2019-01-09 05:15:00', NULL, '2019-01-04 13:12:22', '2019-01-04 13:12:22');

From here, let's compose the final result step by step - using a sequence of views and a final select:


CREATE VIEW `DataCollector` AS SELECT * FROM (
   (
    SELECT
        `venues`.`id`,
        `events`.`id` AS `event_id`,
        `venues`.`name`,
        'venues' AS type,
        ( CASE WHEN `name` LIKE "adam wathan Stau" THEN 500 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN min( `events`.`starts_at` >= '2019-01-5 12:00:00' ) THEN 50 ELSE 0 END ) AS `relevance` 
    FROM `venues`
        LEFT JOIN `events` ON `events`.`venue_id` = `venues`.`id` 
    GROUP BY `venues`.`id`, `events`.`id` 
    HAVING `relevance` > 50 
   )
   UNION ALL
   (
    SELECT
        `speakers`.`id`,
        `events`.`id` AS `event_id`,
        `speakers`.`name`,
        'speakers' AS type,
        ( CASE WHEN `name` LIKE "adam wathan Stau" THEN 500 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "adam%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%adam%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "wathan%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%wathan%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau" THEN 100 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "Stau%" THEN 80 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau" THEN 50 ELSE 0 END ) +
        ( CASE WHEN `name` LIKE "%Stau%" THEN 40 ELSE 0 END ) +
        ( CASE WHEN min( `events`.`starts_at` >= '2019-01-5 12:00:00' ) THEN 50 ELSE 0 END ) AS `relevance` 
    FROM `speakers`
        LEFT JOIN `events` ON `events`.`speaker_id` = `speakers`.`id` 
    GROUP BY `speakers`.`id`, `events`.`id` 
    HAVING `relevance` > 50 
   )
) `_DataCollector`;

CREATE VIEW `DuplicateFilter` AS (
  SELECT
    `event_id`
    , MAX(`relevance`) AS `max_relevance`
    , MIN(`relevance`) AS `min_relevance`
  FROM `DataCollector`
  WHERE `event_id` IS NOT NULL
  GROUP BY `event_id`
);

SELECT
  `DataCollector`.`id`
  , `DataCollector`.`event_id`
  , `DataCollector`.`name`
  , `DataCollector`.`type`
  , `DataCollector`.`relevance`
FROM `DataCollector`
JOIN `DuplicateFilter`
  ON `DataCollector`.`event_id` = `DuplicateFilter`.`event_id`
  AND `DataCollector`.`relevance` = `DuplicateFilter`.`max_relevance`
  AND `DataCollector`.`relevance` = `DuplicateFilter`.`min_relevance`
  AND `DataCollector`.`type` = 'speakers'
UNION ALL
(SELECT
  `DataCollector`.`id`
  , `DataCollector`.`event_id`
  , `DataCollector`.`name`
  , `DataCollector`.`type`
  , `DataCollector`.`relevance`
FROM `DataCollector`
JOIN `DuplicateFilter`
  ON `DataCollector`.`event_id` = `DuplicateFilter`.`event_id`
  AND `DataCollector`.`relevance` = `DuplicateFilter`.`max_relevance`
  AND `DataCollector`.`relevance` > `DuplicateFilter`.`min_relevance`
)
UNION ALL
(SELECT
  `id`
  , `event_id`
  , `name`
  , `type`
  , `relevance`
FROM `DataCollector`
WHERE `event_id` IS NULL)
ORDER BY `relevance` DESC, `name` DESC
LIMIT 20 OFFSET 0
;

N.B.

  • As performance was mentioned, all UNION have been turned into UNION ALL. This should help - but does not consistently do so in DB Fiddle…
  • Much more is probably to be gained in that respect by streamlining the relevance calculation - if at all feasible.

See it in action: DB Fiddle. (With a second SELECT to check the impact of UNION ALL right away…)

Please comment, if and as this requires adjustment / further detail.