Mariadb – Random selection with a “complex” subset structure

database-designmariadbmariadb-10.3performancequeryquery-performance

This is my first question in the whole StackExchange world.
I'm not sure if this kind of question fit perfectly here or in DBA Meta section so feel free to "flag" or move this question.

In my application I have to make a sequence of random selection in my data.
The structure is something like this (see db<>fiddle):

  • 4 task (e.g. Eat, Sleep, …)
  • 10 mood (e.g. Calm, Excited, Happy, …)
  • 10 place (e.g. Bedroom, Batchroom, …)

only a few combinations between these items make sense. I decided to use a link table to summarise all suitable combinations using foreign keys from the other tables (see db<>fiddle above). I want that only the combinations present in the link table can be chosen.

The steps to perform are the following:

  1. Pick 1 task randomly (e.g. Eat)
  2. Retrieve all combinations that contain that task (e.g. all combination about Eat)
  3. Pick 1 mood randomly
  4. Retrieve all combinations from step 2 list that contain step 3 mood
  5. Pick 1 place randomly
  6. Find the randomly selected combination

REMARKS
Every random pick (for each level) should be performed from a list that contains "unique" items. For example if a list with more than one combination with the same mood si accepted obviously the most numerous mood are more likely to be picked up.

All this procedure is my idea to avoid that "numerosity" of valid combination for different task can influence the probability of being selected. For example, if the selection is made simply choosing one row in link table and I got 20 combination for Eat task and only 5 for Sleep task it is more probable that an Eat task will be selected.


NOTE: I'm using MariaDB 10.3 (x64)

Real question
I'm a complete beginner with DBs (as you surely noticed).
Does it make sense to perform these kind of operations "Database-side"?
I think it would be much more easier to do something like this using the application that in the end uses these data from my DB. On the contrary I started considering the idea of performing this selection entirely with database with performance purpose.

About random selection of a row I've read: Data Sampling: Techniques for Efficiently Finding a Random Row that is a rewrite of Rick James' blog Fetching Random Rows from a Table
and several question here on DBA.

I'm trying to write my query to perform operations in a smart way. The code is unfinished, maybe 30-40% of the final mission. Having no experience with MySQL I'm discovering new features everyday so I restarted from scratch several time. At this stage I prefer that focus goes on the question above. Obviously some advice regarding best strategy for writing this kind of queries are welcome.

create sql security invoker view filtered_table as
select *
from link_table
join ( 
    select task_id as id
    from task
    order by rand()
    limit 1 
) as random
on link_table.task_id = random.id;
select * from filtered_table;

set @min := (
    select min(subset_id)
    from filtered_table
);
set @max := (
    select max(subset_id)
    from filtered_table
);
select @min, @max;

PS. sorry for my English, I usually eat Pasta and Pizza so… ¯_(ツ)_/¯

Best Answer

The steps to perform are the following:

  • Pick 1 task randomly (e.g. Eat)
  • Retrieve all combinations that contain that task (e.g. all combination about Eat)
  • Pick 1 mood randomly from the list of combination retrieved in step 2
  • Retrieve all combinations from step 2 list that contain step 3 mood
  • Pick 1 place randomly from remained combination
  • Find the randomly selected combination
WITH 
cte1 AS (SELECT * 
         FROM task 
         ORDER BY RAND() LIMIT 1),
cte2 AS (SELECT mood.* 
         FROM link_table 
         JOIN mood USING (mood_id)
         JOIN cte1 USING (task_id)
         ORDER BY RAND() LIMIT 1)
SELECT cte1.*, cte2.*, place.*
FROM link_table
JOIN cte1 USING (task_id)
JOIN cte2 USING (mood_id)
JOIN place USING (place_id)
ORDER BY RAND() LIMIT 1

fiddle

PS. It is possible that the query will return no record.

PPS. There is more simple alternative:

WITH cte AS (SELECT * 
             FROM link_table
             ORDER BY RAND() LIMIT 1)
SELECT task.*, mood.*, place.*
FROM cte
JOIN task USING (task_id)
JOIN mood USING (mood_id)
JOIN place USING (place_id)

fiddle

This variant will never return zero records if link_table is not empty.

But! the chance for each separate record to be returned differs for 1st and 2nd queries...