Mysql – Trying to maximize the performance of a query MySQL 5.7

MySQLmysql-5.7

I am trying to handle a scenario that matches other users by their interest and gender.

There will be 2 genders and also 5 interests for each user and each of them will be either 0 or 1. and a rate, the sum (gender+interest) will be calculated and sorted together with last update timestamp in descending order depending on what the user select. (e.g. A user select Female and Movie, ORDER BY SUM(male + movie) DESC, timestamp DESC)

User may select

  1. select interest but not gender
  2. select gender but not interest
  3. both gender and interest
  4. neither gender and interest

Desired outcome:

  1. user A select male and movie
  2. user B select female, no interest (with the most recent timestamp)
  3. user C select female and movie
  4. user D select no gender, and select movie interest
  5. user E select neither

When user A performs the search, the ordered result should be

  1. C with 2 Points
  2. B with 1 Point
  3. D with 1 Point
  4. E with 0 Point

I try to use a table structure like below and use 150,000 rows of data to test it.

CREATE TABLE `candidate_profiles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `male` decimal(3,2) NOT NULL,
  `female` decimal(3,2) NOT NULL,
  `interestA` decimal(3,2) NOT NULL,
  `interestB` decimal(3,2) NOT NULL,
  `interestC` decimal(3,2) NOT NULL,
  `interestD` decimal(3,2) NOT NULL,
  `interestE` decimal(3,2) NOT NULL,
  `is_active` tinyint(1) DEFAULT '1',
  `timestamp` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Below is the query that I used

SELECT 
    c.id,
    p.id AS profile_id,
    c.message,
    (gender + interestB) AS point,
    c.timestamp,
    CASE
        WHEN male = 1 THEN 'male'
        WHEN female = 1 THEN 'female'
        ELSE 'disabled'
    END AS 'gender',
    CASE
        WHEN interestA = 1 THEN 'interestA'
        WHEN interestB = 1 THEN 'interestB'
        WHEN interestC= 1 THEN 'interestC'
        WHEN interestD = 1 THEN 'interestD'
        WHEN interestE = 1 THEN 'interestE'
        ELSE 'disabled'
    END AS 'interest'
FROM
    candidate c
        INNER JOIN
    profiles p ON p.candidate_id = c.id
        LEFT JOIN
    ignored ci ON ci.candidate_id = 1
        AND ci.target_id = c.id
WHERE
    c.state = 0 
    AND ci.id IS NULL
    AND c.id NOT IN (151011 , 151009,
        151008,
        151007,
        151006,
        151005,
        151004,
        151003)
GROUP BY c.ID
ORDER BY c.timestamp DESC , point DESC
LIMIT 8

and this is the EXPLAIN result

+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys                                                                                                                    | key                                       | key_len | ref             | rows  | filtered | Extra                                                  |
+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | ref    | PRIMARY,idx_distributor_candidate_timestamp,idx_distributor_candidate_state_timestamp,v110_id_timestamp_state,id_timestamp_state | idx_distributor_candidate_state_timestamp | 4       | const           | 75551 |    50.01 | Using index condition; Using temporary; Using filesort |
|  1 | SIMPLE      | ci    | NULL       | eq_ref | uk_candidate_id_target_id                                                                                                        | uk_candidate_id_target_id                 | 8       | const,wowo.c.id |     1 |    33.33 | Using where; Not exists; Using index                   |
|  1 | SIMPLE      | p     | NULL       | ref    | idx_candidate_profile_2_candidate_id                                                                                             | idx_candidate_profile_2_candidate_id      | 4       | wowo.c.id       |     1 |   100.00 | NULL                                                   |
+----+-------------+-------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+---------+-----------------+-------+----------+--------------------------------------------------------+

I realize, because of the sum operation, I can't create a proper index. And the query used 3-4 seconds.

I want to seek another way that could speed up the query.
Would "Pre-Calculated" values be the way? meaning with 2 genders and 5 interests, it would have 17 columns of variations.

Or is there another approach that could be used. thanks!


Updated, Oct 31: Add the desired outcome. Current query and EXPLAIN result

Best Answer

If all you have is 0 or 1, then TINYINT UNSIGNED should suffice (and be half the size).

Are you selecting all the 'matching' users? Or just 'counting' how many there are?

If selecting all that match, I would consider using a TINYINT UNSIGNED because it has 8 bits and they could represent gender, the interests, and is_active. Then apply a mask to select the bits that need to be looked at, and an '=' to test them all at one time.

I see no need for the interests to be in separate column. Instead, have a single column with

interests ENUM ('a', 'b', ...)
INDEX(interests, gender)
INDEX(gender, interests)

WHERE gender = 1 AND interests = 'a'
WHERE interests = 'd'
(etc -- covering the 4 cases you mentioned)

Please provide complete queries, and/or sample output.