MySQL – How to Get Top Users in Specific Tags

join;MySQL

I have a function like this:

CREATE DEFINER=`root`@`localhost` FUNCTION `user_top_tags`(`user_id` INT, `tags_num` TINYINT(1)) RETURNS varchar(50) CHARSET utf8mb4
    NO SQL
BEGIN
    DECLARE  top_tags varchar(50);

    SELECT substring_index(group_concat(x.name  ORDER BY x.tag_score DESC SEPARATOR ','), ',', tags_num) INTO top_tags
    FROM (
        SELECT t.name, sum(r.score) AS tag_score
        FROM reputations r
        JOIN qanda_tags qt ON qt.qanda_id = r.question_id
        JOIN tags t ON t.id = qt.tag_id
        WHERE r.owner_id = user_id
        GROUP BY t.name
    ) x;

    RETURN top_tags;
END

And I call it like this:

SELECT u.name, user_top_tags(u.id, 3) FROM users u WHERE 1;

And it returns a list of users with their top three tags. Something like this:

+--------+-----------------+
| Jack   | php,oop,mysql   |
| Martin | css,js,html     |
| Peter  | jquery,js,react |
+--------+-----------------+

Now I want to get the users which has active in specific tags. Something like top users page in SO (which is for javascript tag, but I want to get list of users in multiple tags, like IN ('css','html')).

Now should I do a join on the query? Or should I modify the function? Does anybody have any idea how can I do that?

Best Answer

Let's assume these are the settings for the problem. These will be the tables:

CREATE TABLE users
(
    user_id INTEGER NOT NULL PRIMARY KEY
) ;

CREATE TABLE tags
(
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(100)
) ;

CREATE TABLE qanda_tags
(
    question_id INTEGER,
    tag_id INTEGER NOT NULL REFERENCES tags(id)
) ;

CREATE TABLE reputations
(
    owner_id INTEGER NOT NULL REFERENCES users(user_id),
    question_id INTEGER,
    score INTEGER NOT NULL DEFAULT 0
) ;

And these are the values stored on the different tables:

INSERT INTO users (user_id) 
VALUES (1), (2) ;

INSERT INTO tags 
    (id, name)
VALUES
    (1, 'MySQL'),
    (2, 'PostgreSQL'),
    (3, 'MS SQL Server'),
    (4, 'Informix'),
    (5, 'MongoDB'),
    (6, 'CouchDB'),
    (7, 'Oracle'),
    (8, 'DB2'),
    (9, 'Sybase') ;

INSERT INTO qanda_tags
    (question_id, tag_id)
VALUES
    (1, 1),
    (1, 2),
    (2, 4),
    (2, 5),
    (3, 1),
    (3, 2),
    (3, 4);

INSERT INTO reputations
    (owner_id, question_id, score)
VALUES
    (1, 1, 3),
    (2, 2, 5),
    (1, 3, 8) ;

At this point, we could run this query...

SELECT 
    *
FROM
    reputations r
    JOIN qanda_tags qt 
        ON qt.question_id = r.question_id
    JOIN tags t 
        ON t.id = qt.tag_id ;

and get

owner_id | question_id | score | question_id | tag_id | id | name      
-------: | ----------: | ----: | ----------: | -----: | -: | :---------
       1 |           1 |     3 |           1 |      1 |  1 | MySQL     
       1 |           1 |     3 |           1 |      2 |  2 | PostgreSQL
       2 |           2 |     5 |           2 |      4 |  4 | Informix  
       2 |           2 |     5 |           2 |      5 |  5 | MongoDB   
       1 |           3 |     8 |           3 |      1 |  1 | MySQL     
       1 |           3 |     8 |           3 |      2 |  2 | PostgreSQL
       1 |           3 |     8 |           3 |      4 |  4 | Informix  

Your original function (slightly modified so it is no actually a function and can be played on at dbfiddle, and with some extra info), would do the following:

(these would be your function parameters)

SET @user_id = 1 ;
SET @tags_num = 4 ;

SELECT 
    substring_index(
        group_concat(name_and_score ORDER BY x.tag_score DESC, x.name ASC SEPARATOR ', ')
        , ', '
        , @tags_num
    ) AS top_tags_from_user
FROM 
   (SELECT 
        t.name, 
        sum(r.score) AS tag_score,
        concat(t.name, ' (', sum(r.score), ')') AS name_and_score
    FROM 
        reputations r
        JOIN qanda_tags qt 
            ON qt.question_id = r.question_id
        JOIN tags t 
            ON t.id = qt.tag_id
    WHERE 
        r.owner_id = @user_id
    GROUP BY 
        t.name
    ) AS x ;
| top_tags_from_user                        |
| :---------------------------------------- |
| MySQL (11), PostgreSQL (11), Informix (8) |

If you want to limit the tags that are checked, you can change the previous code just slightly:

SET @user_id = 1 ;
SET @tags_num = 4 ;
SET @tags_list = 'MySQL,Informix' ; 

SELECT 
    substring_index(
        group_concat(name_and_score ORDER BY x.tag_score DESC, x.name ASC SEPARATOR ', ')
        , ', '
        , @tags_num
    ) AS top_tags_from_user
FROM 
   (SELECT 
        t.name, 
        sum(r.score) AS tag_score,
        concat(t.name, ' (', sum(r.score), ')') AS name_and_score
    FROM 
        reputations r
        JOIN qanda_tags qt 
            ON qt.question_id = r.question_id
        JOIN tags t 
            ON t.id = qt.tag_id
               /* This is just what you need to add */
               AND (@tags_list IS NULL OR find_in_set(t.name, @tags_list))
    WHERE 
        r.owner_id = @user_id
    GROUP BY 
        t.name
    ) AS x ;

That will now return:

| top_tags_from_user       |
| :----------------------- |
| MySQL (11), Informix (8) |

The previous code will also work and choose all tags if you set @tags_list to NULL. In that case, the @tags_list IS NULL is true, and the JOIN condition is equivalent to the original case.

You can check everything at dbfiddle here