Mysql – Table design comment_seen vs comment_unseen — migrate data from one to another

database-designmigrationMySQLmysql-5.5schema-migration

I have system that allows comments posted by users and tracks which users seen which comments.
The read/unread state is tracked within comments_seen table. Surely enough, it grow tremendously.

My goal is to move from comments_seen table to comments_unseen.
Read/Unread status is only tracked for comment author and users in admin group (eg. admins see all new comment on site and should react respectively, while users only see answers to their comments or comments to their orders).

Current comments_seen table is almost 36M rows, while it's current definition is as simple as

CREATE TABLE comments_seen
(
    user_id    int(10) NOT NULL,
    comment_id int(10) NOT NULL,
    CONSTRAINT user_id
        UNIQUE (user_id, comment_id)
)

The comments_unseen table could be very much the same.

What would be your suggestions on my steps to moving from one table design to another?
Any triggers to keep data consistent?
Any deployment/migration steps?
All queries in code could be rewritten at once, I guess, so time to support both tables can be minimized to, say, seconds or minutes.

The overall application load isn't very high and everything could be done in night times for most users to not affect performance or UX.
Any thought, suggestions and shared experience is very much appreciated.

The DB is MySQL 5.5

UPDATE 2021-04-01: As pointed out in comments 35M rows isn't huge, but the culprit must be the query.
So here's the slowest query (as reported by Percona's pt-query-digest):

SELECT
            SUM(`is_deleted` IS NOT TRUE) AS `total_count`
            , SUM(`parent_id` = 0) AS `branch_count`
            , SUM(`parent_id` = 0 AND `is_can_answer`) AS `opened_branch_count`
            , SUM(`parent_id` = 0 AND NOT `is_can_answer`) AS `closed_branch_count`
            , SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE) AS `new_count` , SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE AND `module` = 5) AS `order_comment_count` , SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE AND `module` = 4) AS `items_comment_count` 
        FROM (
                            (
                                SELECT DISTINCT
                                    5 AS `module`
                                    , 'm' AS `item_type`
                                    , `co`.`id` AS `item_id`
                                FROM `cart_orders` AS `co`
                                WHERE `co`.`author` = 65
                                OR `co`.`manager_id` = 65
                                
                            )
                            UNION
                            (
                                SELECT DISTINCT
                                    5 AS `module`
                                    , 'i' AS `item_type`
                                    , `pr`.`id` AS `item_id`
                                FROM `cart_orders` AS `co`
                                    JOIN `cart_package` AS `cp`
                                        ON `cp`.`order_id` = `co`.`id`
                                    JOIN `pictures_relations` AS `pr`
                                        ON `pr`.`module_id` = 5
                                            AND `pr`.`item_id` = `cp`.`id`
                                            AND `pr`.`item_type` = 'opkg'
                                            AND `pr`.`is_file` IS FALSE
                                WHERE `co`.`author` = 65
                                OR `co`.`manager_id` = 65
                                
                            )
                            UNION
                            (
                                SELECT DISTINCT
                                    4 AS `module`
                                    , 'm' AS `item_type`
                                    , `cat`.`id` AS `item_id`
                                FROM `catalog` AS `cat`
                                    INNER JOIN (
                                        `cart_positions` AS `cp`
                                        JOIN `cart_orders` AS `co`
                                            ON `cp`.`order_id` = `co`.`id`
                                    )
                                        ON `cp`.`cat_id` = `cat`.`id`
                                WHERE `co`.`author` = 65
                                OR `co`.`manager_id` = 65
                                
                            )
                            ) AS `source` JOIN `comments` AS `a` USING (`module`, `item_id`, `item_type`) 
            LEFT JOIN `comments_seen` AS `cs` ON `cs`.`comment_id` = `a`.`id` AND `cs`.`user_id` = 65 
        WHERE ((TRUE))

Execution time for the whole query, and it's UNIONed parts:

complete - 8s 451ms

1 - 775ms
2 - 1s 214ms
3 - 3s 623ms

2+3 - 4s 891ms
1+3 - 4s 314ms
1+2 - 1s 945ms

Clearly, there's only single JOIN by both comment_id and user_id columns, so the issue seem to be not in the comments_seen table as I initially thought, but rather in overall poor table design.

Best Answer

(Addressing first question)

Toss comment_unseen; instead, use a LEFT JOIN ... WHERE ... IS NULL or WHERE NOT EXISTS ( SELECT 1 FROM comment_seen WHERE ... )

For comment_seen, replace

CONSTRAINT user_id UNIQUE (user_id, comment_id)

with this (Note: a PK includes a uniqueness constraint):

PRIMARY KEY(user_id, comment_id)

If you need to look up which users have seen a comment, add

INDEX(comment_id, user_id)

There is no need to also make that UNIQUE; in fact, there is some harm.