I have system that allows comments posted by users and tracks which users seen which comments.
The read/unread state is tracked within comments_seen
table. Surely enough, it grow tremendously.
My goal is to move from comments_seen
table to comments_unseen
.
Read/Unread status is only tracked for comment author and users in admin group (eg. admins see all new comment on site and should react respectively, while users only see answers to their comments or comments to their orders).
Current comments_seen
table is almost 36M rows, while it's current definition is as simple as
CREATE TABLE comments_seen
(
user_id int(10) NOT NULL,
comment_id int(10) NOT NULL,
CONSTRAINT user_id
UNIQUE (user_id, comment_id)
)
The comments_unseen
table could be very much the same.
What would be your suggestions on my steps to moving from one table design to another?
Any triggers to keep data consistent?
Any deployment/migration steps?
All queries in code could be rewritten at once, I guess, so time to support both tables can be minimized to, say, seconds or minutes.
The overall application load isn't very high and everything could be done in night times for most users to not affect performance or UX.
Any thought, suggestions and shared experience is very much appreciated.
The DB is MySQL 5.5
UPDATE 2021-04-01: As pointed out in comments 35M rows isn't huge, but the culprit must be the query.
So here's the slowest query (as reported by Percona's pt-query-digest):
SELECT
SUM(`is_deleted` IS NOT TRUE) AS `total_count`
, SUM(`parent_id` = 0) AS `branch_count`
, SUM(`parent_id` = 0 AND `is_can_answer`) AS `opened_branch_count`
, SUM(`parent_id` = 0 AND NOT `is_can_answer`) AS `closed_branch_count`
, SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE) AS `new_count` , SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE AND `module` = 5) AS `order_comment_count` , SUM(`cs`.`comment_id` IS NULL AND `is_deleted` IS NOT TRUE AND `module` = 4) AS `items_comment_count`
FROM (
(
SELECT DISTINCT
5 AS `module`
, 'm' AS `item_type`
, `co`.`id` AS `item_id`
FROM `cart_orders` AS `co`
WHERE `co`.`author` = 65
OR `co`.`manager_id` = 65
)
UNION
(
SELECT DISTINCT
5 AS `module`
, 'i' AS `item_type`
, `pr`.`id` AS `item_id`
FROM `cart_orders` AS `co`
JOIN `cart_package` AS `cp`
ON `cp`.`order_id` = `co`.`id`
JOIN `pictures_relations` AS `pr`
ON `pr`.`module_id` = 5
AND `pr`.`item_id` = `cp`.`id`
AND `pr`.`item_type` = 'opkg'
AND `pr`.`is_file` IS FALSE
WHERE `co`.`author` = 65
OR `co`.`manager_id` = 65
)
UNION
(
SELECT DISTINCT
4 AS `module`
, 'm' AS `item_type`
, `cat`.`id` AS `item_id`
FROM `catalog` AS `cat`
INNER JOIN (
`cart_positions` AS `cp`
JOIN `cart_orders` AS `co`
ON `cp`.`order_id` = `co`.`id`
)
ON `cp`.`cat_id` = `cat`.`id`
WHERE `co`.`author` = 65
OR `co`.`manager_id` = 65
)
) AS `source` JOIN `comments` AS `a` USING (`module`, `item_id`, `item_type`)
LEFT JOIN `comments_seen` AS `cs` ON `cs`.`comment_id` = `a`.`id` AND `cs`.`user_id` = 65
WHERE ((TRUE))
Execution time for the whole query, and it's UNION
ed parts:
complete - 8s 451ms
1 - 775ms
2 - 1s 214ms
3 - 3s 623ms
2+3 - 4s 891ms
1+3 - 4s 314ms
1+2 - 1s 945ms
Clearly, there's only single JOIN
by both comment_id
and user_id
columns, so the issue seem to be not in the comments_seen
table as I initially thought, but rather in overall poor table design.
Best Answer
(Addressing first question)
Toss
comment_unseen
; instead, use aLEFT JOIN ... WHERE ... IS NULL
orWHERE NOT EXISTS ( SELECT 1 FROM comment_seen WHERE ... )
For
comment_seen
, replacewith this (Note: a PK includes a uniqueness constraint):
If you need to look up which users have seen a comment, add
There is no need to also make that
UNIQUE
; in fact, there is some harm.