Database noob here, just curious to see if my design could benefit from any improvements.
My goal was to keep this as simple as possible: a user can create an account, two accounts can message each other.
That's it.
To achieve this, I've use the following schema:
CREATE TABLE account (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
email VARCHAR(255) NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
--
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
--
PRIMARY KEY (id),
UNIQUE (email)
);
CREATE TABLE message (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
to_id INT(10) UNSIGNED NOT NULL,
from_id INT(10) UNSIGNED NOT NULL,
body VARCHAR(500) NOT NULL,
seen BOOLEAN NOT NULL DEFAULT 0,
--
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
--
PRIMARY KEY (id)
);
Using this schema, I want to be able to run two queries:
- A snapshot. I want the message overview, similar to how Facebook does it; your most recent messages in each thread are listed.
- A thread. I want to show the thread of messages between two users in a list.
I achieved a paginated list of #1 with the following query:
SELECT message.*, `from`.*, `to`.*
FROM message
LEFT JOIN account AS `to`
ON `to`.id = message.to_id
LEFT JOIN account AS `from`
ON `from`.id = message.from_id
WHERE message.id IN(
SELECT MAX(m.id)
FROM message m
WHERE m.id < 999999999
AND m.to_id = 1
AND m.deleted_at IS NULL
GROUP BY GREATEST(m.to_id, m.from_id), LEAST(m.to_id, m.from_id)
)
ORDER BY message.seen, message.id DESC
LIMIT 100
The query above for user with ID 1 will show me a snapshot of latest messages received from every user.
I achieved a paginated list of #2 with the following query:
SELECT message.*, `from`.*, `to`.*
FROM message
LEFT JOIN account AS `to`
ON `to`.id = message.to_id
LEFT JOIN account AS `from`
ON `from`.id = message.from_id
WHERE message.id < 999999999
AND (message.to_id = 1 AND message.from_id = 2) OR (message.to_id = 2 AND message.from_id = 1)
AND message.deleted_at IS NULL
LIMIT 100
The query above will give me all messages between user with ID 1 and user with ID 2.
Any thoughts? I'd love to hear of some obvious areas of improvement, and some possible explanation as to why so I can learn from it.
Cheers!
Best Answer
deleted
andseen
get in the way of optimizationsGROUP BY GREATEST(m.to_id, m.from_id), LEAST(m.to_id, m.from_id)
seems strange. Wouldn't this suffice:GROUP BY m.to_id, m.from_id
?from_id
orto_id
, half the info is already known?m.id < 999999999
for?IN ( SELECT ... )
often optimizes poorly.OFFSET
not the better "remember where I left off".VARCHAR(255)
.SELECT *
; instead, spell out the columns that you really need.SELECTs
look like.LEFT JOIN account
assumes that the desiredaccount
might be missing. If so, there a data integrity problem. So switch toJOIN
; this may aid in optimizations.LIMIT 100
-- Isn't that too much to put on a UI page?AND (message.to_id = 1 AND message.from_id = 2) OR (message.to_id = 2 AND message.from_id = 1)
is hard to optimize. UseUNION
of twoSELECTs
instead ofOR
, and add (tentatively)INDEX(from_id, to_id)
. The reason is that the Optimizer has almost no techniques for handlingOR
; it simply ignores indexes and falls back on a table scan.This might work:
Here's the Changelog entry that hints at such:
----- 2013-12-03 5.7.3 Milestone 13 -- Functionality Added or Changed -- -----
The optimizer now is able to apply the range scan access method to queries of this form:
Previously, for range scans to be used it was necessary for the query to be written as:
For the optimizer to use a range scan, queries must satisfy these conditions:
Only IN predicates can be used, not NOT IN.
There may only be column references in the row constructor on the IN predicate's left hand side.
There must be more than one row constructor on the IN predicate's right hand side.
Row constructors on the IN predicate's right hand side must contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
EXPLAIN output for applicable queries will change from full table or index scan to range scan. Changes are also visible by checking the values of the Handler_read_first, Handler_read_key, and Handler_read_next status variables.