Mysql – Optimize JOIN query with DISTINCT

MySQLoptimizationquery-performance

I have 2 tables, one for links (Links) and one for comments (Comments) where each comment contains a reference to a link (it can also point to NULL if it has no associated link, such comments we disregard for a moment). I want to order links such that links with newest comments are on top.

Currently, I do this:

SELECT DISTINCT linkId FROM Comments WHERE
  linkId IS NOT NULL AND
  dateAdded <= ?
ORDER BY id DESC
LIMIT 10

This is working very well (around 1ms with current 60k rows, I've tested it also with 800k rows, it still performs just as well)

However, now I'd like to classify the links into different categories and therefore I created a new column category in Links table and I want to order the links within their category. I tried running following query:

SELECT DISTINCT c.linkId FROM Comments c LEFT JOIN Links l ON c.linkId = l.id WHERE
  c.linkId IS NOT NULL AND
  c.dateAdded <= ? AND
  l.category = ?
ORDER BY c.id DESC
LIMIT 10

This makes the query execution time become ~1.5s with the table with 800k rows so it is obviously not scaling well.

The tables are defined as follows:

CREATE TABLE `Links` (
  `id` int(10) UNSIGNED NOT NULL,
  `category` int(10) UNSIGNED NOT NULL,
  `url` varchar(2083) CHARACTER SET utf8mb4 NOT NULL,
  `dateAdded` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `Links`
  ADD PRIMARY KEY (`id`),
  ADD KEY `category` (`category`);

CREATE TABLE `Comments` (
  `id` int(10) UNSIGNED NOT NULL,
  `linkId` int(10) UNSIGNED DEFAULT NULL,
  `content` varchar(2000) CHARACTER SET utf8mb4 NOT NULL,
  `dateAdded` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `Comments`
  ADD PRIMARY KEY (`id`),
  ADD KEY `idx2` (`linkId`,`dateAdded`);
ALTER TABLE `Comments`
  ADD CONSTRAINT `fk_link_id` FOREIGN KEY (`linkId`) REFERENCES `Links` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;

Is there any way how this query can be optimized?

I guess creating category column in Comments table would help but that could lead to inconsistencies (i.e. link pointing to one category but its links pointing to another category).

Best Answer

Because of filtering on two tables, it is not possible for a single index to handle all the WHERE, to handle the ORDER BY, much less optimize theLIMIT 10`.

The LEFT JOIN seems to be a JOIN.

This link on comments may help: INDEX(dateAdded, linkId, id) (in that order).

If you want to discuss further, please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ...