Mysql – Slow performance when joining a small table and filtering out on a non-key column in MariaDB (MySQL)

join;mariadbMySQLquery-performance

I am fairly new to MariaDB and I am struggling with one issue that I cannot get to the bottom of it. This is the query:

SELECT SQL_NO_CACHE STRAIGHT_JOIN
    `c`.`Name` AS `CategoryName`, 
    `c`.`UrlSlug` AS `CategorySlug`, 
    `n`.`Description`, 
    IF(n.OriginalImageUrl IS NOT NULL, n.OriginalImageUrl, s.LogoUrl) AS `ImageUrl`, 
    `n`.`Link`, 
    `n`.`PublishedOn`, 
    `s`.`Name` AS `SourceName`, 
    `s`.`Url` AS `SourceWebsite`, 
   s.UrlSlug AS SourceUrlSlug,
    `n`.`Title`
FROM `NewsItems` AS `n`
INNER JOIN `NewsSources` AS `s` ON `n`.`NewsSourceId` = `s`.`Id`
LEFT JOIN `Categories` AS `c` ON `n`.`CategoryId` = `c`.`CategoryId`
WHERE s.UrlSlug = 'slug'
#WHERE s.Id = 52
ORDER BY `n`.`PublishedOn` DESC
LIMIT 50

NewsSources is a table with about 40 rows and NewsItems has ~1 million. Each news item belongs to one source and one source can have many items. I'm trying to get all items for a source identified by URL slug of the source.

  1. In case when I use STRAIGHT_JOIN and when I query for a source that has lots of news items, the query returns immediately.
    However, if I query for a source that has low number of items (~100) OR if I query for a URL slug that doesn't belong to any source (result set is 0 rows), the query runs for 12 seconds.

  2. In case when I remove STRAIGHT_JOIN, I see the opposite performance from the first case – it runs really slow when I query for a news source with many items and returns immediately for sources with low number of items or result set is 0, because the URL slug doesn't belong to any news source.

  3. In case when I query by news source ID (the commented out WHERE s.Id = 52), the result comes immediately, regardless of whether there are lots of items for that source or 0 items for that source.

I want to point out again that the NewsSources table contains only about 40 rows.

Here is the analyzer results for the query above: Explain Analyzer

What can I do to make this query to run fast always?

Here are tables and indexes definitions:

-- --------------------------------------------------------
-- Server version:               10.4.13-MariaDB-1:10.4.13+maria~bionic - mariadb.org binary distribution
-- Server OS:                    debian-linux-gnu
-- --------------------------------------------------------

-- Dumping structure for table Categories
CREATE TABLE IF NOT EXISTS `Categories` (
  `CategoryId` int(11) NOT NULL AUTO_INCREMENT,
  `Name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Description` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `UrlSlug` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL,
  `CreatedOn` datetime(6) NOT NULL,
  `ModifiedOn` datetime(6) NOT NULL,
  PRIMARY KEY (`CategoryId`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


-- Dumping structure for table NewsItems
CREATE TABLE IF NOT EXISTS `NewsItems` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `NewsSourceId` int(11) NOT NULL,
  `Title` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Link` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Description` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `PublishedOn` datetime(6) NOT NULL,
  `GlobalId` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `CategoryId` int(11) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  KEY `IX_NewsItems_CategoryId` (`CategoryId`),
  KEY `IX_NewsItems_NewsSourceId_GlobalId` (`NewsSourceId`,`GlobalId`),
  KEY `IX_NewsItems_PublishedOn` (`PublishedOn`),
  KEY `IX_NewsItems_NewsSourceId` (`NewsSourceId`),
  FULLTEXT KEY `Title` (`Title`,`Description`),
  CONSTRAINT `FK_NewsItems_Categories_CategoryId` FOREIGN KEY (`CategoryId`) REFERENCES `Categories` (`CategoryId`),
  CONSTRAINT `FK_NewsItems_NewsSources_NewsSourceId` FOREIGN KEY (`NewsSourceId`) REFERENCES `NewsSources` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=649802 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


-- Dumping structure for table NewsSources
CREATE TABLE IF NOT EXISTS `NewsSources` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `Name` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Url` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
  `UrlSlug` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LogoUrl` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=55 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Best Answer

According to POINT - 3 :

In case when I query by news source ID (the commented out WHERE s.Id = 52), the result comes immediately, regardless of whether there are lots of items for that source or 0 items for that source.

This is possible because on using WHERE s.Id = 52 it using index from NewSources & NewITems table do check explain plan might be different then given.

Try Creating Below Index :

create index IDX_UrlSlug on NewsSources(UrlSlug);  

And optimize all three tables :


OPTIMIZE TABLE NewsSources;
OPTIMIZE TABLE NewsItems;
OPTIMIZE TABLE Categories;