Thesql need help optimizing query

indexMySQLperformancetuning

I have a table containing servers logs, and another table containing rules to match certain and drop them.

The logs tables contains around 1 million rows, and there are about 20-30 rules.

The query runs very slowly, I wonder is there any way I can make it run faster. I tried adding indexes to logs.message, but it does not help, I also read that you cannot index a "LIKE" column.

I am a total newbie to database, so please forgive me if I am missing any important concepts. Thanks in advance.

CREATE TABLE `logs` (
  `log_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `criticality` varchar(255) NOT NULL,
  `hostname` varchar(255) NOT NULL,
  `source` varchar(255) NOT NULL,
  `message` varchar(4096) NOT NULL,
  `record_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `record_by` varchar(255) NOT NULL,
  PRIMARY KEY (`log_id`),
  KEY `idx_message` (`message`(255))
) ENGINE=InnoDB AUTO_INCREMENT=233523 DEFAULT CHARSET=utf8$$

CREATE TABLE `rules` (
  `rule_id` int(11) NOT NULL AUTO_INCREMENT,
  `content` varchar(255) NOT NULL,
  `type` enum('MATCH','DROP') NOT NULL DEFAULT 'DROP',
  PRIMARY KEY (`rule_id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8$$

select hostname, criticality, source, message, record_date
from eventlog.logs l1
where not exists ( 
    SELECT l.message, r.rule_id
    FROM eventlog.logs l,
         eventlog.rules r
    where l.message like r.content
      and l.log_id = l1.log_id
      and r.type = 'DROP'
) and (criticality = 'High'
        or criticality = 'Medium')
  and record_date > sysdate() - Interval 2 Day
order by l1.message;

UPDATE 1
explain results, it took around 10 seconds to finish the query.

+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
| id | select_type        | table | type   | possible_keys | key     | key_len | ref                |  rows   | Extra                       |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
|  1 | PRIMARY            | l1    | ALL    | NULL          | NULL    | NULL    | NULL               | 2101642 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | r     | ALL    | NULL          | NULL    | NULL    | NULL               |      16 | Using where                 |
|  2 | DEPENDENT SUBQUERY | l     | eq_ref | PRIMARY       | PRIMARY | 8       | eventlog.l1.log_id |       1 | Using where                 |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+

Best Answer

I agree with @ypercubeᵀᴹ, the comma join in the exists clause can be removed:

select hostname, criticality, source, message, record_date
from eventlog.logs l1
where not exists ( 
    SELECT 1
    FROM eventlog.rules r
    where l1.message like r.content
      and r.type = 'DROP'
) 
and criticality in ('High', 'Medium')
and record_date > sysdate() - Interval 2 Day
order by l1.message;

I also replaced the selected columns in the subquery with 1 since it really does not matter what you select, and replaced the OR predicate with IN.