I have a table containing servers logs, and another table containing rules to match certain and drop them.
The logs tables contains around 1 million rows, and there are about 20-30 rules.
The query runs very slowly, I wonder is there any way I can make it run faster. I tried adding indexes to logs.message, but it does not help, I also read that you cannot index a "LIKE" column.
I am a total newbie to database, so please forgive me if I am missing any important concepts. Thanks in advance.
CREATE TABLE `logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`criticality` varchar(255) NOT NULL,
`hostname` varchar(255) NOT NULL,
`source` varchar(255) NOT NULL,
`message` varchar(4096) NOT NULL,
`record_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`record_by` varchar(255) NOT NULL,
PRIMARY KEY (`log_id`),
KEY `idx_message` (`message`(255))
) ENGINE=InnoDB AUTO_INCREMENT=233523 DEFAULT CHARSET=utf8$$
CREATE TABLE `rules` (
`rule_id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(255) NOT NULL,
`type` enum('MATCH','DROP') NOT NULL DEFAULT 'DROP',
PRIMARY KEY (`rule_id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8$$
select hostname, criticality, source, message, record_date
from eventlog.logs l1
where not exists (
SELECT l.message, r.rule_id
FROM eventlog.logs l,
eventlog.rules r
where l.message like r.content
and l.log_id = l1.log_id
and r.type = 'DROP'
) and (criticality = 'High'
or criticality = 'Medium')
and record_date > sysdate() - Interval 2 Day
order by l1.message;
UPDATE 1
explain results, it took around 10 seconds to finish the query.
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
| 1 | PRIMARY | l1 | ALL | NULL | NULL | NULL | NULL | 2101642 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | r | ALL | NULL | NULL | NULL | NULL | 16 | Using where |
| 2 | DEPENDENT SUBQUERY | l | eq_ref | PRIMARY | PRIMARY | 8 | eventlog.l1.log_id | 1 | Using where |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
Best Answer
I agree with @ypercubeᵀᴹ, the comma join in the exists clause can be removed:
I also replaced the selected columns in the subquery with 1 since it really does not matter what you select, and replaced the
OR
predicate withIN
.