Mysql full text search with other indexed data

full-text-searchMySQLperformance

After getting some mysql optimization advice in this question. Was able to get the performance to the acceptable levels (when performing lookups using EventType and event_log_domain_id combo). To recap my database table is around 11mil records, and these are the columns(relevant ones) :

id| EventTime | EventType | EventName ... | event_log_domain_id

Now however on top of filtering(exact match) on EventType and event_log_domain_id I need to add EventName into the mix and perform wildcard search.

After looking at mysql docs and options available the fulltext search option seemed very promising, however running this one :

SELECT COUNT(*) FROM `access_logs` 
WHERE `access_logs`.`event_log_domain_id` IN (8, 59, 920, 1054, 2227)
  AND (MATCH( EventName ) AGAINST( 'business' ))
  AND `access_logs`.`EventType` IN (1, 5)

And this one :

SELECT  `access_logs`.* FROM `access_logs` 
WHERE `access_logs`.`event_log_domain_id` IN (8, 59, 920, 1054, 2227) 
  AND (MATCH( EventName ) AGAINST( 'business' ))
  AND `access_logs`.`EventType` IN (1, 5) 
ORDER BY id desc LIMIT 50

It seemed like Full-Text index is used no matter what, and I have confirmed that first by running explain which looks something like this:

id| select_type | table       | type     | possible_keys                                                                                                 | key            | key_len | ref  | rows | extra
1 | SIMPLE      | access_logs | fulltext | IndexEventType, event_log_domain_id, EventTypeAndeventlogdomainid,eventlogdomainidAndEventType,IndexEventName | IndexEventName | 0       | null | 1    | Using where; Using filesort

It takes about 95-ish seconds to execute either of the queries above. If I take out the full text search using event type the queries run within few hundred ms.

When I run the fulltext search on it's own like so :

SELECT count(*) FROM `access_logs` 
WHERE (MATCH( EventName ) AGAINST( 'business' ))

Executes roughly in 9seconds. Which is pretty high in itself but 'acceptable' given the number of rows.

Have tried tinkering with inner query alias like so :

SELECT
    access_logs.*
FROM
    inbound_access_logs access_logs
JOIN 
 (
   SELECT  `access_logs`.* FROM `access_logs` 
    WHERE `access_logs`.`event_log_domain_id` IN (8, 59, 920, 1054, 2227)
    AND`access_logs`.`EventType` IN (1, 5)
 ) AS filtered_query
ON access_logs.id = filtered_query.id
where access_logs.EventName like '%business%'

Which executes in few seconds, but if I use the same approach with inner query alias and use it with full text search.

SELECT
    access_logs.*
FROM
    inbound_access_logs access_logs
JOIN 
 (
   SELECT  `access_logs`.* FROM `access_logs` 
    WHERE `access_logs`.`event_log_domain_id` IN (8, 59, 920, 1054, 2227)
    AND`access_logs`.`EventType` IN (1, 5)
 ) AS filtered_query
ON access_logs.id = filtered_query.id
where MATCH( access_logs.EventName ) AGAINST( 'business' )

Again 8-9 seconds. Thinking that maybe I leave the like query in there, but I really wanted to optimize this that I don't have to touch it for a while.

Another option that I was thinking was to create a new table like so:

id | access_log_id | EventName

Would write to this table on data insert. On this table I would be having fulltext index on the EventName column, and I would join this table when doing lookups, was hoping this would make the query optimizer think 'better' given that it's two tables, hopefully using one index per table in the same search. Not sure if this is a good idea or not.

My question is: How do I go about optimizing query to filter on EventName with other columns I mentioned above? None of my approaches seems to be great

Update:
I have been experimenting with the inner query alias and like, this one :

SELECT
    access_logs.*
FROM
    inbound_access_logs access_logs
JOIN 
 (
   SELECT  `access_logs`.* FROM `access_logs` 
    WHERE `access_logs`.`event_log_domain_id` IN (8, 59, 920, 1054, 2227)
    AND`access_logs`.`EventType` IN (1, 5)
 ) AS filtered_query
ON access_logs.id = filtered_query.id
where access_logs.EventName like '%business%'

And time varies depending on the EventName and other conditions. It ranges from 1 to 60 seconds. Which is frustrating, getting some closer runtime would be better.

I am running on the AWS RDS using m3.medium machine, not sure if that makes a difference. Would be good to know what else I can try or how to build in more certainty. Because once I run the query it remains in the database cache and performs fast the second time.

UPDATE 1

Showing table create statement:

'CREATE TABLE `access_logs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `EventTime` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `EventSize` int(10) unsigned NOT NULL DEFAULT ''0'',
  `EventScore` double DEFAULT NULL,
  `EventCustomer` varchar(255) DEFAULT NULL,
  `event_log_domain_id` smallint(6) DEFAULT NULL,
  `EventIdentifier` varchar(255) DEFAULT NULL,
  `EventType` tinyint(3) unsigned NOT NULL DEFAULT ''0'',
  `EventName` varchar(255) DEFAULT NULL,
  `utc` tinyint(1) DEFAULT ''0'',
  `job_queue_id` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `EventTime` (`EventTime`),
  KEY `IndexIdentifier` (`EventIdentifier`),
  KEY `IndexType` (`EventType`),
  KEY `IndexUserType` (`EventIdentifier`,`EventType`),
  KEY `event_log_domain_id` (`event_log_domain_id`),
  KEY `job_queue_id` (`job_queue_id`),
  KEY `EventTypeAndeventogdomainid` (`EventType`,`event_log_domain_id`),
  KEY `eventogdomainidAndEventType` (`event_log_domain_id`,`EventType`),
  FULLTEXT KEY `IndexEventName` (`EventName`)
) ENGINE=InnoDB AUTO_INCREMENT=123163064 DEFAULT CHARSET=latin1'

What we actually use this page for.

We have built a page that filters on these events, there are 5 filters:

  1. EventName (wildcard search %search_string%, not always provided)
  2. EventCustomer (forward match search_string%, not always provided)
  3. EventIdentifier (forward match search_string%, not always provided)
  4. EventType (exact match, one or more ids but not always provided)
  5. event_log_domain_id (exact match, one or more ids always provided)

It is possible for provide more than one filter, so you could provide name and customer for all log domains. or just event type for particular log domain etc.

We have a pagination on the page, so we do need to perform a count to get the row count.

Then we fetch 50/100 records at the time so we do limit too. We also provide default order, or give ability to order by few columns (EventCustomer, EventIdentifier, EventType and EventTime).

Best Answer

As I understand it, the FULLTEXT needs to be performed first. That is, your JOIN attempts make things worse.

access_logs.* probably has some TEXT columns, correct? (Please provide SHOW CREATE TABLE so I would have to guess so much.) That can be a lot to lug around. It may be better to fetch only id until you have done all the filtering, then do an extra JOIN to get *. id is lightweight; * may be bulky. And wasteful if you end up tossing most of the rows.

Give this a try:

SELECT  g.*
    FROM ( SELECT  id
            FROM  access_logs
            WHERE  MATCH(EventName) AGAINST('business' IN BOOLEAN MODE) 
         ) AS f
    JOIN  access_logs AS g USING(id)
    WHERE  g.domain_id` IN (8, 59, 920, 1054, 2227)
      AND  g.EventType IN (1, 5) 

The LIMIT version needs some more care. First filter down to the 50 ids, then fetch the rest of the columns.

SELECT i.*
    FROM ( SELECT  g.id
            FROM ( SELECT  id
                    FROM  access_logs
                    WHERE  MATCH(EventName) AGAINST('business' IN BOOLEAN MODE) 
                 ) AS f
            JOIN  access_logs AS g USING(id)
            WHERE  g.domain_id` IN (8, 59, 920, 1054, 2227)
              AND  g.EventType IN (1, 5) 
            ORDER BY id DESC
            LIMIT 50    -- Only here, not at the outer layer
             ) AS h
    JOIN access_logs AS i  USING(id)
    ORDER BY id DESC   -- Yes, this is needed a second time