MySQL – How to perform a FULLTEXT search on a numeric column in MySQL to meet user requirements

mariadb, MySQL, performance, query-performance

Given the table below:

CREATE TABLE `event` (
  `eventId` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `eventTime` BIGINT(20) NOT NULL COMMENT 'ex: 1431201865000 (epoch in milliseconds)',
  `plateNumber` VARCHAR(10) NOT NULL COMMENT 'ex: 5849',
  `plateCodeId` SMALLINT(5) UNSIGNED NOT NULL COMMENT 'ex: 1 = A, 2 = D, 3 = Q, 4 = NY',
  PRIMARY KEY (`eventId`),
  KEY `eventTime` (`eventTime`),
  FULLTEXT INDEX `platenumberft` (`plateNumber`)
) ENGINE=INNODB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

The user wants to search plateNumber with a pattern of the form '%334%'. As far as I know, that is only supported by a fulltext index, but I wonder whether it is possible to avoid one, since it creates 6 tables. I have not found benchmarks that highlight the impact of a fulltext index on a large table, and I am worried about performance: the table above will hold over 1 billion records, with an insertion rate of 150 records per second, 24/7.

The MySQL docs recommend never using a string (VARCHAR) data type to store numeric data, and since plateNumber is always numeric, it should be created as:

`plateNumber` INT(10) UNSIGNED NOT NULL

Is it possible to query a numeric field with a pattern like '%334%' at a speed equal to that of the fulltext index? If so, how?

Use case: the user has received information that the plate number contains the digits '334' in the middle, so the first and last digits are unknown.

Best Answer

When searching for a numeric string in the middle, FULLTEXT is the wrong approach. LIKE is the only approach, and it requires a full scan of the plates. Live with it.
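For illustration, the middle-of-string search might be written as below. This is a minimal sketch against the `event` table from the question; the selected columns are just an example.

```sql
-- Substring match: because the pattern starts with '%', no index range
-- can narrow the search, so every plate has to be examined.
SELECT eventId, eventTime, plateNumber, plateCodeId
FROM event
WHERE plateNumber LIKE '%334%';
```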

But... How often do you need to search for the middle of a string? Not often. If you have the left part of a string, LIKE '234%' works quite efficiently; it is a "range". But then you need a regular INDEX, not FULLTEXT.
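A sketch of the left-anchored case follows. The index name `idx_plateNumber` is an assumption chosen for illustration, and dropping `platenumberft` is optional here; on a table this size either ALTER is a heavy operation.

```sql
-- Replace the FULLTEXT index with a regular (B-tree) index.
-- idx_plateNumber is a hypothetical name.
ALTER TABLE event
  DROP INDEX platenumberft,
  ADD INDEX idx_plateNumber (plateNumber);

-- Left-anchored pattern: the optimizer can treat this as a range
-- over the index ('234' <= plateNumber < '235').
EXPLAIN
SELECT eventId, eventTime, plateNumber
FROM event
WHERE plateNumber LIKE '234%';
```

If the index is usable, the EXPLAIN output would be expected to report a range access on `idx_plateNumber` rather than a full table scan.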

FULLTEXT is aimed at words, not substrings, not full strings. Don't use it for licence plates.

Don't use INT for a plate number. It is not a "number" on which you perform arithmetic; that is what the MySQL docs are talking about. The plate number is merely a string that often contains only digits, so use a VARCHAR with the minimal collation. In the 'west', plates are made up of digits and capital letters, so COLLATION ascii_bin is optimal. However, that means folding the plate numbers to upper case as they are stored, and folding any query strings the same way. At the same time, remove any punctuation (dash, heart, star, etc.) that is not actually important to the comparison.
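One way this could look, assuming every stored plate reduces to ASCII letters and digits once cleaned up (the sample values and the choice to strip only dashes and spaces are illustrative, not taken from the question):

```sql
-- Assumption: existing values are plain ASCII; otherwise the conversion fails or loses data.
ALTER TABLE event
  MODIFY plateNumber VARCHAR(10)
    CHARACTER SET ascii COLLATE ascii_bin
    NOT NULL COMMENT 'ex: 5849';

-- Normalize on the way in: strip dashes and spaces, fold to upper case.
-- 'ab-58 49' is stored as 'AB5849'.
INSERT INTO event (eventTime, plateNumber, plateCodeId)
VALUES (1431201865000,
        UPPER(REPLACE(REPLACE('ab-58 49', '-', ''), ' ', '')),
        1);

-- Normalize the search string the same way before comparing.
SELECT eventId, plateNumber
FROM event
WHERE plateNumber LIKE CONCAT(UPPER(REPLACE(REPLACE('ab-58', '-', ''), ' ', '')), '%');
```

In practice the folding is often done in application code before the value reaches the database; the nested REPLACE calls here only keep the sketch self-contained.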

I cannot address the collation specifics for Bangladesh, but I suspect that utf8_bin would suffice.