MySQL – index concatenated columns

index MySQL

I have a table for author names with two fields, first_name & last_name, both varchar(55), both nullable (for crazy business logic reasons) although in reality last_name is unlikely to ever be null.

My where clause for the search screen contains:

WHERE CONCAT(a.first_name, " " , a.last_name) LIKE "%twain%"

so that "Twain" or "Mark Twain" can be searched on. The table has about 15,000 rows & is expected to gradually grow, but won't ever be more than double that, and not for years.

I understand that there are many other parts of my query that will affect this, but given just that information, how might I best index this?

If it would make a big difference in performance, making last_name not nullable is an option, but first_name has to stay nullable.

TIA!

Best Answer

Two options come to mind. First, if the table uses MyISAM, or InnoDB on MySQL 5.6 or later, you could store the concatenation in a separate column and put a FULLTEXT index on that column.
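A rough sketch of the FULLTEXT route, assuming the table is called author (the question only shows the alias a) and adding a hypothetical full_name column. CONCAT_WS is used instead of CONCAT because CONCAT returns NULL as soon as any argument is NULL, which matters with a nullable first_name:

```sql
-- Hypothetical names: table "author", column "full_name", index "ft_author_full_name".
-- 111 = 55 (first_name) + 1 (space) + 55 (last_name).
ALTER TABLE author
    ADD COLUMN full_name VARCHAR(111) NULL;

-- CONCAT_WS skips NULL arguments, so a missing first_name leaves just the last name.
UPDATE author
SET full_name = CONCAT_WS(' ', first_name, last_name);

ALTER TABLE author
    ADD FULLTEXT INDEX ft_author_full_name (full_name);

-- Whole-word search; finds both "Twain" and "Mark Twain".
SELECT a.*
FROM author AS a
WHERE MATCH(a.full_name) AGAINST('twain' IN NATURAL LANGUAGE MODE);

-- Word-prefix search, if partial input such as "twa" should also match.
SELECT a.*
FROM author AS a
WHERE MATCH(a.full_name) AGAINST('twa*' IN BOOLEAN MODE);
```

Keep in mind that FULLTEXT matches whole words or word prefixes, not arbitrary substrings the way LIKE "%twain%" does, and that full_name has to be kept in sync whenever either name changes, for example from the application or a trigger.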

The other option is to index the first_name and last_name fields separately. Then change your query to:

WHERE a.first_name LIKE "twain%" OR a.last_name LIKE "twain%"

Removing the leading wildcard allows the indexes to be used, at the cost of matching only names that start with the search term.
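For reference, a sketch of the indexes this option assumes. The table name author and the index names are made up for illustration. EXPLAIN is included because the optimizer decides for itself whether to combine the two indexes or scan the table, and on roughly 15,000 rows either is plausible:

```sql
-- Hypothetical table and index names; only the column names come from the question.
CREATE INDEX idx_author_first_name ON author (first_name);
CREATE INDEX idx_author_last_name  ON author (last_name);

-- Check which access path is actually chosen for the rewritten predicate.
EXPLAIN
SELECT a.*
FROM author AS a
WHERE a.first_name LIKE 'twain%'
   OR a.last_name  LIKE 'twain%';
```

If the plan still shows a full table scan, that is not necessarily a problem at this table size.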

Related Solutions

FBI on function with concatenated string

Are the columns nullable? Is the query a LIKE? Is there an NLS issue?

I'd expect a

upper("LAST_NAME"||','||"FIRST_NAME"||"MIDDLE_NAME"||"SUFFIX_NAME") = :bind

to use an index range scan

upper("LAST_NAME"||','||"FIRST_NAME"||"MIDDLE_NAME"||"SUFFIX_NAME") LIKE :bind

may use an index fast full scan or a table scan, depending on whether columns from the table are likely to be required. If the optimizer thinks 1 in 5 rows will match and that for each of those it needs a column not in the index, then the index plus table lookup would be slower than a straight table scan.

It could be the table is very small and it isn't worth using the index.
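A minimal Oracle sketch of the kind of function-based index being discussed, assuming a hypothetical table named people that holds the four name columns from the predicate above. The indexed expression has to match the expression in the WHERE clause for the optimizer to consider it:

```sql
-- Hypothetical table and index names; column names taken from the predicate above.
CREATE INDEX people_name_fbi
    ON people (UPPER(last_name || ',' || first_name || middle_name || suffix_name));

-- Equality on the same expression is the case where a range scan is expected.
SELECT *
FROM people
WHERE UPPER(last_name || ',' || first_name || middle_name || suffix_name) = :bind;
```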

MySQL – How to set up complex multi-column index for massive table

To answer your second question:

MySQL does not have a parallel query execution engine, so even if you partition the query, you are still single threaded. This will eventually kill your scale.

However, you could partition the table by visitor_id. This would allow you to run several queries (one per partition) in parallel, all of the form:

SELECT COUNT(DISTINCT visitor_id) 
FROM table WHERE location_id = # 
AND region_id = # 
AND action_id = # AND ts BETWEEN x AND y
AND visitor_id BETWEEN <partition_start> and <partition_end>

The output of these parallel queries (which you could store in a temp table as they run) is trivially combinable into the final result by simply adding the distinct counts together.

This is very similar to sharding, but instead of doing it across machines, you are doing it within the same table. By picking a good hash function to generate visitor_id (for example, a modulo or bit reversal if the original id comes from an AUTO_INCREMENT column) you can ensure that all partitions are approximately equal in size.

The reason you want to partition by visitor_id and not one of the other columns is that it makes the DISTINCT additive across partitions. For example, consider a table with two partitions. One holds visitor_id 0-99 and the other holds 100-199. You can now express two queries that can run in parallel:

INSERT INTO TempResults (visitor_id)
SELECT COUNT(DISTINCT visitor_id) 
    FROM table WHERE location_id = # 
    AND region_id = # 
    AND action_id = # AND ts BETWEEN x AND y
    AND visitor_id BETWEEN 0 and 99

And this one in parallel:

INSERT INTO TempResults (visitor_id)
SELECT COUNT(DISTINCT visitor_id) 
    FROM table WHERE location_id = # 
    AND region_id = # 
    AND action_id = # AND ts BETWEEN x AND y
    AND visitor_id BETWEEN 100 and 199

Because the visitor_id ranges do not overlap between partitions, the final result is the sum of the per-partition counts (stored here in the visitor_id column of TempResults):

SELECT SUM(visitor_id) FROM TempResults

You would of course need to pick the partition boundaries in such a way that partitions have approximately the same size.
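To make the partition idea concrete, here is a sketch using MySQL's RANGE partitioning and explicit partition selection (available from MySQL 5.6). The table name visit_log, the column types, the filter values and the distinct_visitors column are all assumptions for illustration; the queries above only name the columns:

```sql
-- Hypothetical table; only the column names come from the queries above.
CREATE TABLE visit_log (
    visitor_id  INT UNSIGNED NOT NULL,
    location_id INT UNSIGNED NOT NULL,
    region_id   INT UNSIGNED NOT NULL,
    action_id   INT UNSIGNED NOT NULL,
    ts          DATETIME     NOT NULL
)
PARTITION BY RANGE (visitor_id) (
    PARTITION p0 VALUES LESS THAN (100),   -- visitor_id 0-99
    PARTITION p1 VALUES LESS THAN (200)    -- visitor_id 100-199
);

-- One row per partition. A regular table is used because a TEMPORARY table
-- is visible only to the connection that created it.
CREATE TABLE TempResults (distinct_visitors BIGINT UNSIGNED NOT NULL);

-- Run each INSERT from its own connection so the two execute concurrently.
INSERT INTO TempResults (distinct_visitors)
SELECT COUNT(DISTINCT visitor_id)
FROM visit_log PARTITION (p0)
WHERE location_id = 1 AND region_id = 2 AND action_id = 3   -- made-up filter values
  AND ts BETWEEN '2013-01-01 00:00:00' AND '2013-01-31 23:59:59';

INSERT INTO TempResults (distinct_visitors)
SELECT COUNT(DISTINCT visitor_id)
FROM visit_log PARTITION (p1)
WHERE location_id = 1 AND region_id = 2 AND action_id = 3
  AND ts BETWEEN '2013-01-01 00:00:00' AND '2013-01-31 23:59:59';

-- No visitor_id lives in more than one partition, so the counts simply add up.
SELECT SUM(distinct_visitors) AS total_distinct_visitors
FROM TempResults;
```

Naming the result column distinct_visitors rather than visitor_id is only a readability choice; the arithmetic is the same as in the queries above.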

I will let ypercube file the answer to the indexing question as this is the one that deserves the reward.
