Are the columns nullable? Is the query a LIKE? Is there an NLS issue?
I'd expect a predicate such as
upper("LAST_NAME"||','||"FIRST_NAME"||"MIDDLE_NAME"||"SUFFIX_NAME") = :bind
to use an index range scan.
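In Oracle this relies on a function-based index on exactly that expression. As a rough, runnable analogue, here is a sketch in SQLite rather than Oracle (the table `people`, the index name and the rows are hypothetical): an index on the same expression lets an equality predicate that repeats the expression verbatim be answered with an index search instead of a scan. One caveat for the sketch: in SQLite `NULL || 'x'` is NULL, whereas Oracle treats NULL as an empty string in concatenation, so the sample rows avoid NULLs.

```python
import sqlite3

# Hypothetical table mirroring the name columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, "
    "first_name TEXT, middle_name TEXT, suffix_name TEXT)"
)

# Expression index: SQLite's counterpart of an Oracle function-based index.
conn.execute(
    "CREATE INDEX idx_full_name ON people "
    "(upper(last_name || ',' || first_name || middle_name || suffix_name))"
)

# Empty string rather than NULL for the missing suffix (see caveat above).
conn.executemany(
    "INSERT INTO people (last_name, first_name, middle_name, suffix_name) "
    "VALUES (?, ?, ?, ?)",
    [("Smith", "John", "Q", "Jr"), ("Jones", "Mary", "A", "")],
)

def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# The WHERE clause repeats the indexed expression exactly, so the planner can use the index.
equality_sql = (
    "SELECT * FROM people WHERE "
    "upper(last_name || ',' || first_name || middle_name || suffix_name) = ?"
)
equality_plan = plan(equality_sql, ("SMITH,JOHNQJR",))
matches = conn.execute(equality_sql, ("SMITH,JOHNQJR",)).fetchall()

print(equality_plan)  # a SEARCH ... USING INDEX idx_full_name step
print(matches)        # [(1, 'Smith', 'John', 'Q', 'Jr')]
```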
By contrast, a predicate such as
upper("LAST_NAME"||','||"FIRST_NAME"||"MIDDLE_NAME"||"SUFFIX_NAME") LIKE :bind
may use an index fast full scan or a full table scan, depending on whether columns from the table are likely to be required. If the optimizer estimates that 1 in 5 rows will match and that for each of those rows it needs a column not in the index, then an index lookup followed by a table access per row would be slower than a straight table scan.
It could also be that the table is very small, so it isn't worth using the index.
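The trade-off in the last two paragraphs can be illustrated with a deliberately crude I/O model. This is a sketch under stated assumptions, not Oracle's cost formula: each matching row costs one single-block table read, a full scan reads the table in multiblock reads (the function name and the `multiblock_read_count=8` default are hypothetical), and index block reads are ignored.

```python
import math

def cheaper_access_path(total_rows, rows_per_block, selectivity, multiblock_read_count=8):
    """Crude comparison of index range scan + table lookups vs. a full table scan.

    Assumptions (illustrative only, not Oracle's cost model):
    - every matching row costs one single-block table read (no clustering benefit),
    - a full scan reads the table in multiblock reads of multiblock_read_count blocks,
    - reads of the index blocks themselves are ignored.
    """
    table_blocks = math.ceil(total_rows / rows_per_block)
    full_scan_reads = table_blocks / multiblock_read_count
    index_lookup_reads = total_rows * selectivity
    return "index" if index_lookup_reads < full_scan_reads else "full scan"

# 1 in 5 rows match: 200,000 single-block lookups vs 2,500 multiblock reads.
print(cheaper_access_path(1_000_000, 50, 0.2))     # full scan
# Very selective predicate: about 100 lookups vs 2,500 multiblock reads.
print(cheaper_access_path(1_000_000, 50, 0.0001))  # index
# Tiny table: it fits in one block, so the scan costs less than one multiblock read.
print(cheaper_access_path(40, 50, 0.05))           # full scan
```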
To answer your second question:
MySQL does not have a parallel query execution engine, so even if you partition the table, a single query still runs on a single thread. This will eventually limit how far you can scale.
However, you could partition the table by visitor_id. This would allow you to run several queries (one per partition) in parallel, each of the form:
SELECT COUNT(DISTINCT visitor_id)
FROM table WHERE location_id = #
AND region_id = #
AND action_id = # AND ts BETWEEN x AND y
AND visitor_id BETWEEN <partition_start> and <partition_end>
The output of these parallel queries (which you could store in a temp table as they run) is trivially combinable into the final result by simply adding the distinct counts together.
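A minimal sketch of that fan-out-and-add pattern, using Python threads and SQLite in place of MySQL. The table name `visits`, the two visitor_id ranges and the sample rows are hypothetical, and the location/region/action/ts filters are omitted for brevity. Each worker opens its own connection, much as you would run each partition query on its own MySQL connection, and the per-range distinct counts are simply summed.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# File-backed database so every worker thread can open its own connection.
db_path = os.path.join(tempfile.mkdtemp(), "visits.db")
conn = sqlite3.connect(db_path)
with conn:  # commits the inserts on exit
    conn.execute("CREATE TABLE visits (visitor_id INTEGER, ts INTEGER)")
    # Hypothetical data: visitors 0-149, each with 1 to 4 visits.
    conn.executemany(
        "INSERT INTO visits (visitor_id, ts) VALUES (?, ?)",
        [(v, t) for v in range(150) for t in range(v % 4 + 1)],
    )
conn.close()

# Hypothetical visitor_id ranges, one per partition.
PARTITIONS = [(0, 99), (100, 199)]

def count_partition(bounds):
    """COUNT(DISTINCT visitor_id) restricted to one visitor_id range, on its own connection."""
    low, high = bounds
    worker_conn = sqlite3.connect(db_path)
    try:
        (count,) = worker_conn.execute(
            "SELECT COUNT(DISTINCT visitor_id) FROM visits "
            "WHERE visitor_id BETWEEN ? AND ?",
            (low, high),
        ).fetchone()
        return count
    finally:
        worker_conn.close()

# Run the per-partition queries concurrently; map() keeps results in partition order.
with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    partial_counts = list(pool.map(count_partition, PARTITIONS))

total = sum(partial_counts)
print(partial_counts, total)  # [100, 50] 150
```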
This is very similar to sharding, but instead of doing it across machines, you are doing it on the same table. By picking a good hash function to generate visitor_id (for example, a modulo or bit reversal if the original id is generated with an AUTO_INCREMENT column) you can ensure that all partitions are approximately equal in size.
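A sketch of the bit-reversal idea in pure Python. The 32-bit width, the four equal-width range partitions and the 1,024 sequential ids are assumptions chosen for illustration. Sequential AUTO_INCREMENT values all land in the first range partition, while their bit-reversed values spread evenly. A plain modulo gives a similar spread; that is what MySQL's PARTITION BY HASH does.

```python
def reverse_bits(value, width=32):
    """Reverse the lowest `width` bits of value (bit 0 becomes bit width-1)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def range_partition(key, partitions=4, width=32):
    """Index of the equal-width range partition over [0, 2**width) that holds key."""
    return (key * partitions) >> width

raw_sizes = [0] * 4
reversed_sizes = [0] * 4
for i in range(1024):  # e.g. the first 1,024 AUTO_INCREMENT values
    raw_sizes[range_partition(i)] += 1
    reversed_sizes[range_partition(reverse_bits(i))] += 1

print(raw_sizes)       # [1024, 0, 0, 0]
print(reversed_sizes)  # [256, 256, 256, 256]
```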
The reason you want to partition by visitor_id and not one of the other columns is that it makes the DISTINCT additive across partitions. For example, consider a table with two partitions: one holds visitor_id 0-99 and the other holds 100-199. You can now write two queries that can run in parallel:
INSERT INTO TempResults (distinct_visitors)
SELECT COUNT(DISTINCT visitor_id)
FROM table WHERE location_id = #
AND region_id = #
AND action_id = # AND ts BETWEEN x AND y
AND visitor_id BETWEEN 0 and 99
And this one, in parallel:
INSERT INTO TempResults (distinct_visitors)
SELECT COUNT(DISTINCT visitor_id)
FROM table WHERE location_id = #
AND region_id = #
AND action_id = # AND ts BETWEEN x AND y
AND visitor_id BETWEEN 100 and 199
Because the visitor_id ranges of the partitions do not overlap, no visitor is counted twice, and the final result is:
SELECT SUM(distinct_visitors) FROM TempResults
You would of course need to pick the partition boundaries in such a way that partitions have approximately the same size.
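To see why the partition key matters, the pure-Python sketch below splits the same hypothetical (visitor_id, ts) rows two ways: by visitor_id range and by ts range. Summing the per-partition distinct counts is exact in the first case and over-counts in the second, because a visitor who appears in both ts ranges is counted once per partition.

```python
# Hypothetical (visitor_id, ts) rows: visitors 1 and 150 return at a later ts.
visits = [
    (1, 10), (1, 20),
    (42, 10),
    (150, 15), (150, 25),
    (199, 25),
]

def summed_distinct(rows, partition_of):
    """Sum of per-partition COUNT(DISTINCT visitor_id), partitions chosen by partition_of(row)."""
    per_partition = {}
    for row in rows:
        per_partition.setdefault(partition_of(row), set()).add(row[0])
    return sum(len(ids) for ids in per_partition.values())

true_count = len({visitor_id for visitor_id, _ in visits})
by_visitor = summed_distinct(visits, lambda r: 0 if r[0] <= 99 else 1)  # visitor_id 0-99 / 100-199
by_time = summed_distinct(visits, lambda r: 0 if r[1] < 20 else 1)      # ts before / from 20

print(true_count, by_visitor, by_time)  # 4 4 6
```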
I will let ypercube answer the indexing question, as that is the one that deserves the reward.
There are two options I can think of. First, if you are using MyISAM, or InnoDB on MySQL 5.6+, you could store the concatenation in a separate column and put a FULLTEXT index on that column.
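At its core a full-text index is an inverted index from words to rows. The toy below is a pure-Python illustration of that idea, not MySQL's implementation; the rows and function names are hypothetical. It shows that a lookup on the concatenated column finds rows by word or word prefix (roughly what a trailing `*` does in boolean mode) without scanning every row, and that it matches word prefixes, not arbitrary substrings: "smi" finds Smith and Smithers but not Goldsmith.

```python
import bisect
import re
from collections import defaultdict

def build_inverted_index(rows):
    """Map each lower-cased word of the concatenated name column to the ids containing it."""
    index = defaultdict(set)
    for row_id, full_name in rows:
        for word in re.findall(r"\w+", full_name.lower()):
            index[word].add(row_id)
    return index

def prefix_search(index, prefix):
    """Ids of rows containing a word that starts with prefix."""
    words = sorted(index)  # a real engine keeps the words ordered inside the index
    prefix = prefix.lower()
    matches = set()
    for word in words[bisect.bisect_left(words, prefix):]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches |= index[word]
    return matches

# Hypothetical rows: (id, concatenated "last, first middle suffix" column).
rows = [
    (1, "Smith, John Q Jr"),
    (2, "Smithers, Waylon"),
    (3, "Jones, Mary A"),
    (4, "Goldsmith, Anna"),
]
index = build_inverted_index(rows)
print(sorted(prefix_search(index, "smi")))   # [1, 2]
print(sorted(prefix_search(index, "mary")))  # [3]
```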
The other option is to index the first_name and last_name fields separately. Then change your query to:
Removing the wildcard from the beginning will allow the indexes to be used.
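The effect of dropping the leading wildcard can be checked with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL's EXPLAIN (table and index names are hypothetical). One SQLite-specific assumption: its LIKE is case-insensitive by default, so the columns are declared COLLATE NOCASE, which is what lets SQLite turn a prefix LIKE into an index range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE lets SQLite's default case-insensitive LIKE use these indexes.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, "
    "first_name TEXT COLLATE NOCASE, last_name TEXT COLLATE NOCASE)"
)
# The two columns indexed separately, as suggested above.
conn.execute("CREATE INDEX idx_people_first ON people (first_name)")
conn.execute("CREATE INDEX idx_people_last ON people (last_name)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Leading wildcard: no usable range on last_name, so the table is scanned.
leading_wildcard = plan("SELECT * FROM people WHERE last_name LIKE '%mith%'")
# Prefix only: rewritten into a range on idx_people_last.
prefix_only = plan("SELECT * FROM people WHERE last_name LIKE 'Smi%'")

print(leading_wildcard)
print(prefix_only)
```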