Mysql – Does MySQL have a limitation of one index per query

indexMySQL

MySQL supports spatial index. However, all the spatial index can do is find points in a box. It can't do fancy stuff like find the nearest 1000 points, etc.

I think, and I don't know where I went wrong, that simply finding points in a box is easy – just do constraints:

x < xmax and x > xmin and y < ymax and y > ymin

Yes those are 4 constraints that use at least 2 columns.
The result is that things get too slow when there are 1.6 million rows.

But why? The x and y are sorted. The complexity should be log(n). Yet queries can take 60 seconds.

Using a spatial index shortens it to 1 second with some subquery trick.

What am I missing? Why does using a bunch of constraints in a query take time?

Even without a fancy algorithm it should be easy to quickly get the points in the box. Yet, I feel that the complexity must be O(n) at least.

Best Answer

There are multiple questions here.

To answer the title: MySQL can utilize more than one index per query. This is called index-merge.

However, it does not always happen that index-merge optimization is used.

"Normal" indexes are B/B+ Trees. Such trees provide with linear sorting. You can index x, and the index sorts x in ascending order. You can do the same for y. You can even index both (x, y) but you still get linear sorting -- not two dimensional one.

To get back to your other question, regarding the query

x < xmax and x > xmin and y < ymax and y > ymin

Let's assume then that we have an index on x and an index on y. The problem begins with the fact that these indexes do not share resources, nor information, nor at any way point to one another.

It's very easy to figure out all rows where x < xmax and x > xmin, using the index on x. But once you have these, you can't just take those rows to the y index and ask of it to "return rows from withing these rows I'm giving you and for which y < ymax and y > ymin. The reason is that the index onycan quickly lookup usingy` values. It does not know how to quickly lookup row #3984987.

The way index-merge works is by getting row ids from the x index, getting row ids from the y index (meanwhile the x rows are stored in a temporary buffer), and then intersect these. Note that intersection has nothing to do with indexes. It is a brute force operation (hopefully, achieved in linear time thanks to the fact that the two lists of rows are already sorted).

Now given such a query as yours, MySQL asks itself: what would be cheaper? Use just one index on x, then row by row filter those rows where y < ymax and y > ymin applies? Or perhaps use an index-merge, create a temporary buffer, work out a brute-force intersection?

The fact is that MySQL rarely chooses the index-merge optimization. I've been looking at many queries and their execution plans. Very rare. Perhaps there's someone there for whom this works on regular basis.

But looking at the X:Y coordinates problem, it's easy to see why this wouldn't work: there's just too many matching values for the x-range alone, and just too many values for the y-range alone for this to be a good plan.