Mysql – Filtering a MySQL table on multiple columns, including one with spatial data


I have a MySQL table with linestrings:

CREATE TABLE edges (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  geometry linestring NOT NULL,
  type_key varchar(255) NOT NULL,
  created_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  deleted_at datetime DEFAULT NULL,
  PRIMARY KEY (id),
  KEY type_key (type_key, deleted_at),
  SPATIAL KEY geometry (geometry)
) ENGINE=InnoDB AUTO_INCREMENT=130966 DEFAULT CHARSET=utf8mb4;

I want to plot a subset of them on a map. To do so, I run the following query (each ? is a placeholder for a bound parameter):

SELECT * 
FROM edges 
WHERE deleted_at IS NULL 
  AND type_key = ? 
  AND MBRIntersects(geometry, LineString(Point(?, ?), Point(?, ?)))

The issue I'm facing is that MySQL does not allow multi-column indices that include a spatial column (here, the geometry column). The optimizer therefore has to choose: either use the type_key index or use the spatial geometry index.

Which one is more efficient depends on the chosen type_key and on the minimum bounding rectangle of the linestring constructed in the query. In quite a few cases, being able to use only one of them is not efficient at all: with about 100k rows in this table, MySQL can easily take more than 500 ms to determine that such a query returns 0 rows.
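To see which of the two indices MySQL picks for a given type_key and box, and what the other choice would look like, the same query can be run through EXPLAIN once per index hint. A minimal sketch: the type value and coordinates in the SET line are made-up placeholders, and the hints simply name the two existing indices, type_key and geometry. Comparing the key and rows columns of the two plans (or timing both queries without EXPLAIN) shows which index wins for that particular input.

```sql
-- Placeholder inputs (hypothetical values, not from the question).
SET @type = 'road', @x1 = 4.85, @y1 = 52.35, @x2 = 4.95, @y2 = 52.40;

-- Plan restricted to the B-tree index on (type_key, deleted_at).
EXPLAIN SELECT *
FROM edges FORCE INDEX (type_key)
WHERE deleted_at IS NULL
  AND type_key = @type
  AND MBRIntersects(geometry, LineString(Point(@x1, @y1), Point(@x2, @y2)));

-- Plan restricted to the SPATIAL index on geometry.
EXPLAIN SELECT *
FROM edges FORCE INDEX (geometry)
WHERE deleted_at IS NULL
  AND type_key = @type
  AND MBRIntersects(geometry, LineString(Point(@x1, @y1), Point(@x2, @y2)));
```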

I googled quite a bit on this topic, but could not find any good solutions. Interesting results were:

  • partitioning (see e.g. http://mysql.rjweb.org/doc.php/latlng), but the number of partitions you should create is quite limited (fewer than 100, from what I read)
  • not using a spatial column at all: the geometry can be represented by its bounding box, i.e. four coordinates (minimum and maximum x and y). I could add all four as separate columns and add them to the index. That would help with one of them, but since all four are inequality constraints, a B-tree index can only range-scan on the first one, so it would still be quite inefficient because of the other three.
  • migrating to other database software, e.g. Postgres. I read that Postgres is able to use multiple indices in the same query (https://devcenter.heroku.com/articles/postgresql-indexes), but it's not clear to me how efficient that is. I'm also unsure about the alternative: whether a spatial column and other columns can be combined in a single index. I spent some time setting up a test environment with the same data, but gave up after an hour or so because it was taking too much time.
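The bounding-box alternative from the second bullet could look roughly like this in MySQL. This is a sketch under assumptions: the column and index names (min_x, max_x, min_y, max_y, type_bbox) are hypothetical, the columns would have to be backfilled and kept in sync with geometry by the application (not shown), and the query assumes the search box is given with @qx1 <= @qx2 and @qy1 <= @qy2. It illustrates the limitation described in that bullet: the index can seek on type_key and deleted_at and range-scan on min_x, but the other three comparisons can only filter what that range returns.

```sql
-- Hypothetical bounding-box columns; they start at 0 and must be
-- backfilled and maintained by the application.
ALTER TABLE edges
  ADD COLUMN min_x DOUBLE NOT NULL DEFAULT 0,
  ADD COLUMN max_x DOUBLE NOT NULL DEFAULT 0,
  ADD COLUMN min_y DOUBLE NOT NULL DEFAULT 0,
  ADD COLUMN max_y DOUBLE NOT NULL DEFAULT 0,
  ADD KEY type_bbox (type_key, deleted_at, min_x, max_x, min_y, max_y);

-- Rectangle-overlap test between each row's box and the search box
-- [@qx1, @qx2] x [@qy1, @qy2] (assumed already ordered), the same
-- test MBRIntersects applies to the two bounding rectangles.
-- Only min_x gets a range scan; max_x, min_y and max_y are checked
-- against the index entries afterwards. Selecting only id (stored in
-- every InnoDB secondary index entry) keeps the query index-only.
SELECT id
FROM edges
WHERE type_key = @type
  AND deleted_at IS NULL
  AND min_x <= @qx2
  AND max_x >= @qx1
  AND min_y <= @qy2
  AND max_y >= @qy1;
```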

My question therefore is: how are others solving this issue?

Best Answer

You are saying that some type_keys are associated with thousands of rows; some with only a few?

An off-the-wall thought:

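if ( SELECT id FROM edges
         WHERE type_key = ? AND deleted_at IS NULL
         LIMIT 100, 1 ) is NOT NULL
then
   -- more than 100 rows for this type_key: let the spatial index narrow them
   SELECT * FROM edges IGNORE INDEX(type_key) WHERE ...
else
   -- at most 100 rows: reading them via the type_key index is cheap
   SELECT * FROM edges FORCE INDEX(type_key) WHERE ...
end if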

Caveats:

  • The IF logic requires either a stored procedure or moving the IFs into your application code.
  • I don't know if the idea will work.
  • The choice of 100 needs tuning.
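The answer's idea could be written out as a MySQL stored procedure along these lines. This is a sketch, not a tested solution: the procedure name edges_in_box and its parameter names are hypothetical, 100 is the untuned threshold from the answer, and the branches send a type_key with more than 100 live rows to the spatial index (by ignoring type_key) while a rarer type_key goes through the type_key index, since checking at most 100 rows via the B-tree is cheap. The DELIMITER lines are for the mysql command-line client.

```sql
DELIMITER //

-- Hypothetical procedure; the threshold of 100 still needs tuning.
CREATE PROCEDURE edges_in_box(
  IN p_type VARCHAR(255),
  IN p_x1 DOUBLE, IN p_y1 DOUBLE,
  IN p_x2 DOUBLE, IN p_y2 DOUBLE)
BEGIN
  -- Non-NULL only if a 101st live row exists for this type_key
  -- (id is NOT NULL, so NULL means "no such row").
  IF (SELECT id FROM edges
      WHERE type_key = p_type AND deleted_at IS NULL
      LIMIT 100, 1) IS NOT NULL THEN
    -- Common type_key: let the spatial index narrow the candidates.
    SELECT * FROM edges IGNORE INDEX (type_key)
    WHERE deleted_at IS NULL
      AND type_key = p_type
      AND MBRIntersects(geometry,
            LineString(Point(p_x1, p_y1), Point(p_x2, p_y2)));
  ELSE
    -- Rare type_key: at most 100 rows to check via the B-tree index.
    SELECT * FROM edges FORCE INDEX (type_key)
    WHERE deleted_at IS NULL
      AND type_key = p_type
      AND MBRIntersects(geometry,
            LineString(Point(p_x1, p_y1), Point(p_x2, p_y2)));
  END IF;
END //

DELIMITER ;
```

It would then be called with the same five values the original query binds, e.g. CALL edges_in_box('road', 4.85, 52.35, 4.95, 52.40); with placeholder values.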