Mysql – Setting up indexes to support a flexible API search endpoint

database-designindexinnodbMySQLperformancequery-performance

I have a MySQL 5.7 table that holds activities that users can search through. It will end up holding less than 100,000 rows and inserts will be less than 50/day. This activities table has a structure of:

CREATE TABLE activities (
    activityID int AUTO_INCREMENT, 
    createdBy char, -- a unique AWS ID. eg: "us-east-1:128d0a74-c82f-4553-916d-90053e4a8b0f"
    active bool, 
    city varchar, 
    length int, 
    price  decimal, 
    latitude decimal, 
    longitude decimal,
    CONSTRAINT Activities_PK PRIMARY KEY (activityID)
);

The queries I will be making are constructed depending on the query string sent to the API endpoint. There are 3 'paths' the API can be searched by to return activities.

  • If createdBY is included the query string, Select ... where createdBy = ~some id~
  • If city is included, Select by city, and length if included, and mininmum price if included, and maximum price if included
  • If latitude and longitude are included, search between the 2 points

Indexes I have come up with so far:

  • createdBy, as a B-Tree. Should this be a HASH index because it's a unique identifier?

  • (city, length, price). Since this is a covering index, if the constructed Select query is only searching by city, would it still use the index even though it contains length and price?

    Also, since covering indexes start on the left, should I also include a (city, price, length) index?

  • (latitude, longitude)

Are these acceptable indexes? Should I include an index for the active column? All queries should only return active activities. My guess will be ~90% of rows would be active, while ~10% would be inactive. This answer suggests no.

Thanks so much! Any other suggestions are welcome!

Best Answer

InnoDB does not support HASH indexes; only B-TREE. You could certainly implement your own with another column called createdByHash. Use something like CRC32 as your hashing function to make it integer-based and save a few bytes per row. Index this column instead and make your queries SELECT .. FROM .. WHERE createdByHash = CRC32(<input>) AND createdBy = <input> (you have to have both parts because there could be collisions with CRC32)

Yes, MySQL uses multi-column indexes in left-to-right order. SELECT city, length, price FROM .. WHERE city = x will still use the index. If you have any range-based queries, the index should put those range'd columns to the right. An index on (A, B, C) and WHERE A=4 AND B>5 AND C=3 will only use the first two columns of that index.

If you are doing a simple greater-than/less-than on the lat/long, what you have will probably be OK. This one is tough since they are both ranged predicates. You may have to experiment on (lat, long) vs (long, lat) in your application.