Mysql – Joining two tables by the nearest match

join;MySQLperformancequery-performancesubquery

General problem: I want to do a left join on two huge tables, but there is no matching key – I want to join the left table with the "nearest" row in the right table. In this example I want to join using a timestamp, but I think one would have a similar (though more complex) problem with geographical coordinates.

To simplify a lot, I have one transaction table with a primary key id, timestamp, currency and amount – and one currency rate table with id, timestamp, currency and rate. Those tables are independent from each other, and they contain a lot of rows; assume that for some periods there are many transactions between each update of the currency rate, while in other periods there are many updates of the currency rates for each transaction. I want to join them so that there is one row for each transaction, with the closest matching currency rate. I have indexes on the timestamps.

So, here is a brief example of what I have and what I need:

transaction:
+---------------------+--------+
| timestamp           | amount |
+---------------------+--------+
| 2017-07-31 00:07:05 | 173.17 |
| 2017-08-02 15:29:11 | 136.57 |
| 2017-08-05 03:45:24 |  81.27 |
| 2017-08-05 03:46:47 |   48.1 |
| 2017-08-05 03:47:38 |  35.21 |
+---------------------+--------+

rate:
+---------------------+-------------------+
| timestamp           | rate              |
+---------------------+-------------------+
| 2017-07-31 00:04:49 |            9.2923 |
| 2017-07-31 01:37:59 | 9.313423370522607 |
| 2017-08-01 08:07:59 |             9.325 |
| 2017-08-01 16:52:23 |            9.3542 |
| 2017-08-01 21:07:09 | 9.357076262185192 |
| 2017-08-02 15:07:33 |  9.34936993421895 |
| 2017-08-02 17:48:45 | 9.357217848393876 |
| 2017-08-04 04:33:51 |  9.38690807898736 |
| 2017-08-04 08:13:01 | 9.383765889641701 |
| 2017-08-06 03:45:03 | 9.118193727124817 |
| 2017-08-06 04:15:01 | 9.353042966450854 |
| 2017-08-06 05:23:29 | 9.353042966450854 |
+---------------------+-------------------+

Want something like this:
+---------------------+--------+--------+
| timestamp           | amount | rate   |
+---------------------+--------+--------+
| 2017-07-31 00:07:05 | 173.17 | 9.2923 |
| 2017-08-02 15:29:11 | 136.57 | 9.3493 |
| 2017-08-05 03:45:24 |  81.27 | 9.3869 |
| 2017-08-05 03:46:47 |   48.1 | 9.3869 |
| 2017-08-05 03:47:38 |  35.21 | 9.3869 |
+---------------------+--------+--------+

(The production data is much more dense than this)

Post-edit: I did try to use a subquery, it failed spectacularly but probably because of a mistake from my side. Subquery seems to be the best way to solve this, so I've moved it from the question side into an answer.

Best Answer

I want to join the left table with the "nearest" row in the right table.

This is a KNN problem. For it you'll likely want a real database. I highly suggest you migrate to PostgreSQL and use PostGIS, where this can be solved using an index. Your answer has 0 to do with your problem. Your question is,

by the nearest match

Your answer is only looking at one way directionality, and it's only doing it for one row not a set: it's a correlated sub-query. You simply need to change course, or bounty a developer to include KNN in MySQL. MySQL is using Boost for GIS, so maybe someday someone will jump nearest(Point const &, unsigned) in the code base. I checked for ya, it's not there and there is no professional way to do what you want at any scale.

MySQL's GIS offering is the worst out of all the majors players. No one uses it. To give you an idea of scale,

  • MySQL has 225 questions on GIS.SE.
  • PostGIS has 5,771

It's a marketing point to say it supports a spatial index. See also

BTW, PostGIS also supports KNN on timestampg using btree_gist