I want to SELECT all rows from first_table
INNER JOIN second_table
for values where the column
in first_table
fall within range of values from two columns
in second_table
.
As example, I have created these two tables
First table:
CREATE TABLE location (loc_id int, gpstime int, lat_start float, lat_end float);
Second table:
CREATE TABLE segments( segment_id int, lat float,foot float,bike float,bus float,car float,metro float);
Then insert these values:
INSERT INTO location
SELECT 1, 12345, 41.10, 41.17 UNION ALL
SELECT 2, 99999, 42.00, 42.12 UNION ALL
SELECT 3,12346, 41.35, 41.30
INSERT INTO segments
SELECT 1, 41.15, 0, 0, 0.95, 0.025, 0 UNION ALL
SELECT 2, 42.19, 0.95, 0, 0, 0, 0.025 UNION ALL
SELECT 3, 41.33, 0,0, 0, 0, 1
The result set I want should return all rows for which value of column lat
in segments
table is a value that exists between lat_start
and lat_end
columns of location
table.
So I write the following query:
SELECT gpstime AS timestamp, lat_start, lat_end,
CASE
WHEN GREATEST(car, bike, bus, metro, foot) = car THEN 'car'
WHEN GREATEST(car, bike, bus, metro, foot) = bike THEN 'bike'
WHEN GREATEST(car, bike, bus, metro, foot) = bus THEN 'bus'
WHEN GREATEST(car, bike, bus, metro, foot) = metro THEN 'metro'
WHEN GREATEST(car, bike, bus, metro, foot) = foot THEN 'walking'
ELSE 'na' END AS mode
FROM location
INNER JOIN segments
ON lat BETWEEN lat_start AND lat_end;
Problem: The BETWEEN
key word returns TRUE only when check-value
fall in range (min, max)
and does not evaluate to (max, min)
. Clearly, the value of lat
in the "last" row of segments
(41.33) is a value that fall within the range of (lat_start, lat_end)
of location
(41.30-41.35). However, the BETWEEN clause excludes this row, as per returned the table below:
+-----------+-----------+---------+------+
| timestamp | lat_start | lat_end | mode |
+-----------+-----------+---------+------+
| 12345 | 41.1 | 41.17 | bus |
+-----------+-----------+---------+------+
How do I ensure all rows satisfying the criteria are returned?
Additional details
I posted this question yesterday, and following David answer, I was glad everything seems to be fixed. However, working on my database looks like there is still an error in the query as yet some rows are still missing. I reproduce minimal scenario of my work once again:
Table 1:
CREATE TABLE location (id int, lat float, lon float, gpstime int);
Table 2:
CREATE TABLE trips( trip_id int, lat_start float, lat_end float,
lon_start float, lon_end float,
travel_mode text);
Add rows:
INSERT INTO location
SELECT 1, 41.2, 8.3, 11111 UNION ALL --both lat and lon in the interval
SELECT 2, 43.3, 8.5, 22222 UNION ALL --only lat meets criteria
SELECT 3, 44.0, 8.5, 33333 --only lon meets criteria
INSERT INTO trips
SELECT 1, 41.1, 41.9, 8.0, 8.8, 'foot' UNION ALL
SELECT 2, 43.1, 43.5, 9.5, 8.9, 'bus' UNION ALL
SELECT 3, 42.1, 42.8, 8.7, 8.0, 'metro'
And then the query, using BETWEEN SYMMETRIC
:
SELECT trip_id, gpstime AS timestamp, lat, lon, travel_mode
FROM location
INNER JOIN trips
ON (lat BETWEEN SYMMETRIC lat_start AND lat_end)
AND (lon BETWEEN SYMMETRIC lon_start AND lon_end)
Results:
+---------+-----------+------+-----+-------------+
| trip_id | timestamp | lat | lon | travel_mode |
+---------+-----------+------+-----+-------------+
| 1 | 11111 | 41.2 | 8.3 | foot |
+---------+-----------+------+-----+-------------+
Yet, row 3
of location
table is not included in the result set. What I am missing here?
Note: Using BETWEEN SYMMETRIC
doesn't solve the problem yet. It appears value-checking is only applied to lat
column and NONE to lon
.
Db<>fiddle
I share with you, this db_fiddle I created to test the query. Cleary, all rows of location
table should be returned as rows
2 and 3 meet one criteria either.
Best Answer
When writing the answer below I wasn't aware of postgres'
BETWEEN SYMETRIC
, since pointed out by a_horse_with_no_name. That would be the simplest solution and I assume would be as easy for the query planner & runner to use indexes to serve as justBETWEEN
.In my defense, I'm mainly a SQL Server person... The longer answer is hopefully still useful for others in a similar position but not using prosgres, or where cross-DB compatibility is a requirement.
The ends of a
BETWEEN
comparison are always assumed to be low and higher, in that order. In most casesX BETWEEN Y AND Z
it is simple syntactic sugar forX>=Y AND X<=Z
and41.33>=41.35 AND 41.33<=41.30
obviously isn't true.You could use
CASE
to swap the values:or the less verbose version suggested by ypercube™ in the comments:
but both of those would be inefficient because your join predicate is no longer saragble so will not result in an indexed search. If the number of rows being considered in each case is small because of other filters then this will not be an issue.
Another option is to
UNION
two comparisons:(here the extra
WHERE
in each half stops you getting two rows for instances whenlat_start = lat_end
, so we can use the more efficientUNION ALL
instead of plainUNION
without worrying about duplicates in those cases)This means you get indexed lookups for both queries instead of a scan of
segments
.(Technically it would be possible for a query planner to decompose ypercube™'s example into the two lookups like the
UNION
example, but I'm not aware of a query planner that does this - there is a limit to how much thinking a QP can do before it starts taking too much time planning in the general case, compared to the time spent running the result.)You could change how you are storing your data and insist that the line/area is always defined bottom-left to top-right (so you have lat_left and lat_right instead of lat_start and lat_end) then your original query will word. If direction of travel is important then you can add that as a separate boolean value: 0 for left-to-right, 1 for right-to-left, or similar.
You might want to look into extensions that exist for postgres specifically designed for geometric (including geological, lat+long based) data types and related queries: https://postgis.net
They may do a chunk of your work for you, after the initial learning curve, especially if you start needing to consider more complex shapes.