Postgresql – PostGis getting rid of the lat long from table

gist-indexnearest neighborpostgispostgresql

This question is related to this question :
Order by distance

Now my tables have changed a bit. Table cafes now doesn't have latitude, longitude columns instead there are table addresses which contains this information.

I followed the advice from the accepted accepted answer :

  1. Created the geometry geography type column called lonlat in the addresses table

  2. Added index CREATE INDEX addresses_lonlat_gist ON cafes USING gist(lonlat)

  3. Updated lonlat from the values in lattitude and longitude columns already present in addresses
    UPDATE addresses SET lonlat = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);

So when looking for nearby cafes from cafe my query looks like this (Erwins query):

SELECT a.cafe_id, c.id, ST_Distance(t.x, lonlat) AS distance 
FROM cafes 
  inner join addresses a on cafes.id = a.cafe_id
    , (SELECT ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)')) AS t(x)
WHERE ST_DWithin(t.x, lonlat, 4828)
ORDER BY distance;

I will never search for nearby cafe(s) from geo points. I will always search for nearby cafes from one cafe.

So my intention is not to get nearby cafes from points -76.000000 39.000000, but rather to find nearby cafe(s) from cafe with id x which is existing cafe.

My question is are the lat/long columns redundant here? Should I remove them?
Can I get nearby cafes based on my geometry column lonlat?

Question update:

This is my query now, as you wrote below, with explain analyze :

explain analyze SELECT c.id, a.postcode, ST_Distance(t.lonlat, a.lonlat) AS dt
FROM   cafes c
JOIN   addresses     a ON a.cafe_id = c.id
    , (SELECT a0.lonlat FROM addresses a0 WHERE a0.cafe_id = 10) t
WHERE ST_DWithin(t.lonlat, a.lonlat, 4828.03)
and c.id != 10
ORDER  BY dt;

This is the output :

"Sort  (cost=29.85..29.86 rows=1 width=124) (actual time=7.690..7.752 rows=609 loops=1)"
"  Sort Key: (_st_distance(a0.lonlat, a.lonlat, 0::double precision, true))"
"  Sort Method: quicksort  Memory: 68kB"
"  ->  Nested Loop  (cost=4.30..29.84 rows=1 width=124) (actual time=0.641..7.298 rows=609 loops=1)"
"        ->  Nested Loop  (cost=4.30..29.12 rows=1 width=124) (actual time=0.546..2.653 rows=614 loops=1)"
"              ->  Index Scan using index_addresses_on_cafe_id on addresses a0  (cost=0.00..8.28 rows=1 width=56) (actual time=0.007..0.009 rows=1 loops=1)"
"                    Index Cond: (cafe_id = 10)"
"              ->  Bitmap Heap Scan on addresses a  (cost=4.30..20.83 rows=1 width=68) (actual time=0.536..2.478 rows=614 loops=1)"
"                    Recheck Cond: (lonlat && _st_expand(a0.lonlat, 4828::double precision))"
"                    Filter: ((a0.lonlat && _st_expand(lonlat, 4828::double precision)) AND _st_dwithin(a0.lonlat, lonlat, 4828::double precision, true))"
"                    Rows Removed by Filter: 15"
"                    ->  Bitmap Index Scan on index_addresses_on_lonlat  (cost=0.00..4.30 rows=4 width=0) (actual time=0.306..0.306 rows=629 loops=1)"
"                          Index Cond: (lonlat && _st_expand(a0.lonlat, 4828::double precision))"
"        ->  Index Only Scan using cafes_pkey on cafes u  (cost=0.00..0.46 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=614)"
"              Index Cond: (id = a.cafe_id)"
"              Filter: (id <> 10)"
"              Rows Removed by Filter: 0"
"              Heap Fetches: 610"

It seems like it's not picking up the index on the geography column.
Here is table definition :

                                       Table "public.addresses"
     Column     |            Type             |                       Modifiers                        
----------------+-----------------------------+--------------------------------------------------------
 id             | integer                     | not null default nextval('addresses_id_seq'::regclass)
 address1       | character varying(255)      | 
 address2       | character varying(255)      | 
 town           | character varying(255)      | 
 county         | character varying(255)      | 
 country        | character varying(255)      | 
 postcode       | character varying(255)      | 
 latitude       | double precision            | 
 longitude      | double precision            | 
 cafe_id        | integer                     | 
 type           | character varying(255)      | 
 created_at     | timestamp without time zone | not null
 updated_at     | timestamp without time zone | not null
 state          | character varying(255)      | default 'auto_verified'::character varying
 verified_by_id | integer                     | 
 verified_at    | timestamp without time zone | 
 lonlat         | geography                   | 
Indexes:
    "addresses_pkey" PRIMARY KEY, btree (id)
    "index_addresses_on_latitude" btree (latitude)
    "index_addresses_on_longitude" btree (longitude)
    "index_addresses_on_lonlat" gist (lonlat)
    "index_addresses_on_state" btree (state)
    "index_addresses_on_cafe_id" btree (cafe_id)

Postgis version info (SELECT PostGIS_full_version();):

"POSTGIS="2.1.1 r12113" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26" LIBXML="2.7.8" RASTER"

Best Answer

The point of reference comes from the cafe in the center, so you can use a subquery to retrieve it from the addresses table instead of the manual input:

SELECT c.*, a.*, ST_Distance(t.lonlat, a.lonlat) AS distance -- pick columns you need
FROM   addresses a
JOIN   cafes     c ON c.id = a.cafe_id
    , (SELECT lonlat
       FROM   addresses
       WHERE  cafe_id = 10
       AND    type = 'SearchAddress') t -- center cafe
WHERE  ST_DWithin(t.lonlat, a.lonlat, 4828)
ORDER  BY distance;
  • Table alias was missing: cafes c.
  • Makes no sense to have a.cafe_id, c.id in the SELECT list, since those are the same by definition. Pick the columns you actually need.

  • The columns type or state shouldn't be type text (or character varying(255)), I suggest an enum for a hand full of types or a adr_type look-up table and adr_type_id int REFERENCES adr_type (adr_type_Id in addresses.

  • Either way, I also suggest a partial, covering index for a faster look-up of the center coordinate. Building on "There can only be one of those addresses with that type.":

    CREATE addresses_search_idx ON addresses (cafe_id, lonlat)
    WHERE type = 'SearchAddress';
    
  • A lot more details discussed in chat (don't know for how long the link will be good).