PostgreSQL Database Design – Field/Datatype for Searchable Address Components

addressdatabase-designdjangopostgresql

I'm using Django with Postgres, and I'm looking for a field/datatype to store address components that come from Google Places API in this format:

"address_components": [
  {"long_name": "27", "short_name": "27", "types": ["street_number"]},
  {"long_name": "Dromore Crescent", "short_name": "Dromore Crescent", "types": ["route"]},
  {"long_name": "Westdale South", "short_name": "Westdale South", "types": ["neighborhood", "political"]},
  {"long_name": "Hamilton", "short_name": "Hamilton", "types": ["locality", "political"]},
  {"long_name": "Hamilton Division", "short_name": "Hamilton Division", "types": ["administrative_area_level_2", "political"]},
  {"long_name": "Ontario", "short_name": "ON", "types": ["administrative_area_level_1", "political"]},
  {"long_name": "Canada", "short_name": "CA", "types": ["country", "political"]},
  {"long_name": "L8S 4A8", "short_name": "L8S 4A8", "types": ["postal_code"]}
],
"formatted_address": "27 Dromore Crescent, Hamilton, ON L8S 4A8, Canada",
"place_id": "ChIJtZiMplGbLIgRcGQ4Anc337s",

The goal is to allow users to see this row in their results if they search for westdale, L8S 4A8, l8s4a8, 27 dromore, hamilton, etc.

How should this data be saved?

  1. Save it as-is in jsonb? There's a lot of duplicate text, for instance 'long_name' and 'short_name' keep appearing over and over, wasting space.

  2. Strip all the keys and only keep the actual meat of the components in a varchar[], ex. ['27', 'Dromore Crescent', 'Westdale South', 'Hamilton', 'Hamilton Division', 'Ontario', 'Canada', 'L8S 4A8']?

  3. Normalize this data in layers of foreign-key related tables? Four tables are required to normalize this: address_to_component, component, component_type, component_to_component_type. (I actually started with this, but couldn't wrap my head around how to go about querying it.)

  4. Something else?

Best Answer

UPDATE: see this answer for a working example of the below

How would I store it? What you've got is

  • User data (formatted_address)
  • Normalized address breakdown (address_components)

I would store only the formatted_address or whatever the user provided. Why?

  • I don't trust Django or whatever you're integrating with
  • I want the ability to re-normalize addresses when new data comes out.

PostGIS is a PostgreSQL extension that provides address_standardizer. You can install standardize_address like this,

CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;

From there you can provide the formatted address to standardize_address

SELECT standardize_address( formatted_address )

This will return a composite type stdaddr. I may cache this against the version of the extension and the timestamp of the call to standardize_address.

The stdaddr composite type will also be much easier to geocode which you always seem to need at some point if you're going down this path.

All of that said, Django appears to be using Google Maps to normalize their addresses. If that works for you, you can run with it. I'm sure that's not a bad source of address normalization. How you store that, will depend entirely on how you want to query it. If you're even querying it at all and not just presenting a profile. If you're not JOINing based on any of that I think a new column django_google_maps jsonb is a great starting point.