Postgresql – Postgres: Efficient schema for querying with one exact string match and two range matches

postgresql

The table I need to query:

CREATE TABLE regions (
    state text NOT NULL,
    zip_begin text NOT NULL,  -- 9-digit zip code
    zip_end text NOT NULL,
    date_begin date NOT NULL,
    date_end date,

    data ...,
)

There are ~50 states and between 0-5M rows per state. The zip and date ranges might overlap.

The following will be one of the most common queries in my OLTP application:

SELECT data
FROM regions
WHERE state = {state}
  AND {date} BETWEEN date_begin AND date_end
  AND {zip} BETWEEN zip_begin AND zip_end

This query usually yields one row, but may sometimes yield more.

From the Postgres docs, it sounds like a GiST index might do what I need, but I don't really understand how those work.

Best Answer

Don't dismiss the simple btree index until you try it. It surely won't scale well to a trillion rows, but you surely don't have a trillion rows. It might be good enough.

To use an gist index, you would need to treat the ranges explicitly as ranges, not as end points. It would probably be best to reformat your table in that format, but you can instead use expression indexes to reformat on the fly. But then your query will have to be written to match.

Some other complexities are that simple scalars don't have built in gist operators, so to put "state" into a gist index requires you to use the btree_gist extension; and there is not a built-in text range. Rather than creating a text-range, you can just cast the zip to int (which would probably be better done in the table in the first place), so something like this:

create index on regions using gist(
  state,
  daterange(date_begin,date_end,'[]'),
  int4range(zip_begin::int,zip_end::int,'[]')
);

And then the query would look like:

SELECT data
FROM regions
WHERE state = {state}
  AND {date}::date <@ daterange(date_begin,date_end,'[]')
  AND {zip}::int <@ int4range(zip_begin::int,zip_end::int,'[]')

~~Now I haven't tested this query on the gist index, because gist indexes are so ridiculously slow to build that I don't have one available yet to try it on.~~ It has finished building and I have tested it, and it runs and gives the correct answer with '0' left-padded 5 digit zip codes. With the range sizes I used, it is sometimes faster and some times slower than the btree index on a 10e6 row table, but was never markedly different.

Related Solutions

Postgresql – What does “GIN doesn’t support full index scans” mean

a word or two belonging to a pre-set, almost 'constant', list of words.

Have you considered scanning the texts as they are input and flagging them (possibly with a many-to-many link to another table containing your list of keywords)?

Postgresql – How to design a database for continents, countries, regions, cities and POIs

i suggest 2 options

Option 1: if you want a RBDMs, i think 2 tables will be sufficient

Table 1 (Lookup)
-LocationID
-Continent
-country
-region
-City
-Landmark (granularity)

in this Table since its a lookup so i suggest keeping it Denormalized, it will save you alot in your coding specially when joining tables and searching and reporting. and maintenance is easier, then 5 separate tables

the other table is the table to collect the data

Table 2 (Master)
-Tourist
-LocationID
-ExtraFields
-...

so in this table you collect info about tourists and all you have to do is store the locationid, which is your granularity, this way the Locationid, has the Landmark,city,region,country,continent Since the Landmark is definetly in a city and the city is in a country and the country is in a region.. (no need for 5 tables to keep them separate, there will be a big chance of invalid inputs or data if not maintained properly)

Option 2:

i suggest graph database, check this post at SO , about using Neo4J for GIS and an implementation of GIS by Neo4J

Best Answer

Related Solutions

Postgresql – What does “GIN doesn’t support full index scans” mean

Postgresql – How to design a database for continents, countries, regions, cities and POIs

Related Question