PostgreSQL Query Performance – How to Return a Query Result Only if It Isn’t NULL

performanceplpgsqlpostgresqlquery-performancespatial

I'm writing a PL/pgSQL function that creates a cursor for a query that I need to check whether it returns something.

What I'm doing is this:

  1. Run the query
  2. Check whether it returns something.
  3. If it doesn't, double a parameter and run the query again.
  4. Else, return all rows from the query.

I found that checking if the query returned something with a cursor was the best choice since it's a very long query (joining 5 tables and a lot of columns) and a SELECT ... INTO didn't seemed right, because I'd have to create a TYPE since the query has columns from one table and a column for a distance calculation.

The problem is that I'm currently using the cursor only to check if the query returned something inside a loop where I open it and close it. Once the query returns something, I exit the loop and return the query. I can tell right away that this is an ugly workaround for what I need. Maybe someone can help me with this issue.

Here's some code that shows what I'm currently doing.

CREATE FUNCTION store_distance(
    latitude double precision,
    longitude double precision,
    radius double precision,
    tries integer
)
RETURNS TABLE(
    store_id store.id%type,
    store_name store.name%type,
    distance double precision
)
AS
$$
DECLARE
    cur_stores CURSOR FOR
        SELECT
            store.id,
            store.name,
            get_distance(latitude, longitude, store.latitude, store.longitude) distance
        FROM
            store
        WHERE
            store.latitude BETWEEN (latitude - radius) AND (latitude + radius)
            AND store.longitude BETWEEN (longitude - radius) AND (longitude + radius)
        ORDER BY
            distance ASC;
    count int := 0;
    storerow RECORD;
BEGIN
    LOOP
        IF count = tries THEN
            EXIT;
        END IF;
        OPEN cur_stores;
        FETCH cur_stores INTO storerow;
        IF FOUND THEN
            EXIT;
        END IF;
        radius := radius * 2;
        count := count + 1;
        CLOSE cur_stores;
    END LOOP;
    RETURN QUERY
    SELECT
        store.id,
        store.name,
        get_distance(latitude, longitude, store.latitude, store.longitude) distance
    FROM
        store
    WHERE
        store.latitude BETWEEN (latitude - radius) AND (latitude + radius)
        AND store.longitude BETWEEN (longitude - radius) AND (longitude + radius)
    ORDER BY
        distance ASC;
END;
$$ LANGUAGE PLPGSQL;

So my purpose is to give coordinates, radius and the number of tries and to try and find stores within that search box. If no stores are found, I double the radius and try again until something is returned by the query or until the number of tries are reached.
The RETURN TABLE part is basically because I want to return the distance, so RETURN SETOF store was no use.

Best Answer

I don't think you need a cursor here at all. To shorten your code, you could just use a view. To improve performance, a materialized view should get you furthest. Postgres 9.3 has built-in features, but you can easily implemented it in older versions yourself.

Consider this simplified form:

CREATE FUNCTION store_distance(_lat    double precision
                              ,_long   double precision
                              ,_radius double precision
                              ,_tries  integer)
RETURNS TABLE(
   store_id   store.id%type
  ,store_name store.name%type
  ,distance   double precision) AS
$func$
DECLARE
   _ct  int   := 0;
   _pos point := point(_lat, _long);
BEGIN
   LOOP
      EXIT WHEN _ct >= _tries
           OR   EXISTS (
            SELECT 1 FROM store s
            WHERE  point(s.latitude, s.longitude) <@ circle(_pos, _radius));

      _radius := _radius * 2;
      _ct     := _ct + 1;
   END LOOP;

   RETURN QUERY
   SELECT s.id, s.name
         ,get_distance(_lat, _long, s.latitude, s.longitude)
   FROM   store s
   WHERE  point(s.latitude, s.longitude) <@ circle(_pos, _radius);
   ORDER  BY 3;
END
$func$ LANGUAGE plpgsql STRICT;

I made the function STRICT to disallow NULL input, which could result in an endless loop.

Note how I use circles with the "Contained" operator <@ instead of a boxes. One would assume calculations to be slightly more expensive than with boxes, but it hardly matters at all once you support your query with a GiST index like:

CREATE INDEX store_point_gist_idx ON store
USING gist (point(latitude, longitude));

You might consider to store lat / lon as point to begin with and replace the index on an expression with a simpler one on the column. Works either way, just make sure the query matches the index so it gets used. Big difference for big tables.

You may be interested in this closely related answer on SO I posted last year - with a lot more explanation and links.