Postgresql – Very slow query despite index being used

execution-plan, index, performance, postgresql, postgresql-performance

I have this table, created to hold tons of Modbus measurements coming from my devices:

CREATE TABLE public.tv_smartdevicemeasurement_modbus (
    measurement_id serial NOT NULL,
    insert_time timestamp with time zone NOT NULL,
    data jsonb NOT NULL,
    smart_device_id integer NOT NULL,
    CONSTRAINT tv_smartdevicemeasurement_modbus_pkey PRIMARY KEY (measurement_id),
    CONSTRAINT tv_smartdevicemeasur_smart_device_id_62c12ed0_fk_tv_smartd
        FOREIGN KEY (smart_device_id)
        REFERENCES public.tv_smartdevice_modbus (device_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED
);

CREATE INDEX idxgin
    ON public.tv_smartdevicemeasurement_modbus USING gin (data);

CREATE INDEX tv_smartdev_insert__0ae03a_idx
    ON public.tv_smartdevicemeasurement_modbus USING btree (insert_time);

CREATE INDEX tv_smartdev_insert__1f0611_idx
    ON public.tv_smartdevicemeasurement_modbus USING btree (insert_time DESC);

CREATE INDEX tv_smartdevicemeasurement_modbus_smart_device_id_62c12ed0
    ON public.tv_smartdevicemeasurement_modbus USING btree (smart_device_id);

Now, the table has something like 200 million rows (and it's growing every day). I need to select with a simple datetime clause like this one:

SELECT data
FROM tv_smartdevicemeasurement_modbus
WHERE (tv_smartdevicemeasurement_modbus.insert_time > '2019-08-01'
AND tv_smartdevicemeasurement_modbus.insert_time < '2019-9-25')

But this simple, straightforward query takes about 8 seconds, as reported here by EXPLAIN (ANALYZE, VERBOSE).

The same, with track_io_timing enabled.

How can I resolve this problem? Is there an effective long-term solution?

Best Answer

There are two indexes, on (insert_time) and on (insert_time DESC). B-tree indexes can be scanned backwards at practically the same speed, and since insert_time is NOT NULL, NULLS FIRST / NULLS LAST ordering cannot make a difference either. So the second index adds no benefit whatsoever. Drop one of the two in any case.
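A quick way to see this for yourself (a sketch, using the table and index names from above):

EXPLAIN
SELECT data
FROM   tv_smartdevicemeasurement_modbus
ORDER  BY insert_time DESC
LIMIT  10;
-- the plan should show an "Index Scan Backward" node,
-- served by the plain ascending index even without the DESC variant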

I made some assumptions where info is missing:

  • Current Postgres 12.
  • You are free to redesign the table and lock the table exclusively for some time.
  • The table is basically "append only".
  • New rows are added in order of insert_time.
  • The current table is physically out of order and/or bloated (leading to the plain index scan in your query plan instead of a bitmap index scan).

I would rewrite the table like this:

BEGIN;
-- SET maintenance_work_mem = ???   -- set as high as you can afford temporarily

-- drop idx first to free space
DROP INDEX public.idxgin;     -- acquires exclusive lock on the table
DROP INDEX public.tv_smartdev_insert__0ae03a_idx;
DROP INDEX public.tv_smartdevicemeasurement_modbus_smart_device_id_62c12ed0;
-- DROP INDEX public.tv_smartdev_insert__1f0611_idx; -- might help SELECT, drop later

ALTER TABLE public.tv_smartdevicemeasurement_modbus ALTER measurement_id DROP DEFAULT;
DROP SEQUENCE public.tv_smartdevicemeasurement_modbus_measurement_id_seq;  -- drop owned sequence
ALTER TABLE public.tv_smartdevicemeasurement_modbus
         RENAME TO tv_smartdevicemeasurement_modbus_old;    -- free up the original name

CREATE TABLE public.tv_smartdevicemeasurement_modbus (
   measurement_id   serial PRIMARY KEY    -- consider IDENTITY column instead, see below
 , smart_device_id  integer NOT NULL      -- reordering saves 4-8 bytes alignment padding per row
 , insert_time      timestamp with time zone NOT NULL
 , data             jsonb NOT NULL
 , CONSTRAINT tv_smartdevicemeasur_smart_device_id_62c12ed0_fk_tv_smartd
      FOREIGN KEY (smart_device_id)
      REFERENCES public.tv_smartdevice_modbus (device_id) DEFERRABLE INITIALLY DEFERRED  -- WHY deferrable / deferred? see below
);

INSERT INTO public.tv_smartdevicemeasurement_modbus
      (measurement_id, smart_device_id, insert_time, data)
SELECT measurement_id, smart_device_id, insert_time, data
FROM   public.tv_smartdevicemeasurement_modbus_old
ORDER  BY insert_time DESC;   -- CLUSTER while rewriting

-- advance the new serial sequence past the existing IDs
SELECT setval(pg_get_serial_sequence('public.tv_smartdevicemeasurement_modbus', 'measurement_id')
            , max(measurement_id))
FROM   public.tv_smartdevicemeasurement_modbus;

DROP TABLE tv_smartdevicemeasurement_modbus_old;

CREATE INDEX tv_smartdev_insert__1f0611_idx ON public.tv_smartdevicemeasurement_modbus (insert_time DESC);
-- CREATE INDEX tv_smartdev_insert__0ae03a_idx ON public.tv_smartdevicemeasurement_modbus (insert_time);  -- nope!

CREATE INDEX idxgin ON public.tv_smartdevicemeasurement_modbus USING gin (data);  -- ?
CREATE INDEX tv_smartdevicemeasurement_modbus_smart_device_id_62c12ed0 ON public.tv_smartdevicemeasurement_modbus (smart_device_id); -- ?

COMMIT;

VACUUM ANALYZE public.tv_smartdevicemeasurement_modbus;

This rewrites the table, saving some space (which also helps performance). Most importantly, it clusters the table according to your main index and removes all possible bloat while at it. This should improve data locality and make Postgres read fewer data pages. Unless your data column is big, you should see a bitmap index scan in the query plan now. And if data is small, consider a covering index to get index-only scans. See:

How does the changed column order save space?
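The covering index mentioned above might look like this (a sketch, requires Postgres 11+; the index name is made up):

CREATE INDEX tv_smartdev_insert_time_covering_idx   -- hypothetical name
    ON public.tv_smartdevicemeasurement_modbus (insert_time DESC) INCLUDE (data);

This only pays off while the jsonb values stay small; big values bloat the index, and an index entry cannot exceed roughly a third of a data page.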

DEFERRABLE INITIALLY DEFERRED? This is rarely necessary. And cheaper without. See:
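If you conclude that the deferral is not needed, the constraint could be recreated without it (a sketch, reusing the constraint name from above):

ALTER TABLE public.tv_smartdevicemeasurement_modbus
    DROP  CONSTRAINT tv_smartdevicemeasur_smart_device_id_62c12ed0_fk_tv_smartd
  , ADD   CONSTRAINT tv_smartdevicemeasur_smart_device_id_62c12ed0_fk_tv_smartd
        FOREIGN KEY (smart_device_id)
        REFERENCES public.tv_smartdevice_modbus (device_id);   -- not deferrable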

serial vs. IDENTITY? See:
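With an IDENTITY column, the start of the new table definition might look like this instead (a sketch):

CREATE TABLE public.tv_smartdevicemeasurement_modbus (
   measurement_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , smart_device_id  integer NOT NULL
 , insert_time      timestamp with time zone NOT NULL
 , data             jsonb NOT NULL
 -- , FK constraint as above
);

Note that with GENERATED ALWAYS, the INSERT ... SELECT above would need OVERRIDING SYSTEM VALUE to keep the existing IDs, and the identity sequence must be advanced afterwards, just like the serial sequence.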

Then query like this:

SELECT data
FROM   tv_smartdevicemeasurement_modbus
WHERE  insert_time >= '2019-08-01'  -- included
AND    insert_time <  '2019-09-25'; -- excluded

Be aware that these date literals are interpreted according to your local time zone setting. Consider true timestamptz input to be unambiguous. See:
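To make the boundaries independent of the session's timezone setting, spell the literals with an explicit offset (a sketch; '+02' is an assumed offset, substitute your own):

SELECT data
FROM   tv_smartdevicemeasurement_modbus
WHERE  insert_time >= timestamptz '2019-08-01 00:00+02'
AND    insert_time <  timestamptz '2019-09-25 00:00+02';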

These two indexes are orthogonal to the query at hand: idxgin and tv_smartdevicemeasurement_modbus_smart_device_id_62c12ed0. Drop them unless they are needed for unrelated queries.