PostgreSQL – Configuring for Read Performance

Tags: performance, postgresql, postgresql-9.1, query-performance

Our system writes a lot of data (it is a kind of Big Data system). Write performance is good enough for our needs, but read performance is really too slow.

The primary key (constraint) structure is similar for all our tables:

timestamp (timestamp); index (smallint); key (integer).

A table can have millions of rows, even billions of rows, and a read request is usually for a specific period (timestamp / index) and tag. It's common to have a query that returns around 200k rows. Currently, we can read about 15k rows per second, but we need to be 10 times faster. Is this possible, and if so, how?

Note: PostgreSQL is packaged with our software, so the hardware is different from one client to another.

It is a VM used for testing. The VM's host is Windows Server 2008 R2 x64 with 24.0 GB of RAM.

Server Spec (Virtual Machine VMWare)

Server 2008 R2 x64
2.00 GB of memory
Intel Xeon W3520 @ 2.67GHz (2 cores)

postgresql.conf optimisations

shared_buffers = 512MB (default: 32MB)
effective_cache_size = 1024MB (default: 128MB)
checkpoint_segments = 32 (default: 3)
checkpoint_completion_target = 0.9 (default: 0.5)
default_statistics_target = 1000 (default: 100)
work_mem = 100MB (default: 1MB)
maintenance_work_mem = 256MB (default: 16MB)

Table Definition

CREATE TABLE "AnalogTransition"
(
  "KeyTag" integer NOT NULL,
  "Timestamp" timestamp with time zone NOT NULL,
  "TimestampQuality" smallint,
  "TimestampIndex" smallint NOT NULL,
  "Value" numeric,
  "Quality" boolean,
  "QualityFlags" smallint,
  "UpdateTimestamp" timestamp without time zone, -- (UTC)
  CONSTRAINT "PK_AnalogTransition" PRIMARY KEY ("Timestamp" , "TimestampIndex" , "KeyTag" ),
  CONSTRAINT "FK_AnalogTransition_Tag" FOREIGN KEY ("KeyTag")
      REFERENCES "Tag" ("Key") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE,
  autovacuum_enabled=true
);

Query

The query takes about 30 seconds to execute in pgAdmin3, but we would like to have the same result under 5 seconds if possible.

SELECT 
    "AnalogTransition"."KeyTag", 
    "AnalogTransition"."Timestamp" AT TIME ZONE 'UTC', 
    "AnalogTransition"."TimestampQuality", 
    "AnalogTransition"."TimestampIndex", 
    "AnalogTransition"."Value", 
    "AnalogTransition"."Quality", 
    "AnalogTransition"."QualityFlags", 
    "AnalogTransition"."UpdateTimestamp"
FROM "AnalogTransition"
WHERE "AnalogTransition"."Timestamp" >= '2013-05-16 00:00:00.000' AND "AnalogTransition"."Timestamp" <= '2013-05-17 00:00:00.00' AND ("AnalogTransition"."KeyTag" = 56 OR "AnalogTransition"."KeyTag" = 57 OR "AnalogTransition"."KeyTag" = 58 OR "AnalogTransition"."KeyTag" = 59 OR "AnalogTransition"."KeyTag" = 60)
ORDER BY "AnalogTransition"."Timestamp" DESC, "AnalogTransition"."TimestampIndex" DESC
LIMIT 500000;

Explain 1

"Limit  (cost=0.00..125668.31 rows=500000 width=33) (actual time=2.193..3241.319 rows=500000 loops=1)"
"  Buffers: shared hit=190147"
"  ->  Index Scan Backward using "PK_AnalogTransition" on "AnalogTransition"  (cost=0.00..389244.53 rows=1548698 width=33) (actual time=2.187..1893.283 rows=500000 loops=1)"
"        Index Cond: (("Timestamp" >= '2013-05-16 01:00:00-04'::timestamp with time zone) AND ("Timestamp" <= '2013-05-16 15:00:00-04'::timestamp with time zone))"
"        Filter: (("KeyTag" = 56) OR ("KeyTag" = 57) OR ("KeyTag" = 58) OR ("KeyTag" = 59) OR ("KeyTag" = 60))"
"        Buffers: shared hit=190147"
"Total runtime: 3863.028 ms"

Explain 2

In my latest test, it took 7 minutes to select my data! See below:

"Limit  (cost=0.00..313554.08 rows=250001 width=35) (actual time=0.040..410721.033 rows=250001 loops=1)"
"  ->  Index Scan using "PK_AnalogTransition" on "AnalogTransition"  (cost=0.00..971400.46 rows=774511 width=35) (actual time=0.037..410088.960 rows=250001 loops=1)"
"        Index Cond: (("Timestamp" >= '2013-05-22 20:00:00-04'::timestamp with time zone) AND ("Timestamp" <= '2013-05-24 20:00:00-04'::timestamp with time zone) AND ("KeyTag" = 16))"
"Total runtime: 411044.175 ms"

Best Answer

Data alignment and storage size

The overhead per index tuple is 8 bytes for the tuple header plus 4 bytes for the item identifier.

We have three columns for the primary key:

PRIMARY KEY ("Timestamp" , "TimestampIndex" , "KeyTag")

"Timestamp"      timestamp (8 bytes)
"TimestampIndex" smallint  (2 bytes)
"KeyTag"         integer   (4 bytes)

Results in:

 4 bytes for item identifier in the page header (not counting towards multiple of 8 bytes)

 8 bytes for the index tuple header
 8 bytes "Timestamp"
 2 bytes "TimestampIndex"
 2 bytes padding for data alignment
 4 bytes "KeyTag" 
 0 padding to the nearest multiple of 8 bytes
-----
28 bytes per index tuple; plus some bytes of overhead.

You can verify these numbers by measuring object sizes directly. A minimal sketch using PostgreSQL's built-in size functions, run against the table and primary key defined above:
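
-- Compare the on-disk size of the primary key index with the table itself
SELECT pg_size_pretty(pg_relation_size('"PK_AnalogTransition"'))        AS pk_index_size,
       pg_size_pretty(pg_relation_size('"AnalogTransition"'))           AS table_size,
       pg_size_pretty(pg_total_relation_size('"AnalogTransition"'))     AS table_plus_indexes;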

Order of columns in a multicolumn index

With the index you have (the primary key), you can retrieve rows without a sorting step. That's appealing, especially with LIMIT. But retrieving the rows seems extremely expensive.

Generally, in a multi-column index, "equality" columns should go first and "range" columns last.

Therefore, try an additional index with reversed column order:

CREATE INDEX analogtransition_mult_idx1
   ON "AnalogTransition" ("KeyTag", "TimestampIndex", "Timestamp");

It depends on data distribution. But with millions of rows, even billions of rows, this might be substantially faster.

With all three columns, the index tuple is 8 bytes bigger than a two-column version would be, due to data alignment & padding. If you are using this as a plain index (not a constraint), you might try dropping the third column "Timestamp"; see the sketch below. It may be a bit faster or not (keeping the column might still help with sorting).
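
That two-column variant could look like this (the index name is just an illustrative placeholder):

CREATE INDEX analogtransition_mult_idx2
   ON "AnalogTransition" ("KeyTag", "TimestampIndex");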

You might want to keep both indexes. Depending on a number of factors, your original index may be preferable - in particular with a small LIMIT.

autovacuum and table statistics

Your table statistics need to be up to date. I am sure you have autovacuum running.

Since your table seems to be huge and statistics important for the right query plan, I would substantially increase the statistics target for relevant columns:

ALTER TABLE "AnalogTransition" ALTER "Timestamp" SET STATISTICS 1000;

... or even higher with billions of rows. Maximum is 10000, default is 100.

Do that for all columns involved in WHERE or ORDER BY clauses. Then run ANALYZE.
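
For example, covering the other columns from the query's WHERE and ORDER BY clauses, then refreshing statistics:

ALTER TABLE "AnalogTransition" ALTER "KeyTag" SET STATISTICS 1000;
ALTER TABLE "AnalogTransition" ALTER "TimestampIndex" SET STATISTICS 1000;
ANALYZE "AnalogTransition";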

Table layout

While you are at it: if you apply what you have learned about data alignment and padding, this optimized table layout should save some disk space and help performance a little (ignoring primary and foreign key constraints):

CREATE TABLE "AnalogTransition"(
  "Timestamp" timestamp with time zone NOT NULL,
  "KeyTag" integer NOT NULL,
  "TimestampIndex" smallint NOT NULL,
  "TimestampQuality" smallint,
  "UpdateTimestamp" timestamp without time zone, -- (UTC)
  "QualityFlags" smallint,
  "Quality" boolean,
  "Value" numeric
);

CLUSTER / pg_repack / pg_squeeze

To optimize read performance for queries that use a certain index (be it your original one or my suggested alternative), you can rewrite the table in the physical order of the index. CLUSTER does that, but it's rather invasive and requires an exclusive lock for the duration of the operation.
pg_repack is a more sophisticated alternative that can do the same without exclusive lock on the table.
pg_squeeze is a later, similar tool (I have not used it yet).

This can help substantially with huge tables, since much fewer blocks of the table have to be read.
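
For example, to rewrite the table in the physical order of the suggested index above (during a maintenance window, since the table is locked exclusively for the duration):

CLUSTER "AnalogTransition" USING analogtransition_mult_idx1;
ANALYZE "AnalogTransition";  -- re-analyze afterwards so the planner sees the new physical order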

RAM

Generally, 2GB of physical RAM is just not enough to deal with billions of rows quickly. More RAM might go a long way, accompanied by adapted settings: obviously a bigger effective_cache_size to begin with.
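
As a purely hypothetical illustration, a host giving 8 GB of RAM to PostgreSQL might start from something like this (numbers must be adapted to each client's actual hardware):

shared_buffers = 2GB (roughly 25% of RAM is a common starting point)
effective_cache_size = 6GB (rough estimate of memory available for caching)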