Postgresql – How is it technically possible for PostgreSQL to keep small tables “in RAM”

postgresqlwrite-ahead-logging

I used to ask about performance and people would tell me that PostgreSQL keeps small-enough tables entirely in RAM automatically, as to not stress the HDD and thus enable super-fast reading/writing of "small tables", that is, with very few rows which more than easily fit in the available RAM (or what's assigned to PG).

This made sense to me, and made me very happy as I can efficiently communicate between my parallel-running scripts without stressing my poor HDD/SSD and clogging up all the resources.

But then it struck me: How is this possible? How can PG possibly keep a table all in RAM, regardless of how small it is, without losing data integrity?

If it doesn't write to the permanent storage all the time, how can it possibly recover if the power is suddenly cut, or there's a software crash, or some other disaster happens while it's running? It doesn't add up to me the more I think about it.

The only answer I can think of is: No, it doesn't actually do this. I don't see any possible way that PG could avoid losing data if it never (or rarely) writes the data to the permanent storage.

If this is possible, but only after enabling some kind of flag for the table, or setting, I'd like to know about that flag/setting. I'd like to enable this for tables which only contain "internal communication" data of the kind which has no value besides as a "common data storage" for actively running scripts.

Best Answer

This is possible due to Write-Ahead Logging (WAL):

Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records.

It's a central component of Postgres' architecture and turned on by default. You can configure a couple of parameters like instructed in the manual chapter WAL Configuration.

Also consider basics on Reliability in the manual.

Data alignment and storage size

Actually, the overhead per index tuple is 8 byte for the tuple header plus 4 byte for the item identifier.

We have three columns for the primary key:

PRIMARY KEY ("Timestamp" , "TimestampIndex" , "KeyTag")

"Timestamp"      timestamp (8 bytes)
"TimestampIndex" smallint  (2 bytes)
"KeyTag"         integer   (4 bytes)

Results in:

 4 bytes for item identifier in the page header (not counting towards multiple of 8 bytes)

 8 bytes for the index tuple header
 8 bytes "Timestamp"
 2 bytes "TimestampIndex"
 2 bytes padding for data alignment
 4 bytes "KeyTag" 
 0 padding to the nearest multiple of 8 bytes
-----
28 bytes per index tuple; plus some bytes of overhead.

About measuring object size in this related answer:

Measure the size of a PostgreSQL table row

Order of columns in a multicolumn index

Read these two questions and answers to understand:

The way you have your index (primary key), you can retrieve rows without a sorting step, that's appealing, especially with LIMIT. But retrieving the rows seems extremely expensive.

Generally, in a multi-column index, "equality" columns should go first and "range" columns last:

Multicolumn index and performance

Therefore, try an additional index with reversed column order:

CREATE INDEX analogransition_mult_idx1
   ON "AnalogTransition" ("KeyTag", "TimestampIndex", "Timestamp");

It depends on data distribution. But with millions of row, even billion of rows this might be substantially faster.

Tuple size is 8 bytes bigger, due to data alignment & padding. If you are using this as plain index, you might try to drop the third column "Timestamp". May be a bit faster or not (since it might help with sorting).

You might want to keep both indexes. Depending on a number of factors, your original index may be preferable - in particular with a small LIMIT.

autovacuum and table statistics

Your table statistics need to be up to date. I am sure you have autovacuum running.

Since your table seems to be huge and statistics important for the right query plan, I would substantially increase the statistics target for relevant columns:

ALTER TABLE "AnalogTransition" ALTER "Timestamp" SET STATISTICS 1000;

... or even higher with billions of rows. Maximum is 10000, default is 100.

Do that for all columns involved in WHERE or ORDER BY clauses. Then run ANALYZE.

Table layout

While being at it, if you apply what you have learned about data alignment and padding, this optimized table layout should save some disk space and help performance a little (ignoring pk & fk):

CREATE TABLE "AnalogTransition"(
  "Timestamp" timestamp with time zone NOT NULL,
  "KeyTag" integer NOT NULL,
  "TimestampIndex" smallint NOT NULL,
  "TimestampQuality" smallint,
  "UpdateTimestamp" timestamp without time zone, -- (UTC)
  "QualityFlags" smallint,
  "Quality" boolean,
  "Value" numeric
);

`CLUSTER` / pg_repack / pg_squeeze

To optimize read performance for queries that use a certain index (be it your original one or my suggested alternative), you can rewrite the table in the physical order of the index. CLUSTER does that, but it's rather invasive and requires an exclusive lock for the duration of the operation.
pg_repack is a more sophisticated alternative that can do the same without exclusive lock on the table.
pg_squeeze is a later, similar tool (have not used it, yet).

This can help substantially with huge tables, since much fewer blocks of the table have to be read.

RAM

Generally, 2GB of physical RAM is just not enough to deal with billions of rows quickly. More RAM might go a long way - accompanied by adapted setting: obviously a bigger effective_cache_size to begin with.

Best Answer

Related Solutions

PostgreSQL – Why Did UPDATE Take 39 Hours?

PostgreSQL – Configuring for Read Performance

Data alignment and storage size

Order of columns in a multicolumn index

autovacuum and table statistics

Table layout

CLUSTER / pg_repack / pg_squeeze

RAM

Related Question

`CLUSTER` / pg_repack / pg_squeeze