Postgresql – Postgres: Are secondary indexes included in ACID

Tags: acid, performance, postgresql, postgresql-performance

Are non-unique indexes / indices covered under the Consistency clause in aCid? (same for other attributes of an index that do not place constraints on the data) I am seeing certain performance issues (benefits, actually) in Postgres that make me wonder if they are.

Given that indexes / indices are not first-class objects (i.e. you can't access them directly in Postgres, nor can you request their use), I see no reason at all why Postgres would be REQUIRED to support this. I can find no definition of ACID that says "indexes have to absolutely complete and not be hacked up before the transaction can finish".

Under certain conditions that do not place restrictions on the insert (such as the index not being unique), the index could essentially be "invalidated" (i.e. "don't use it until I'm finished reindexing"), or flags could be set that say "the index does not cover the following ranges".

If Postgres played this trick, the COPY FROM command could be made exceedingly swift (which is what I am seeing), and similarly for massive insert counts in a transaction.

I'm not just making this up…

While Redshift is a bad example, Amazon weasels out of Consistency by playing tricks with how it stores the (one and only) sort key (essentially a primary index-ish construct in Redshift). Until one performs a vacuum command, the primary key just keeps getting worse and worse and your database starts becoming a black hole: queries go in, but no results come out.

Clearly, an internalized vacuum regimen would prevent the Redshift silliness that often occurs during mass imports.

Best Answer

Are non-unique indexes / indices covered under the Consistency clause in aCid?

Yes. Any violation of that would be considered a bug in PostgreSQL.
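One way to check this for yourself is a sketch like the following. The table `events` and the index `events_user_id_idx` are hypothetical names chosen for illustration. The script bulk-inserts rows inside a transaction and then queries through the non-unique index before the transaction ends.

```sql
-- Hypothetical table and index, used only for illustration.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    user_id integer NOT NULL,
    payload text
);
CREATE INDEX events_user_id_idx ON events (user_id);  -- non-unique secondary index

BEGIN;

-- Bulk load inside a single transaction.
INSERT INTO events (user_id, payload)
SELECT g % 1000, 'row ' || g
FROM generate_series(1, 100000) AS g;

-- Discourage sequential scans for this transaction only,
-- so the planner prefers the index.
SET LOCAL enable_seqscan = off;

-- The rows inserted above are already found through the index before COMMIT.
SELECT count(*) FROM events WHERE user_id = 42;  -- 100 (g = 42, 1042, ..., 99042)

ROLLBACK;
```

Prefixing the `SELECT` with `EXPLAIN` should show which scan type the planner actually picked.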

The docs you quoted are cases where postgres might have to temporarily scan the heap instead of doing an index-only scan or otherwise do extra work to get a consistent result.

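For example, GIN indexes (with fastupdate enabled) accumulate new entries in a pending list and merge them into the main index structure in batches. When the index is used in a query, the pending list is scanned as well, so the query still sees a current, up-to-date and consistent view. BRIN indexes take a related approach: page ranges that have not been summarized yet are simply always scanned.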
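A minimal sketch of the GIN case, assuming PostgreSQL 9.6 or later (for `gin_clean_pending_list`). The table `docs` and index `docs_tags_gin` are hypothetical names.

```sql
-- Hypothetical table with an array column indexed by GIN.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    tags text[] NOT NULL
);

-- fastupdate = on (the default) queues new entries in a pending list.
CREATE INDEX docs_tags_gin ON docs USING gin (tags) WITH (fastupdate = on);

INSERT INTO docs (tags)
SELECT ARRAY['tag' || (g % 100)]
FROM generate_series(1, 10000) AS g;

-- Even if some entries still sit in the pending list, a query that uses
-- the index returns every matching row.
SET enable_seqscan = off;
SELECT count(*) FROM docs WHERE tags @> ARRAY['tag7'];  -- 100
RESET enable_seqscan;

-- Optionally merge the pending list into the main index structure now,
-- rather than waiting for vacuum or for the list to fill up.
SELECT gin_clean_pending_list('docs_tags_gin');
```

The batching changes when the index maintenance work is done, not what queries are allowed to see.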

If an index is currently invalid it'll be skipped by the planner and won't be used by queries.
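To see whether any index is currently marked invalid, the `pg_index` catalog can be queried. This is a sketch using the standard catalog columns `indisvalid` and `indisready`:

```sql
-- List indexes that the planner will not use for queries.
SELECT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass   AS table_name,
       i.indisvalid,
       i.indisready
FROM pg_index AS i
WHERE NOT i.indisvalid;
```

Invalid indexes are typically left behind by a failed `CREATE INDEX CONCURRENTLY`. They can be rebuilt with `REINDEX` or dropped and recreated.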

Redshift isn't really PostgreSQL; it just happens to share the same front-end and protocol. Drawing comparisons based on Redshift will typically just create confusion.

Related Solutions

Postgresql – Copy Postgres databases with indexes between instances

If you want to copy a complete PostgreSQL database within its cluster, the fastest method is to use it as TEMPLATE in a CREATE DATABASE statement. I quote the manual:

By default, the new database will be created by cloning the standard system database template1. A different template can be specified by writing TEMPLATE name. In particular, by writing TEMPLATE template0, you can create a virgin database containing only the standard objects predefined by your version of PostgreSQL. This is useful if you wish to avoid copying any installation-local objects that might have been added to template1.

CREATE DATABASE db_copy TEMPLATE db_org;

This effectively copies the underlying files around, like you tried to do manually, except that it sets everything up to work correctly.
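One caveat: the copy fails if other sessions are connected to the source database. The sketch below reuses the `db_org` and `db_copy` names from the statement above and assumes it is run while connected to a different database (for example `postgres`).

```sql
-- Check for sessions still connected to the source database.
SELECT pid, usename, application_name, state
FROM pg_stat_activity
WHERE datname = 'db_org'
  AND pid <> pg_backend_pid();

-- Once no other sessions remain, clone the database, indexes included.
CREATE DATABASE db_copy TEMPLATE db_org;
```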

You may want to clean up your original before you do (depends):

VACUUM FULL ANALYZE;

PostgreSQL – Do Indexes Get Transferred with pg_restore

Do indexes get transferred with pg_restore?
I see other questions that have been answered in the negative (i.e. indexes do not get transferred over with the standard pg_restore)

That seems to be a misunderstanding. The index itself (containing all the data) is not in the dump, just the commands to recreate it. So indexes do get "transferred", but really they are recreated in pristine condition, without any bloat or dead tuples.
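To get a feel for what the dump contains for indexes, the `pg_indexes` view shows each index as a `CREATE INDEX` statement. This is a sketch, and the `public` schema is an assumption. The definitions it lists are comparable to the commands replayed at restore time, although the dump may handle indexes that back constraints differently.

```sql
-- Each row carries the statement needed to rebuild the index from scratch.
SELECT tablename,
       indexname,
       indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```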

Do I still need to reindex?

No. After the restore you have all indexes in perfect condition. No REINDEX needed. ANALYZE would make sense, though, as advised in the manual in the chapter "Restoring the Dump":

After restoring a backup, it is wise to run ANALYZE on each database so the query optimizer has useful statistics;
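A minimal sketch of that step, run while connected to the restored database. The follow-up query checks the result through the standard `pg_stat_user_tables` view.

```sql
-- Collect planner statistics for every table in the current database.
ANALYZE;

-- Confirm when each table was last analyzed, manually or by autovacuum.
SELECT relname,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```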

And finally:

Is 9.4 THAT much more efficient?

No, not generally. Well, the size of GIN indexes has been reduced substantially. But what you see is most probably the effect of removing bloat from all tables and indexes (including system tables).
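One way to test the bloat explanation is to compare index sizes on the old and the restored database. The sketch below assumes the indexes live in the `public` schema:

```sql
-- Largest indexes first; run on both databases and compare.
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind = 'i'
  AND n.nspname = 'public'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;
```

If the same index is much smaller after the restore, with the same row counts, removed bloat is the likely cause rather than the version upgrade.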
