PostgreSQL Locking Behavior with a Partial Index

postgresql

For PostgreSQL, will CREATE INDEX foo ON bar WHERE condition = true, i.e. without CONCURRENTLY, lock the entire table for a partial index?

Or will only the affected rows, i.e. those where condition = true, be locked?

I did not see it spelled out in https://www.postgresql.org/docs/9.6/sql-createindex.html.

Best Answer

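The lock is taken at the table level; the WHERE clause of a partial index does not narrow it to the matching rows. A plain CREATE INDEX locks the table against writes (INSERT, UPDATE, DELETE) until the build finishes, while other transactions can still read the table.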
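One way to see this for yourself is to hold the index build open in a transaction and look at pg_locks from a second session. This is only a sketch: the indexed column id is an assumption (the question does not name one), and the table and index names are taken from the question.

```sql
-- Session 1: keep the transaction open so the lock stays visible.
BEGIN;
CREATE INDEX foo ON bar (id) WHERE condition = true;  -- "id" is a hypothetical column

-- Session 2: list the locks held on the table itself.
SELECT locktype, mode, granted, pid
FROM   pg_locks
WHERE  relation = 'bar'::regclass;
-- The row for session 1 should be a relation-level lock in mode ShareLock,
-- not a set of row-level locks.

-- Session 1: release the lock.
COMMIT;

-- Alternative that does not block writes. It takes the weaker
-- SHARE UPDATE EXCLUSIVE lock and cannot run inside a transaction block.
-- (Shown as an alternative to the statement above, not in addition to it.)
CREATE INDEX CONCURRENTLY foo ON bar (id) WHERE condition = true;
```

The concurrent build takes longer and can leave an invalid index behind if it fails, so it is a trade-off rather than a free upgrade.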

Related Solutions

Postgresql – How to count selectivity rows in PostgreSQL 8.2

Consider

INDEX(templateinfoid, templateid,  -- first, but in either order    
      gw_out_time,
      messagetype, alias)  -- last, for "covering" (either order)

Also, consider either of these (to avoid an extra sort):

group by messagetype, alias 
order by messagetype, alias

group by alias, messagetype
order by alias, messagetype

Postgresql – Is clustering a table beneficial, given this update pattern

CLUSTER helps read performance where actual table rows have to be read (from the main relation). Index-only scans don't care about the physical order of rows in the underlying table. If your table layout is that simple, you might cover almost all SELECT queries with index-only scans. (That might require aggressive autovacuum settings for the table, so that the visibility map stays current.)
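A rough sketch of what "aggressive autovacuum settings" could look like as per-table storage parameters. The numbers are illustrative assumptions, not recommendations, and the query only assumes an index on my_table.i exists.

```sql
-- Vacuum my_table after ~1% of its rows are dead instead of the default 20%.
-- Both values are placeholders to be tuned for the real workload.
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- Check whether a query is actually answered from the index alone.
-- Look for "Index Only Scan" and a low "Heap Fetches" count in the plan.
EXPLAIN (ANALYZE)
SELECT i FROM my_table WHERE i = 1;
```

A high "Heap Fetches" number means many pages are not yet marked all-visible, which is what more frequent vacuuming is meant to fix.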

UPDATE performance may even suffer from CLUSTER. First of all, CLUSTER rewrites the table without dead tuples, thereby removing "wiggle-room" for HOT (heap-only tuple) updates. A lower FILLFACTOR might help, but not in your case: when indexed columns are updated, HOT updates are not possible anyway.

When you update millions of rows in one command (or transaction), clustered together, new row versions will hardly ever find room on the same data page, which makes it a bit more expensive to begin with.
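For tables where the updated columns are not indexed, a lower FILLFACTOR can be tried and its effect measured. A minimal sketch, using my_table from the question; the value 70 is an arbitrary illustration.

```sql
-- Leave ~30% of each page free for new row versions.
-- Only pages written from now on honor the setting; existing pages are
-- repacked by a table rewrite such as CLUSTER or VACUUM FULL.
ALTER TABLE my_table SET (fillfactor = 70);

-- Compare total updates with HOT updates. If n_tup_hot_upd stays near zero,
-- as it will when indexed columns are updated, the spare room is wasted.
SELECT n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';
```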

And since you have multiple indexes including the updated columns, your updates are particularly expensive, since all of these indexes need to be updated as well.

Dead rows are reclaimed eventually and their space is reused, so new rows are not always written "en bloc". That inevitably leads to fragmentation over time (slowly, in your case).

More efficient approach?

for each value of i, there will be from 5 to 10 million rows.
...
All updates or deletes are "bulk", i.e. they involve all rows for a given value of i.

Seems like you should not be updating millions of rows to begin with. Instead, create a second table (1:n) with a single row for every distinct i in my_table:

CREATE TABLE current_i (
   org_i     integer PRIMARY KEY
 , current_i integer NOT NULL  -- UNIQUE?
);

Should be a very small table, given your specifications. Now you only update a single row in current_i instead of millions in my_table, saving a lot of bloat and vacuuming in my_table and its indexes - in addition to the much faster updates - which should make everything else faster, too.
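A sketch of how the lookup table might be seeded and used. The values 7 and 42 are made-up examples, and the seed assumes every existing i should initially map to itself.

```sql
-- One row per distinct i currently present in my_table.
INSERT INTO current_i (org_i, current_i)
SELECT DISTINCT i, i
FROM   my_table;

-- What used to be a bulk UPDATE of millions of rows in my_table
-- (changing i from 7 to 42) becomes a single-row update.
UPDATE current_i
SET    current_i = 42
WHERE  org_i = 7;
```

The rows in my_table keep their original i; only the mapping changes, so no new row versions are written there.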

To enforce referential integrity, you may want to add a FOREIGN KEY constraint to my_table.i:

REFERENCES current_i(org_i)
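Written out as a full statement, this could look like the following. The constraint name is hypothetical, and the sketch assumes the foreign key targets org_i, the primary key of current_i and the column the mapping is joined on.

```sql
-- Every my_table.i must have a matching row in current_i.
ALTER TABLE my_table
    ADD CONSTRAINT my_table_i_fkey   -- hypothetical name
    FOREIGN KEY (i) REFERENCES current_i (org_i);
```

Adding the constraint validates all existing rows, so current_i has to be populated first.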

In queries, just join to current_i. You might provide a VIEW as drop-in replacement for the current table in SELECT queries:

CREATE VIEW my_view AS
SELECT i.current_i AS i, m.x, m.y, m.z
FROM   my_table  m
JOIN   current_i i ON i.org_i = m.i;

Should be much faster overall. All of this may come down to a simple case of normalization.
