Consider
INDEX(templateinfoid, templateid, -- first, but in either order
gw_out_time,
messagetype, alias) -- last, for "covering" (either order)
Also, consider either of these (to avoid an extra sort):
group by messagetype, alias
order by messagetype, alias
or
group by alias, messagetype
order by alias, messagetype
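As a rough sketch of how the suggested index and the matching GROUP BY / ORDER BY fit together (MySQL syntax): the table name `messages`, the index name, the literal values, and the shape of the WHERE clause are all assumptions for illustration. Only the column names come from the index above.

```sql
-- Hypothetical table name `messages`; the column list matches the suggested index.
ALTER TABLE messages
    ADD INDEX idx_template_time_cover
        (templateinfoid, templateid, gw_out_time, messagetype, alias);

-- Assumed query shape: equality tests on the first two columns,
-- a range on gw_out_time, and only indexed columns in the select list,
-- so the index can act as a covering index.
SELECT messagetype, alias, COUNT(*) AS cnt
FROM messages
WHERE templateinfoid = 1
  AND templateid = 2
  AND gw_out_time >= '2020-01-01'
  AND gw_out_time <  '2020-02-01'
GROUP BY messagetype, alias
ORDER BY messagetype, alias;   -- same column order as GROUP BY
```

Running EXPLAIN on the real query is the way to confirm whether the index is actually used as covering.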
CLUSTER helps read performance where actual table rows have to be read (from the main relation). Index-only scans don't care about the physical order of rows in the underlying table. If your table layout is that simple, you might cover almost all SELECT queries with index-only scans. (That might require aggressive autovacuum settings for the table.)
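A minimal sketch of what "aggressive autovacuum settings" could look like for a single table in PostgreSQL. The numbers are illustrative assumptions, not tuned recommendations, and the query in the EXPLAIN is a placeholder that assumes an index covering both i and x exists.

```sql
-- Per-table storage parameters; values are placeholders to adjust for your workload.
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1 % of rows changed
    autovacuum_vacuum_threshold    = 1000
);

-- A manual VACUUM also updates the visibility map that index-only scans rely on.
VACUUM my_table;

-- Look for "Index Only Scan" and a low "Heap Fetches" count in the plan.
EXPLAIN (ANALYZE)
SELECT x FROM my_table WHERE i = 1;
```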
UPDATE performance may even suffer from CLUSTER. First of all, CLUSTER rewrites the table without dead tuples, thereby removing "wiggle-room" for H.O.T. updates. A lower FILLFACTOR might help, but not in your case: since you update indexed columns, H.O.T. updates are not possible anyway.
When you update millions of rows in one command (or transaction), clustered together, the new row versions will hardly ever find room on the same data page, which makes the update more expensive to begin with.
And since multiple indexes include the updated columns, your updates are particularly expensive: every one of these indexes has to be updated as well.
Dead rows have to be reclaimed eventually, so new rows are not always written "en bloc", which inevitably leads to fragmentation over time (slowly, in your case).
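To check how much of this applies to a given table, PostgreSQL's statistics views can show how many updates were H.O.T. updates and how many dead rows are waiting to be reclaimed. A minimal sketch, assuming the table is named my_table:

```sql
-- n_tup_hot_upd close to 0 relative to n_tup_upd means H.O.T. updates
-- are (almost) never happening for this table.
SELECT relname
     , n_tup_upd      -- total rows updated
     , n_tup_hot_upd  -- rows updated via H.O.T.
     , n_dead_tup     -- estimated dead rows not yet reclaimed
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';
```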
More efficient approach?
for each value of i, there will be from 5 to 10 million rows.
...
All updates or deletes are "bulk", i.e. they involve all rows for a given value of i.
Seems like you should not be updating millions of rows to begin with. Instead, create a second table (1:n) with a single row for every distinct i in my_table:
CREATE TABLE current_i (
org_i integer PRIMARY KEY
, current_i integer NOT NULL -- UNIQUE?
);
This should be a very small table, given your specifications. Now you only update a single row in current_i instead of millions in my_table. That saves a lot of bloat and vacuuming in my_table and its indexes, in addition to the much faster updates, which should make everything else faster, too.
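A sketch of how the lookup table might be seeded and then used in place of the bulk update. It assumes every existing i initially maps to itself; the values 7 and 42 are placeholders.

```sql
-- One-time seed: one row per distinct i, initially mapping to itself.
INSERT INTO current_i (org_i, current_i)
SELECT DISTINCT i, i
FROM   my_table;

-- What used to be an UPDATE of millions of rows in my_table
-- becomes an UPDATE of a single row.
UPDATE current_i
SET    current_i = 42   -- illustrative new value
WHERE  org_i = 7;       -- illustrative original value
```

The rows in my_table keep their original i and are never rewritten, so no new dead tuples or index entries are created there.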
To enforce referential integrity, you may want to add a FOREIGN KEY constraint to my_table.i:
REFERENCES current_i(org_i)
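Spelled out as a full statement, the constraint might look like the following sketch. The constraint name is made up; the referenced column is org_i, the primary key of current_i as defined above.

```sql
-- current_i must already contain a row for every i present in my_table,
-- otherwise adding the constraint fails.
ALTER TABLE my_table
    ADD CONSTRAINT my_table_i_fkey   -- hypothetical name
    FOREIGN KEY (i) REFERENCES current_i (org_i);
```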
In queries, just join to current_i. You might provide a VIEW as drop-in replacement for the current table in SELECT queries:
CREATE VIEW my_view AS
SELECT i.current_i AS i, m.x, m.y, m.z
FROM my_table m
JOIN current_i i ON i.org_i = m.i;
Should be much faster overall. All of this may come down to a simple case of normalization.
Best Answer
The lock taken is on the table level. It doesn't lock a specific WHERE clause.