You are half missing the connection between the two tables. The ...relid columns must match, too:
SELECT attname, c.*
FROM pg_attribute a
JOIN pg_constraint c
ON attrelid = conrelid -- this was missing
AND attnum = ANY (conkey)
WHERE attrelid = 'test_table'::regclass;
Basic answers
Since you select a couple of big columns, an index-only scan is probably not a viable option.
This code works (as long as there are no NULL values in the data!). Since the column isn't defined NOT NULL, add NULLS LAST to the sort order to make it work in any case, even with NULL values. Ideally, use the clause in the corresponding index as well - a sketch follows the query:
SELECT <some big columns>
FROM my_table_
ORDER BY when_ DESC NULLS LAST
LIMIT 1;
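The corresponding index could look like this - a sketch, where the index name is just an example:
CREATE INDEX my_table_when_nulls_last_idx ON my_table_ (when_ DESC NULLS LAST);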
Without any index on the when_ column, does this statement require a full scan of all rows?
Yes. Without an index, there is no other option left. (Well, there is also table partitioning, where an index on the key column(s) is not strictly required and partition pruning can help instead. But you would typically have an index on the key columns there, too.)
With an index on the when_ column, should I change this SQL to use some other query approach/strategy?
Basically, this is the perfect query. There are options in combination with advanced indexing:
Advanced technique
Assuming a NOT NULL column. Else, add NULLS LAST to the index and queries as suggested above.
You have a constant influx of rows with a later when_. Assuming the latest when_ constantly increases and never (or rarely) decreases (which would happen if the latest rows were deleted or updated), you can use a very small partial index.
Basic implementation:
Run your query once to retrieve the latest when_, subtract a safe margin (to guard against losing the latest rows) and create an IMMUTABLE function based on it. Basically a "fake global constant":
CREATE OR REPLACE FUNCTION f_when_cutoff()
RETURNS timestamptz LANGUAGE sql COST 1 IMMUTABLE PARALLEL SAFE AS
$$SELECT timestamptz '2015-07-25 01:00+02'$$;
PARALLEL SAFE only in Postgres 9.6 or later.
Create a partial index excluding older rows:
CREATE INDEX my_table_when_idx ON my_table_ (when_ DESC)
WHERE when_ > f_when_cutoff();
With millions of rows, the difference in size can be dramatic. And this only makes sense with a much smaller index. Just half the size or something would not cut it. Index access itself is not slowed much by a bigger index. It's mostly the sheer size of the index, which needs to be read and cached. (And possibly avoiding additional index writes, but hardly in your case.)
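To see what you gained, compare sizes - a quick check, using the index name from above:
SELECT pg_size_pretty(pg_relation_size('my_table_when_idx'));  -- size of the partial index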
Use the function in all related queries. Include the same WHERE condition (even if logically redundant) to convince the query planner that the index is applicable. For the simple query:
SELECT <some big columns>
FROM my_table_
WHERE when_ > f_when_cutoff()
ORDER BY when_ DESC
LIMIT 1;
The size of the index grows with new (later) entries. Recreate the function with a later timestamp and REINDEX from time to time, during periods with little or no concurrent access. Only reindex after a relevant number of rows has been added; a couple of thousand entries won't matter much. We are doing this to cut off millions.
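The maintenance step might look like this - a sketch, with a hypothetical new cutoff timestamp:
CREATE OR REPLACE FUNCTION f_when_cutoff()
RETURNS timestamptz LANGUAGE sql COST 1 IMMUTABLE PARALLEL SAFE AS
$$SELECT timestamptz '2016-07-25 01:00+02'$$;  -- later cutoff, still with a safe margin

REINDEX INDEX my_table_when_idx;  -- rebuilds the index, excluding rows older than the new cutoff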
The beauty of it: queries don't change.
Best Answer
It very much depends on details of your setup and requirements.
Note that since Postgres 11, only adding a column with a volatile DEFAULT still triggers a table rewrite. Unfortunately, this is your case.
If you have sufficient free space on disk - at least 110 % of pg_size_pretty(pg_total_relation_size(tbl)) - and can afford a SHARE lock for some time and an exclusive lock for a very short time, then create a new table including the uuid column using CREATE TABLE AS. The code below uses a function from the additional uuid-ossp module.
Lock the table against concurrent changes in SHARE mode (still allowing concurrent reads). Attempts to write to the table will wait and eventually fail. See below.
Copy the whole table while populating the new column on the fly - possibly ordering rows favorably while you are at it.
If you are going to reorder rows, be sure to set work_mem high enough to do the sort in RAM, or as high as you can afford (just for your session, not globally).
Then add constraints, foreign keys, indices, triggers etc. to the new table. When updating large portions of a table, it is much faster to create indices from scratch than to add rows iteratively. Related advice in the manual.
When the new table is ready, drop the old and rename the new to make it a drop-in replacement. Only this last step acquires an exclusive lock on the old table for the rest of the transaction - which should be very short now.
It also requires that you drop any objects depending on the table type (views, functions using the table type in their signature, ...) and recreate them afterwards.
Do it all in one transaction to avoid incomplete states.
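Putting the steps together - a minimal sketch, assuming the table is named tbl, the new column uuid, and uuid_generate_v4() from uuid-ossp as the generator; adapt names, types and ordering to your schema:
BEGIN;
LOCK TABLE tbl IN SHARE MODE;                    -- block concurrent writes, allow reads
SET LOCAL work_mem = '1GB';                      -- only relevant if you reorder rows; adjust
CREATE TABLE tbl_new AS
SELECT *, uuid_generate_v4() AS uuid             -- populate the new column on the fly
FROM   tbl;                                      -- optionally add ORDER BY ... here
ALTER TABLE tbl_new ALTER COLUMN uuid SET NOT NULL;
-- add constraints, foreign keys, indices, triggers etc. here
DROP TABLE tbl;                                  -- takes the short exclusive lock
ALTER TABLE tbl_new RENAME TO tbl;
COMMIT;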
This should be fastest. Any other method of updating in place has to rewrite the whole table as well, just in a more expensive fashion. You would only go that route if you don't have enough free space on disk or cannot afford to lock the whole table or generate errors for concurrent write attempts.
What happens to concurrent writes?
Other transactions (in other sessions) trying to INSERT / UPDATE / DELETE in the same table after your transaction has taken the SHARE lock will wait until the lock is released or a timeout kicks in, whichever comes first. They will fail either way, since the table they were trying to write to has been deleted from under them.
The new table has a new table OID, but concurrent transactions have already resolved the table name to the OID of the previous table. When the lock is finally released, they try to lock the table themselves before writing to it and find that it's gone. Postgres will answer:
ERROR: could not open relation with OID 123456
Where 123456 is the OID of the old table. You need to catch that exception and retry the queries in your app code to avoid it.
If you cannot afford for that to happen, you have to keep your original table.
Keeping the existing table, alternative 1
Update in place (possibly running the update on small segments at a time) before you add the NOT NULL constraint. Adding a new column with NULL values and without a NOT NULL constraint is cheap.
Since Postgres 9.2 you can also create a CHECK constraint with NOT VALID:
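A sketch of what that can look like - the table and column names (tbl, uuid) are assumptions:
ALTER TABLE tbl
ADD CONSTRAINT tbl_uuid_not_null CHECK (uuid IS NOT NULL) NOT VALID;  -- existing rows are not checked yet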
That allows you to update rows little by little - in multiple separate transactions. This avoids keeping row locks for too long, and it also allows dead rows to be reused. (You'll have to run VACUUM manually if there is not enough time in between for autovacuum to kick in.) Finally, add the NOT NULL constraint and remove the NOT VALID CHECK constraint:
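For example, with the same assumed names:
ALTER TABLE tbl ALTER COLUMN uuid SET NOT NULL;     -- all rows are filled by now
ALTER TABLE tbl DROP CONSTRAINT tbl_uuid_not_null;  -- the CHECK constraint is now redundant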
Keeping the existing table, alternative 2
Prepare the new state in a temporary table, TRUNCATE the original and refill from the temp table. All in one transaction. You still need to take a SHARE lock before preparing the new table to prevent losing concurrent writes.
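A rough sketch of alternative 2, under the same naming assumptions (tbl, uuid, uuid-ossp):
BEGIN;
LOCK TABLE tbl IN SHARE MODE;                    -- prevent losing concurrent writes
CREATE TEMP TABLE tbl_tmp ON COMMIT DROP AS
SELECT *, uuid_generate_v4() AS uuid             -- new column populated on the fly
FROM   tbl;
TRUNCATE tbl;                                    -- empty the original table
ALTER TABLE tbl ADD COLUMN uuid uuid NOT NULL;   -- cheap on an empty table
INSERT INTO tbl SELECT * FROM tbl_tmp;           -- refill; column order matches
COMMIT;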