Your problem is that when you add a new column to temp_person, its child temp_person_two will have this field appended at the end of its column list (4th position), i.e. after the has_default column. See:
db=> \d temp_person
Column | Type | Modifiers
-----------+-------------------+-----------------------------------------------------------------
person_id | integer | not null default nextval('temp_person_person_id_seq'::regclass)
name | character varying |
foo | text |
db=> \d temp_person_two
Column | Type | Modifiers
-------------+----------------------+-----------------------------------------------------------------
person_id | integer | not null default nextval('temp_person_person_id_seq'::regclass)
name | character varying |
has_default | character varying(4) | not null default 'en'::character varying
foo | text |
So, when you execute this:
INSERT INTO temp_person_two VALUES ( NEW.* );
PostgreSQL will actually understand that you want to insert into the first three columns of temp_person_two (as NEW.* expands to three values), generating something similar to this:
INSERT INTO temp_person_two(person_id,name,has_default)
VALUES ( NEW.person_id, NEW.name, NEW.foo );
So temp_person_two.has_default will get the value of NEW.foo, which is NULL in your case.
The solution is to simply expand the column names:
INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.person_id, NEW.name, NEW.foo );
or, you could also use this:
INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.* );
But this is fragile, as any change in column positions may break your statements, so I'd recommend the first one.
EDIT:
So the conclusion and the lesson learned here is:
Always explicitly write the names of the columns and the values when issuing an INSERT command; in fact, when issuing any SQL command at all... =D
This will save you a lot of time solving problems like this in the future.
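For completeness, a minimal sketch of the corrected trigger function (the function name and the RETURN NULL convention are assumptions about your setup, not taken from it):

CREATE OR REPLACE FUNCTION temp_person_insert() RETURNS trigger AS $$
BEGIN
    -- explicit column list: survives future column additions on either table
    INSERT INTO temp_person_two (person_id, name, foo)
    VALUES (NEW.person_id, NEW.name, NEW.foo);
    RETURN NULL;  -- assumption: the row should not also be stored in the parent
END;
$$ LANGUAGE plpgsql;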
Replacing NULL with a surrogate value may or may not help a little. NULL handling is a bit more expensive in indexes but, on the other hand, NULL is typically smaller in storage, which also impacts performance.
I expect a much greater impact from this: flip the order of index expressions:
CREATE INDEX ON big_table (val, time DESC)
WHERE flag & 2::smallint = 2::smallint;
Rule of thumb: index for equality first, then for ranges.
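For illustration, a query of this shape is what the flipped index serves well (the concrete value and the LIMIT are assumptions, not from your workload):

SELECT *
FROM   big_table
WHERE  flag & 2::smallint = 2::smallint  -- matches the partial index predicate
AND    val = 5                           -- equality column first in the index ...
ORDER  BY time DESC                      -- ... then the sort / range column
LIMIT  100;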
To your consolation: val = ANY('{-1,5}') burns down to being syntax shorthand for (val = -1 OR val = 5), which is hardly better than (val IS NULL OR val = 5). (The more important factor is the number of rows for NULL vs. -1 - the same in your case if you just replace NULL with -1.)
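Put side by side (a sketch, not taken from your queries):

-- with a surrogate value -1 stored instead of NULL:
SELECT * FROM big_table WHERE val = ANY ('{-1,5}');
-- keeping NULL, the corresponding form:
SELECT * FROM big_table WHERE val IS NULL OR val = 5;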
Also consider upgrading to a current version of Postgres. 9.5 is getting old; there have been various performance improvements for big tables since.
To return only a few columns, an index-only scan would be a possible optimization, but according to the comments you need to return most of your 21 columns.
To save 8 bytes per row needlessly lost to alignment padding, reorder the columns of your demo table like this:
CREATE TABLE big_table (
   flag bigint NOT NULL,
   time timestamp with time zone NOT NULL,
   id   integer PRIMARY KEY,
   val  int
);
Smaller is faster overall. Now, that's obviously just a demo table, but the same principles apply to your actual table.
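If you want to measure the effect, pg_column_size() over a whole row reports the size of the stored row value, e.g. (a quick sketch against the demo table):

SELECT pg_column_size(t) AS row_bytes  -- size of one stored row value
FROM   big_table t
LIMIT  1;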
Sort
For a single val, Postgres can return pre-sorted data from the index directly. But for more than one value it has to merge equally many sorted sub-sets (one sorted set for val IS NULL, another one for val = 5 in your example), so another sort step on top of index access is inevitable. Pre-sorted sets from the index can still make it cheaper - and you need sorted index tuples in any case. The actual query plan also depends on the chosen index access method. It's trivial to return pre-sorted data from an index scan (or index-only scan). Not so much for a bitmap index scan.
Special case: very few NULL values, used all the time
Since you mentioned only a handful of NULL values for millions of rows, I would add a tailored index for that special case:
CREATE INDEX ON big_table (time DESC)
WHERE flag & 2::smallint = 2::smallint
AND val IS NULL;
Maybe even append all other columns of interest to this very tiny special index to get an index-only scan. (Depends on preconditions.) Even more so in Postgres 11 or later with a true covering index using the INCLUDE clause. Results from this and the other index are merged in a BitmapOr node, just like you see for multiple subsets from the same index now. Postgres has precise estimates for the special case, and it becomes completely irrelevant whether the special case is NULL or -1 or whatever. (Not that it mattered all that much to begin with.)
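A sketch of such a covering variant for Postgres 11 or later; which columns go into INCLUDE depends entirely on what your queries return (id here is just a stand-in):

CREATE INDEX ON big_table (time DESC) INCLUDE (id)
WHERE  flag & 2::smallint = 2::smallint
AND    val IS NULL;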
Best Answer
There is no such mechanism in PostgreSQL.
However, you can still avoid the worst effects of such a table change.
The following statement acquires an access exclusive lock on the table for the duration of the statement/transaction:
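For example, with placeholder table and column names:

ALTER TABLE big_table
ADD COLUMN new_col text NOT NULL DEFAULT 'pending';  -- rewrites every row before Postgres 11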
This statement changes the catalog, then rewrites the whole table so that the new column contains the default value in all rows. If the table has many rows and is accessed frequently enough, this can cause some temporary problems.
To avoid that, try to hold the exclusive lock for as short a time as possible:
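Again with the placeholder names, split into two catalog-only steps:

ALTER TABLE big_table ADD COLUMN new_col text;                    -- no rewrite: new column is NULL
ALTER TABLE big_table ALTER COLUMN new_col SET DEFAULT 'pending'; -- applies to future rows only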
As this is basically only one change (actually two changes) to the catalog and no data change happens, it will complete pretty fast. Then, depending on your needs and table usage, you can update the new column to the default in one step or in batches (sketched below), and when finished, set the column to NOT NULL.

Update about a wish coming true: PostgreSQL 11 will have this feature. See https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/ for more.
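A sketch of that backfill with the same placeholder names (batch boundaries depend on your data):

-- fill existing rows, in one go or in batches:
UPDATE big_table SET new_col = DEFAULT WHERE new_col IS NULL;
-- when every row is filled:
ALTER TABLE big_table ALTER COLUMN new_col SET NOT NULL;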