There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)
But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first) are the same. Of course, the definition of `Table3` in your second attempt has syntax and logical errors, but the basic idea is:
CREATE TABLE table3 (
"ID" bigint NOT NULL DEFAULT '0',
"DataID" bigint DEFAULT NULL,
"Address" numeric(20) DEFAULT NULL,
"Data" bigint DEFAULT NULL,
PRIMARY KEY ("ID"),
FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,
FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);
This definition tells PostgreSQL roughly the following: "Create a table with four columns; one will be the primary key (PK), the others can be NULL. If a new row is inserted, check `DataID` and `Address`: if they contain a non-NULL value (say 27856), then check `Table1` for that `DataID` and `Table2` for that `Address`. If there is no such value in those tables, return an error." This last point is what you've seen first:
ERROR: insert or update on table "Table3" violates foreign key constraint
"Table3_DataID_fkey" DETAIL: Key (DataID)=(27856) is not present in table "Table1".
So simple: if there is no row in `Table1` where `DataID = 27856`, then you can't insert that row into `Table3`.
If you need that row, you should first insert a row into `Table1` with `DataID = 27856`, and only then try to insert into `Table3`. If this is not what you want, please describe in a few sentences what you want to achieve, and we can help you find a good design.
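A minimal sketch of that insert order (only the relevant columns shown; your real tables have more):

```sql
-- Parent first: now 27856 exists in Table1 for the FK to reference.
INSERT INTO Table1 ("DataID") VALUES (27856);

-- Child second: the foreign key check on Table3."DataID" now succeeds.
INSERT INTO Table3 ("ID", "DataID") VALUES (1, 27856);
```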
And now about the other problems.
You define your PKs as
CREATE TABLE all_your_tables (
    first_column NOT NULL DEFAULT '0',
    [...]
    PRIMARY KEY ("ID"),
A primary key means that all the items in it are different from each other; that is, the values are UNIQUE. If you give a static DEFAULT (like `'0'`) to a UNIQUE column, you will experience bad surprises all the time. This is what you got in your third error message.
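A minimal sketch of that surprise (table and column names are hypothetical): every row that relies on the default gets the same PK value, so the second such insert fails:

```sql
CREATE TABLE demo (
  "ID"    bigint NOT NULL DEFAULT '0' PRIMARY KEY,
  payload text
);

INSERT INTO demo (payload) VALUES ('first');   -- OK, "ID" defaults to 0
INSERT INTO demo (payload) VALUES ('second');  -- ERROR: duplicate key value
                                               -- violates unique constraint
```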
Furthermore, `'0'` means a text string, not a number (`bigint` or `numeric` in your case). Use simply `0` instead (or don't use a default at all, as I wrote above).
And a last point (I may be wrong here): in `Table2`, your `Address` field is set to `numeric(20)`. At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in `Table3` like

ID    DataID    Address        Data
123   3216547   1234567890454  654897564134569

Now that address happens to change to something else. How do you make your child row in `Table3` follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table which does not contain any information from the real world; it will simply serve as an identification value (that is, an ID) for an address.
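A sketch of that surrogate-key design (the `AddressID` column name is my invention): the surrogate never changes, so child rows never need to follow anything, and the real-world address can be updated freely:

```sql
CREATE TABLE Table2 (
  "AddressID" bigserial PRIMARY KEY,  -- stable surrogate key
  "Address"   numeric(20) UNIQUE      -- real-world value, free to change
);

CREATE TABLE Table3 (
  "ID"        bigint PRIMARY KEY,
  "AddressID" bigint REFERENCES Table2 ("AddressID")
);

-- Changing the address touches only Table2; rows in Table3 stay valid:
UPDATE Table2 SET "Address" = 9876543210 WHERE "AddressID" = 1;
```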
Clarify `ON CONFLICT DO UPDATE` behavior
Consider the manual here:
For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken.
Bold emphasis mine. So you do not have to repeat predicates for columns included in the unique index in the WHERE clause to the UPDATE (the conflict_action):
INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count)
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET    count = tu.count + 1;
-- WHERE tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = 'test value'  -- redundant
The unique violation already establishes what your added WHERE clause would enforce redundantly.
Clarify partial index
Add a WHERE clause to make it an actual partial index, like you mentioned yourself (but with inverted logic):
CREATE UNIQUE INDEX test_upsert_partial_idx
ON public.test_upsert (name, status)
WHERE test_field IS NULL; -- not: "is not null"
To use this partial index in your UPSERT you need a matching conflict_target, like @ypercube demonstrates:
ON CONFLICT (name, status) WHERE test_field IS NULL
Now the above partial index is inferred. However, as the manual also notes:
[...] a non-partial unique index (a unique index without a predicate) will be inferred (and thus used by ON CONFLICT) if such an index satisfying every other criteria is available.
If you have an additional (or only) index on just (name, status), it will (also) be used. An index on (name, status, test_field) would explicitly not be inferred. This doesn't explain your problem, but may have added to the confusion while testing.
Solution
AIUI, none of the above solves your problem, yet. With the partial index, only special cases with matching NULL values would be caught. And other duplicate rows would either be inserted if you have no other matching unique indexes / constraints, or raise an exception if you do. I suppose that's not what you want. You write:
The composite key is made up of 20 columns, 10 of which can be nullable.
What exactly do you consider a duplicate? Postgres (according to the SQL standard) does not consider two NULL values to be equal. The manual:
In general, a unique constraint is violated if there is more than one
row in the table where the values of all of the columns included in
the constraint are equal. However, two null values are never
considered equal in this comparison. That means even in the presence
of a unique constraint it is possible to store duplicate rows that
contain a null value in at least one of the constrained columns. This
behavior conforms to the SQL standard, but we have heard that other
SQL databases might not follow this rule. So be careful when
developing applications that are intended to be portable.
I assume you want NULL values in all 10 nullable columns to be considered equal. It is elegant & practical to cover a single nullable column with an additional partial index like demonstrated here.
But this gets out of hand quickly for more nullable columns. You'd need a partial index for every distinct combination of nullable columns. For just 2 of those, that's 3 partial indexes for (a), (b) and (a,b). The number grows exponentially with 2^n - 1. For your 10 nullable columns, to cover all possible combinations of NULL values, you'd already need 1023 partial indexes. No go.
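To illustrate the 2-column case, here is a sketch with a hypothetical table t whose logical unique key is (a, b, c), where a and b are nullable:

```sql
-- One plain index for the all-non-NULL case, plus 2^2 - 1 = 3 partial indexes:
CREATE UNIQUE INDEX t_idx    ON t (a, b, c);                            -- neither NULL
CREATE UNIQUE INDEX t_a_idx  ON t (b, c) WHERE a IS NULL;               -- only a NULL
CREATE UNIQUE INDEX t_b_idx  ON t (a, c) WHERE b IS NULL;               -- only b NULL
CREATE UNIQUE INDEX t_ab_idx ON t (c)    WHERE a IS NULL AND b IS NULL; -- both NULL
```

With 10 nullable columns this scheme balloons to 1023 partial indexes, which is why it's a dead end.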
The simple solution: replace NULL values and define the involved columns NOT NULL, and everything works just fine with a simple UNIQUE constraint.
If that's not an option, I suggest an expression index with COALESCE to replace NULL in the index:
CREATE UNIQUE INDEX test_upsert_solution_idx
ON test_upsert (name, status, COALESCE(test_field, ''));
The empty string ('') is an obvious candidate for character types, but you can use any legal value that either never appears or can be folded with NULL according to your definition of "unique".
Then use this statement:
INSERT INTO test_upsert as tu(name,status,test_field,identifier, count)
VALUES ('shaun', 1, null , 'ident', 11) -- works with
, ('bob' , 2, 'test value', 'ident', 22) -- and without NULL
ON CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE -- match expr. index
SET count = COALESCE(tu.count + EXCLUDED.count, EXCLUDED.count, tu.count);
Like @ypercube, I assume you actually want to add count to the existing count. Since the column can be NULL, adding NULL would set the column NULL. If you define count NOT NULL, you can simplify.
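With count defined NOT NULL, neither side of the addition can be NULL, so the COALESCE juggling in the SET clause collapses (a sketch, same example table as above):

```sql
INSERT INTO test_upsert AS tu (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 11)
ON CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE
SET count = tu.count + EXCLUDED.count;  -- safe: count is NOT NULL on both sides
```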
Another idea would be to just drop the conflict_target from the statement to cover all unique violations. Then you could define various unique indexes for a more sophisticated definition of what's supposed to be "unique". But that won't fly with ON CONFLICT DO UPDATE. The manual once more:
For ON CONFLICT DO NOTHING, it is optional to specify a conflict_target; when omitted, conflicts with all usable constraints (and unique indexes) are handled. For ON CONFLICT DO UPDATE, a conflict_target must be provided.
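So omitting the conflict_target only works in the DO NOTHING variant. A sketch with the same example table:

```sql
INSERT INTO test_upsert (name, status, test_field, identifier, count)
VALUES ('shaun', 1, NULL, 'ident', 11)
ON CONFLICT DO NOTHING;  -- any unique violation, on any index, skips the row
```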
Best Answer
You don't need to perform the insert just to update the count; inserting only the non-conflicting records will be less expensive.