The id field in your ny_stations table does not seem to be defined as a serial, so it is expected that pg_get_serial_sequence returns nothing.

The duplicate key error means that your SELECT DISTINCT ... FROM ny_raw_trips ... query is returning two rows with the same start_station_id:
SELECT start_station_id, COUNT(*) FROM (
SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
) a
GROUP BY start_station_id
HAVING COUNT(*) > 1;
You could list the rows that are introducing the duplication like this:
WITH src AS (
SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
)
SELECT *
FROM src
WHERE start_station_id IN (SELECT start_station_id FROM src GROUP BY start_station_id HAVING COUNT(*) > 1)
ORDER BY start_station_id;
Edit
Once you have found the offending duplicates, and if you consider the first occurrence of each good enough (e.g. the differences are trivial, in the name or coordinate fields), you can use DISTINCT ON:
INSERT INTO ny_stations (id, name, latitude, longitude)
SELECT DISTINCT ON (start_station_id) start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
ORDER BY start_station_id;
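Note that DISTINCT ON keeps the first row of each group in ORDER BY order, so if it matters which duplicate survives, add a tie-break column to the ORDER BY. A sketch; the extra sort column is an assumption, pick whatever makes one row preferable in your data:
INSERT INTO ny_stations (id, name, latitude, longitude)
SELECT DISTINCT ON (start_station_id)
       start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
ORDER BY start_station_id, start_station_name;  -- e.g. prefer the alphabetically first name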
It seems there is a UNIQUE INDEX on a(user_id) and you're trying to assign a value that already exists to more than one row.
Have a look at the following example:
create table a (id int, user_id int, f_id int);
create table b (id int, f_id int);
create unique index a_user_id_key on a(user_id);
insert into a
values (1, 1, 11),
(2, 3, 22),
(3, 2, 33),
(4, 5, 44 );
insert into b
values (1, 11),
(3, 22),
(3, 33), --<<<< a second row with id = 3, so the update will try to assign user_id = 3 to two rows
(4, 44);
When I try to update using your current query:
update a
set user_id = b.id
from b
where a.f_id = b.f_id;
It returns the same error message:
ERROR: duplicate key value violates unique constraint "a_user_id_key"
DETAIL: Key (user_id)=(3) already exists.
You could fix it by skipping values that already exist in a:
update a
set user_id = b.id
from b
where a.f_id = b.f_id
and not exists(select 1 from a where a.user_id = b.id);
This is the result:
select * from a;
id | user_id | f_id
-: | ------: | ---:
1 | 1 | 11
2 | 3 | 22
3 | 2 | 33
4 | 4 | 44
db<>fiddle here
But I'd suggest first checking which rows are the duplicates, using this query:
select a.*
from a
join b
on a.f_id = b.f_id
and exists(select 1 from a where user_id = b.id and f_id <> b.f_id);
id | user_id | f_id
-: | ------: | ---:
3 | 2 | 33
db<>fiddle here
Best Answer
In addition to what @Craig provided (and correcting some of it):
Effective Postgres 9.4, UNIQUE, PRIMARY KEY and EXCLUDE constraints are checked immediately after each row when defined NOT DEFERRABLE. This is different from other kinds of NOT DEFERRABLE constraints (currently only REFERENCES (foreign key) constraints), which are checked after each statement. We worked all of this out under this related question on SO.

It is not enough for a UNIQUE (or PRIMARY KEY or EXCLUDE) constraint to be DEFERRABLE to make your presented code with multiple statements work. And you cannot use ALTER TABLE ... ALTER CONSTRAINT for this purpose. Per documentation:
Bold emphasis mine. Use instead:

Drop the constraint and add it back in a single statement, so there is no time window for anybody to sneak in offending rows. For big tables it would be tempting to preserve the underlying unique index somehow, because deleting and recreating it is costly. Alas, that does not seem to be possible with standard tools (if you have a solution for that, please let us know!):
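A sketch of that drop-and-re-add in one statement, assuming a table t with a unique constraint named t_col_uni on column col (names are illustrative, not from your schema):
ALTER TABLE t
  DROP CONSTRAINT t_col_uni
, ADD  CONSTRAINT t_col_uni UNIQUE (col) DEFERRABLE;
Since this is a single ALTER TABLE, both steps are applied atomically and no conflicting row can be inserted in between.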
For a single statement making the constraint deferrable is enough:
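For example, shifting all values of a unique column in one statement (hypothetical table t with a DEFERRABLE unique constraint on col):
UPDATE t SET col = col + 1;
With a NOT DEFERRABLE constraint this can fail on a transient duplicate, because each row is checked as it is updated; once the constraint is DEFERRABLE, the check runs at the end of the statement.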
A query with CTEs also is a single statement:
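For instance, a data-modifying CTE can free a value and reuse it within the same statement (hypothetical names again, assuming a DEFERRABLE unique constraint on t.col):
WITH del AS (
   DELETE FROM t WHERE col = 2
)
UPDATE t SET col = 2 WHERE col = 1;
The unique check only sees the final state at the end of the statement.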
However, for your code with multiple statements you additionally need to actually defer the constraint, or define it as INITIALLY DEFERRED.
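Deferring within a transaction might look like this (sketch; t_col_uni stands for your DEFERRABLE constraint's name):
BEGIN;
SET CONSTRAINTS t_col_uni DEFERRED;
-- multiple statements creating transient duplicates ...
COMMIT;  -- the unique check runs here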
Either is typically more expensive than the above, but it may not be easily feasible to pack everything into one statement.

Be aware of a limitation in connection with FOREIGN KEY constraints, though. Per documentation:

So you cannot have both at the same time.