Postgresql – Duplicate key value violates a unique constraint

postgresql

I've been looking at other posts. It seems many people have had this problem, but I'm somewhat new to Postgres and tried other solutions with no luck.

I have ny_stations table:

CREATE TABLE ny_stations (
id integer primary key,
name text,
latitude numeric,
longitude numeric,
nyct2010_gid integer,
boroname text,
ntacode text,
ntaname text
);

I am trying to insert values from another table, ny_raw_trips:

INSERT into ny_stations (id, name, latitude, longitude)
SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations);

Getting the error:

ERROR: duplicate key value violates unique constraint "ny_stations_pkey"
DETAIL: Key (id)=(151) already exists.

What am I doing wrong? Also,

SELECT pg_get_serial_sequence('ny_stations', 'id');

…returns nothing. Please let me know if any other info is required.

Best Answer

The id field in your ny_stations table does not seem to be defined as a serial, so it is expected that pg_get_serial_sequence will return nothing.

The duplicate you get relates to one of the records in your SELECT DISTINCT ... FROM ny_raw_trips ... is returning two rows with the same id:

SELECT start_station_id, COUNT(*) FROM (
  SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
  FROM ny_raw_trips
  WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
  ) a
GROUP BY start_station_id
HAVING COUNT(*) > 1;

You could list the rows that are introducing the duplication like this:

WITH src AS (
  SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
  FROM ny_raw_trips
  WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
  )
SELECT *
FROM src
WHERE start_station_id IN (SELECT start_station_id FROM src GROUP BY start_station_id HAVING COUNT(*) > 1)
ORDER BY start_station_id;


Edit

Once you find the offending duplicates, should you consider that the first occurrence of each case is good enough (e.g. trivial differences in description or coordinate fields), you can use DISTINCT ON:

INSERT into ny_stations (id, name, latitude, longitude)
SELECT DISTINCT ON (start_station_id) start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
ORDER BY start_station_id;