Postgresql – Duplicate key value violates a unique constraint

I've been looking at other posts. It seems many people have had this problem, but I'm somewhat new to Postgres and tried other solutions with no luck.

I have a ny_stations table:

CREATE TABLE ny_stations (
id integer primary key,
name text,
latitude numeric,
longitude numeric,
nyct2010_gid integer,
boroname text,
ntacode text,
ntaname text
);

I am trying to insert values from another table, ny_raw_trips:

INSERT into ny_stations (id, name, latitude, longitude)
SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations);

Getting the error:

ERROR: duplicate key value violates unique constraint "ny_stations_pkey"
DETAIL: Key (id)=(151) already exists.

What am I doing wrong? Also,

SELECT pg_get_serial_sequence('ny_stations', 'id');

…returns nothing. Please let me know if any other info is required.

Best Answer

The id field in your ny_stations table does not seem to be defined as a serial, so it is expected that pg_get_serial_sequence will return nothing.
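To illustrate the difference, here is a minimal sketch using two throwaway tables (demo_plain and demo_serial are hypothetical names, not part of the original schema). It assumes nothing beyond a stock PostgreSQL session:

```sql
-- Hypothetical demo tables; they disappear at the end of the session.
CREATE TEMP TABLE demo_plain  (id integer PRIMARY KEY);
CREATE TEMP TABLE demo_serial (id serial  PRIMARY KEY);

-- Plain integer column: no sequence is attached, so the result is NULL.
SELECT pg_get_serial_sequence('demo_plain', 'id');

-- serial column: returns the name of the sequence that serial created.
SELECT pg_get_serial_sequence('demo_serial', 'id');
```

An empty result for ny_stations is therefore consistent with its definition and is unrelated to the duplicate key error.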

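The duplicate key error arises because your SELECT DISTINCT ... FROM ny_raw_trips ... returns two or more rows with the same id. DISTINCT only removes rows that are identical in every selected column, so rows that share a start_station_id but differ in name or coordinates all survive. You can check which ids are affected: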

SELECT start_station_id, COUNT(*) FROM (
  SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
  FROM ny_raw_trips
  WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
  ) a
GROUP BY start_station_id
HAVING COUNT(*) > 1;

You could list the rows that are introducing the duplication like this:

WITH src AS (
  SELECT DISTINCT start_station_id, start_station_name, start_station_latitude, start_station_longitude
  FROM ny_raw_trips
  WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
  )
SELECT *
FROM src
WHERE start_station_id IN (SELECT start_station_id FROM src GROUP BY start_station_id HAVING COUNT(*) > 1)
ORDER BY start_station_id;
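If it is unclear why SELECT DISTINCT can still return the same id twice, the following small reproduction may help. The table raw_demo and its values are made up for illustration and only mimic the shape of ny_raw_trips:

```sql
-- Hypothetical sample data: one station id, two spellings of its name.
CREATE TEMP TABLE raw_demo (
  start_station_id   integer,
  start_station_name text
);

INSERT INTO raw_demo VALUES
  (151, 'Example St'),
  (151, 'Example St'),       -- exact duplicate: DISTINCT removes it
  (151, 'Example Street');   -- same id, different name: DISTINCT keeps it

-- DISTINCT compares whole rows, so two rows with id 151 remain.
SELECT DISTINCT start_station_id, start_station_name
FROM raw_demo;

-- Grouping by id alone exposes the clash.
SELECT start_station_id, COUNT(DISTINCT start_station_name) AS name_variants
FROM raw_demo
GROUP BY start_station_id
HAVING COUNT(DISTINCT start_station_name) > 1;
```

Inserting the result of the first SELECT into a table with a primary key on the id would fail in the same way as the original INSERT.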

Edit

Once you have found the offending duplicates, and if you decide that the first occurrence of each is good enough (for example, because the rows differ only trivially in description or coordinates), you can use DISTINCT ON:

INSERT into ny_stations (id, name, latitude, longitude)
SELECT DISTINCT ON (start_station_id) start_station_id, start_station_name, start_station_latitude, start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
ORDER BY start_station_id;
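One caveat: with only start_station_id in the ORDER BY, which of the duplicate rows counts as the "first" is not specified and may change between runs. A possible variant, assuming you are happy to break ties by name and then coordinates (that choice of tie-breaker is my assumption, pick whatever suits your data), is:

```sql
INSERT INTO ny_stations (id, name, latitude, longitude)
SELECT DISTINCT ON (start_station_id)
       start_station_id,
       start_station_name,
       start_station_latitude,
       start_station_longitude
FROM ny_raw_trips
WHERE start_station_id NOT IN (SELECT id FROM ny_stations)
-- The leading ORDER BY column must match DISTINCT ON;
-- the remaining columns decide which duplicate is kept.
ORDER BY start_station_id,
         start_station_name,
         start_station_latitude,
         start_station_longitude;
```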

Related Solutions

Postgresql – Violates foreign key constraint

Basically foreign key constraints are not inherited. If you are working with table inheritance you have a few options.

Stop enforcing foreign keys
Use constraint triggers to enforce foreign keys

In most cases you are better off with a single large table and smaller join tables, possibly with deferred foreign keys. Unfortunately, that approach cannot do everything table inheritance can either, so some custom coding is typically needed.
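As a rough sketch of the constraint trigger option, the following emulates only the referencing side of a foreign key. All table, function and trigger names are hypothetical, and EXECUTE FUNCTION assumes PostgreSQL 11 or later (older versions spell it EXECUTE PROCEDURE):

```sql
-- Hypothetical inheritance hierarchy and a table that should reference it.
CREATE TABLE parent_demo (id integer PRIMARY KEY);
CREATE TABLE parent_demo_child () INHERITS (parent_demo);
CREATE TABLE referencing_demo (parent_id integer);

-- Selecting from parent_demo also scans its child tables,
-- which a plain FOREIGN KEY to parent_demo would not do.
CREATE FUNCTION check_parent_demo_exists() RETURNS trigger AS $$
BEGIN
  IF NEW.parent_id IS NOT NULL
     AND NOT EXISTS (SELECT 1 FROM parent_demo WHERE id = NEW.parent_id) THEN
    RAISE EXCEPTION 'parent_id % not found in parent_demo or its children',
      NEW.parent_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Constraint triggers are AFTER ... FOR EACH ROW and can be deferred.
CREATE CONSTRAINT TRIGGER referencing_demo_parent_check
AFTER INSERT OR UPDATE ON referencing_demo
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION check_parent_demo_exists();
```

This does not cover deletes or key updates on the parent side; a complete emulation would need matching triggers on parent_demo and each child table.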

PostgreSQL – insert/update violates foreign key constraints

There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)

But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first set) are the same. Of course, the definition of Table3 in your second attempt has syntax and logical errors, but the basic idea is:

CREATE TABLE table3 (   
  "ID" bigint NOT NULL DEFAULT '0',   
  "DataID" bigint DEFAULT NULL,   
  "Address" numeric(20) DEFAULT NULL,   
  "Data" bigint DEFAULT NULL,
   PRIMARY KEY ("ID"),   
   FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,   
   FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);

This definition tells PostgreSQL roughly the following: "Create a table with four columns; one will be the primary key (PK), the others can be NULL. When a row is inserted, check DataID and Address: if they contain a non-NULL value (say 27856), look up DataID in Table1 and Address in Table2. If there is no such value in those tables, return an error." That last point is what you saw first:

ERROR: insert or update on table "Table3" violates foreign key constraint "Table3_DataID_fkey"
DETAIL: Key (DataID)=(27856) is not present in table "Table1".

It is that simple: if there is no row in Table1 where DataID = 27856, then you can't insert that row into Table3.

If you need that row, you should first insert a row into Table1 with DataID = 27856, and only then try to insert into Table3. If this is not what you want, please describe in a few sentences what you want to achieve, and we can help with a good design.
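The required order of inserts can be sketched with two cut-down tables. The names t1_demo and t3_demo and their reduced column lists are hypothetical stand-ins, since the full definition of Table1 is not shown here:

```sql
-- Hypothetical, simplified stand-ins for Table1 and Table3.
CREATE TEMP TABLE t1_demo (
  data_id bigint PRIMARY KEY
);

CREATE TEMP TABLE t3_demo (
  id      bigint PRIMARY KEY,
  data_id bigint REFERENCES t1_demo (data_id)
            ON DELETE CASCADE ON UPDATE CASCADE
);

-- Running this first would raise the foreign key violation,
-- because no parent row with data_id = 27856 exists yet:
-- INSERT INTO t3_demo (id, data_id) VALUES (1, 27856);

-- Parent first, then child.
INSERT INTO t1_demo (data_id) VALUES (27856);
INSERT INTO t3_demo (id, data_id) VALUES (1, 27856);

-- A NULL in the referencing column is not checked against the parent.
INSERT INTO t3_demo (id, data_id) VALUES (2, NULL);
```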

And now about the other problems.

You define your PKs as

CREATE TABLE all_your_tables (
    first_column NOT NULL DEFAULT '0',   
    [...]
    PRIMARY KEY ("ID"),

A primary key requires every value in it to be different from the others, that is, the values are UNIQUE. If you give a static DEFAULT (like '0') to a UNIQUE column, the second row inserted without an explicit value collides with the first. This is what you got in your third error message.

Furthermore, '0' is written as a quoted string literal rather than a number (bigint or numeric in your case). PostgreSQL will coerce it to the column type, but it is clearer to simply write 0 instead (or to not use a default at all, as I wrote above).
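A small sketch of how the static default bites, and one possible alternative. The table names are hypothetical, and using bigserial is my suggestion rather than something taken from the original tables:

```sql
-- Hypothetical table with a static default on its primary key.
CREATE TEMP TABLE pk_default_demo (
  id   bigint NOT NULL DEFAULT 0 PRIMARY KEY,
  note text
);

INSERT INTO pk_default_demo (note) VALUES ('first');   -- id defaults to 0

-- A second insert that relies on the default would fail with
-- "duplicate key value violates unique constraint":
-- INSERT INTO pk_default_demo (note) VALUES ('second');

-- Alternative: let a sequence supply a fresh value each time.
CREATE TEMP TABLE pk_serial_demo (
  id   bigserial PRIMARY KEY,
  note text
);

INSERT INTO pk_serial_demo (note) VALUES ('first');
INSERT INTO pk_serial_demo (note) VALUES ('second');   -- gets a new id
```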

And a last point (I may be wrong here): in Table2, your Address field is set to numeric(20). At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in Table3 like

ID        DataID           Address             Data
123       3216547          1234567890454       654897564134569

Now suppose that address changes to something else. How do you make your child row in Table3 follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table that carries no real-world information and simply serves as an identifier (that is, an ID) for an address.
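A minimal sketch of that surrogate key design follows. The table and column names are hypothetical and only loosely modelled on Table2 and Table3:

```sql
-- Hypothetical parent table: the surrogate id is the PK,
-- the real-world address is merely UNIQUE and free to change.
CREATE TEMP TABLE address_demo (
  id      bigserial PRIMARY KEY,
  address numeric(20) NOT NULL UNIQUE
);

-- Hypothetical child table: references the surrogate id, not the address.
CREATE TEMP TABLE reading_demo (
  id         bigserial PRIMARY KEY,
  address_id bigint NOT NULL REFERENCES address_demo (id),
  data       bigint
);

INSERT INTO address_demo (address) VALUES (1234567890454);

INSERT INTO reading_demo (address_id, data)
SELECT id, 654897564134569
FROM address_demo
WHERE address = 1234567890454;

-- The address changes; the child row is untouched because
-- it points at address_demo.id, which stays the same.
UPDATE address_demo
SET address = 9999999999999
WHERE address = 1234567890454;
```

With this layout no ON UPDATE CASCADE is needed for address changes, because the referenced key never changes.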
