PostgreSQL: Corrupt primary key, inconsistent table

postgresqlprimary-key

While recovering from a cloud failure, I found that some tables on a PostgreSQL database are behaving strangely. These tables are indexed using a primary key, but a pg_dump yielded duplicate fields, failing a pg_restore on a backup server.

I have tried to REINDEX:

REINDEX INDEX rank_details_pkey;
ERROR:  could not create unique index "rank_details_pkey"
DETAIL:  Table contains duplicated values.

The index is defined as:

<table info here>
Indexes:
    "rank_details_pkey" PRIMARY KEY, btree (user_id)

And, oddly,

SELECT user_id, COUNT(*) FROM <table name> GROUP BY 1 HAVING COUNT(*) > 1;
 user_id | count 
---------+-------
(0 rows)

To conclude – I have duplicate values in my table which can not be found or cleared.

Any ideas how to fix this? This is a production server, so all fixes should be done without affecting service.

Best Answer

There are various ways this can happen in Oracle - I'm not sure about postgres, but I think I would call this an "integrity violation" rather than "corruption"

Perhaps you can do one of the things suggested here, ie set enable_indexscan = off or

begin;
drop index rank_details_pkey;
select user_id, count(*) from rank_details group by user_id having count(*) > 1;
rollback;

But "there are likely some locking issues with this, so be careful with it in production"

The idea is to force the query to scan the table rather than just the index (which does not have the duplicates). You may also, and more simply, be able to acheive the same by:

select user_id, f(<some other column>), count(*)
from rank_details
group by user_id, f(<some other column>)
having count(*) > 1

where f() returns a constant, which may trick the planner into a table scan.