PostgreSQL SELECTs not returning correct result following recovery

backupindexpostgresqlrecoveryreplication

I took a copy of the current database from a live slave using pg_basebackup and recovered it to a brand new isolated computer following the instructions in the docs. Once the new database had started up, I could see everything was present and had been restored OK, but some simple SELECT queries were failing (returning 0 results), specifically:

SELECT * FROM users WHERE email_address='me@example.com';

However, some email addresses did work. The fix for it was a REINDEX on the email address index. Now, the manual says in the caveats section that hash indexes aren't transferred across in the WAL logs, so any slaves won't get the updates and also following recovery, a REINDEX should be done, but my index is a B-Tree and nothing is mentioned about that.

  1. What is happening?
  2. I don't see this on the live slave (connected to master in streaming replication) at the minute, will I in the future?

Systems:

Live M/S: CentOS 6.6, with PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

Isolated Machine: OS X 10.1, PostgreSQL 9.3.5 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit


CentOS:

production=# explain analyse verbose select user_id from hive.users where email_address='me@example.com';

QUERY PLAN                                                                
----------------------------------------------
 Index Scan using users_email_address_key on hive.users  (cost=0.14..8.16 rows=1 width=4) (actual time=0.044..0.044 rows=1 loops=1)
   Output: user_id
   Index Cond: (users.email_address = 'me@example.com'::text)
 Total runtime: 0.093 ms
(4 rows)

production=# select user_id from hive.users where email_address='me@example.com';
 user_id 
---------
       1
(1 row)

Mac OS X:

production=> explain analyse verbose select user_id from hive.users where email_address='me@example.com';

QUERY PLAN                                                                
----------------------------------------------
 Index Scan using users_email_address_key on hive.users  (cost=0.14..8.16 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
   Output: user_id
   Index Cond: (users.email_address = 'me@example.com'::text)
 Total runtime: 0.031 ms
(4 rows)

production=> select user_id from hive.users where email_address='me@example.com';
 user_id 
---------
(0 rows)

following the re-index:

production=# reindex index hive.users_email_address_key;
production=# explain analyse verbose select user_id from hive.users where email_address='me@example.com';
QUERY PLAN                                                
------------------
 Seq Scan on hive.users  (cost=0.00..1.41 rows=1 width=4) (actual time=0.019..0.019 rows=0 loops=1)
   Output: user_id
   Filter: (users.email_address = 'me@example.com'::text)
   Rows Removed by Filter: 33
 Total runtime: 0.045 ms
(5 rows)

production=# select user_id from hive.users where email_address='me@example.com';
 user_id 
---------
       1
(1 row)

Best Answer

The index data order on disk for text columns depends on the locales provided by the underlying operating system.

The same locales (that is, with the same name) may differ between operating systems on the order rules, even on simple things. As an example this question: PostgreSQL 9.1 streaming replication problem: replica fails to use an index properly shows how "0102"and 0102sort differently on Ubuntu and FreeBSD.

The result in the physical index structure is that one value should come before the other in one OS and vice-versa in the other.

The problem with a recovered base backup is the same as with replication: the on-disk structure of the index is used as-is, resulting in a corrupted index in the target system if the column's locale doesn't sort exactly the same between systems.

It's mentioned in the comments that the origin system is CentOS and destination Mac OS X. Given that CentOS uses GNU libc and Mac OS X certainly a BSD-like libc, I'd think that there is no chance at all for on-disk postgres portability between these systems.