I have Skytools set up for PostgreSQL replication, and it keeps failing intermittently. When I check the londiste status I get the following error:
$ londiste /etc/skytools/mydb_data_0.ini status
Queue: mydb_data Local node: mydb_data_0
mydb_data (root)
| Tables: 18/0/0
| Lag: 9s, Tick: 2620740
+--: mydb_data_0 (leaf)
| Tables: 18/0/0
| Lag: 15m8s, Tick: 2620670
| ERR: mydb_data_0: Lost position: batch 2620669..2620669, dst has 2620670
+--: mydb_data_1 (leaf)
Tables: 18/0/0
Lag: 9s, Tick: 2620740
I really don't understand what's going wrong. I get the same error message in the PostgreSQL logs as well:
Exception: Lost position: batch 2620669..2620669, dst has 2620670
I found an article addressing the error I'm getting. It says you have to use the worker's --reset
option to reset the queue position on the remote side, and then issue wait-sync
to get the table queue moving again.
So I ran:
$ londiste /etc/skytools/mytestdb_data_0.ini worker --reset
Ignoring stale pidfile
2016-12-23 17:00:34,278 15245 INFO Resetting queue tracking on dst side
It resets the queue successfully, but when I check the londiste status I now get this error:
$ londiste /etc/skytools/mydb_data_0.ini status
Queue: mydb_data Local node: mydb_data_0
mydb_data (root)
| Tables: 18/0/0
| Lag: 9s, Tick: 2620740
+--: mydb_data_0 (leaf)
| Tables: 18/0/0
| Lag: 15m8s, Tick: 2620670
| ERR: mydb_data_0: [ev_id=84594950,ev_txid=702851528] duplicate key value violates unique constraint "dmn_pkey"
+--: mydb_data_1 (leaf)
Tables: 18/0/0
Lag: 9s, Tick: 2620740
I don't know what is causing this. Can you please guide me on this?
PostgreSQL version: 9.5, Skytools version: 3.2
Update
I found these Skytools logs on the master DB:
2016-12-27 05:36:23,369 15563 ERROR Job mydb_data_0 crashed: Lost position: batch 2681655..2681655, dst has 2681656
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/skytools-3.0/skytools/scripting.py", line 578, in run_func_safely
r = func()
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/cascade/consumer.py", line 199, in work
return BaseConsumer.work(self)
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/baseconsumer.py", line 257, in work
self._launch_process_batch(db, batch_id, ev_list)
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/baseconsumer.py", line 286, in _launch_process_batch
self.process_batch(db, batch_id, list)
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/cascade/consumer.py", line 168, in process_batch
if self.is_batch_done(state, self.batch_info, dst_db):
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/cascade/worker.py", line 185, in is_batch_done
done = CascadedConsumer.is_batch_done(self, state, batch_info, dst_db)
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/cascade/consumer.py", line 254, in is_batch_done
prev_tick, cur_tick, dst_tick))
Exception: Lost position: batch 2681655..2681655, dst has 2681656
2016-12-27 05:37:11,988 18190 INFO Resetting queue tracking on dst side
2016-12-27 05:37:47,776 18578 INFO pgq.maint_operations is installed
2016-12-27 05:37:48,038 18578 INFO {count: 0, duration: 0.295}
2016-12-27 05:37:48,115 18578 ERROR Job mydb_data_0 got error on connection 'db': duplicate key value violates unique constraint "dmn_pkey"
DETAIL: Key (id)=(31780560) already exists.. Query: update only public.percent_bleed set percent_redir_sid = '43 ...
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/skytools-3.0/skytools/scripting.py", line 578, in run_func_safely
r = func()
File "/usr/lib/python2.7/dist-packages/skytools-3.0/pgq/cascade/consumer.py", line 199, in work
return BaseConsumer.work(self)
Best Answer
Sorry to write this as an answer, but I cannot comment.
You got a pretty straightforward message:
duplicate key value violates unique constraint "dmn_pkey"
DETAIL: Key (id)=(31780560) already exists.
It says you're trying to insert a value which is already present in the table. You have to work on your enqueue/dequeue process so that re-delivered events don't produce duplicate inserts.
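To illustrate the failure mode: after the queue position was reset, an already-applied event gets delivered again, and a plain INSERT then collides with the existing primary key. One way to make the apply step tolerate re-delivery is an upsert. Below is a minimal sketch, not Skytools code: it uses Python's stdlib sqlite3 as a stand-in for the destination database (the table name and id are taken from the error message; PostgreSQL 9.5+ accepts the same `INSERT ... ON CONFLICT` syntax).

```python
# Illustration only (not Skytools internals): why a replayed event hits a
# duplicate-key error, and how an idempotent upsert avoids it.
# sqlite3 stands in for the destination DB; needs SQLite >= 3.24 for ON CONFLICT.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dmn (id INTEGER PRIMARY KEY, val TEXT)")

event = (31780560, "some value")  # id taken from the DETAIL line of the error

# First delivery succeeds.
db.execute("INSERT INTO dmn (id, val) VALUES (?, ?)", event)

# After the queue reset, the same event is delivered again; a plain INSERT
# raises the same class of error as in the question.
try:
    db.execute("INSERT INTO dmn (id, val) VALUES (?, ?)", event)
except sqlite3.IntegrityError as exc:
    print("duplicate:", exc)

# An idempotent upsert makes re-delivery harmless: the row is updated in
# place instead of violating the primary key.
db.execute(
    "INSERT INTO dmn (id, val) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET val = excluded.val",
    event,
)
print("rows:", db.execute("SELECT COUNT(*) FROM dmn").fetchone()[0])  # rows: 1
```

Whether you can change the apply side this way depends on your setup; the alternative is to fix the producer so each event is enqueued exactly once.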