Postgresql – Slony – Wait for Event

postgresql, replication, slony

How does Slony behave while waiting for an event after a lock set command?

Basically I am trying to do a switchover and it takes a lot of time. We are following these steps:

  1. Lock Set
  2. Wait for Event (timeout = 300)
  3. Move Set (timeout = 300)
  4. Wait for Event (timeout = 300)

The altperl script that ships with the Slony source uses the following sequence:

  1. Lock Set
  2. Sync
  3. Wait for Event
  4. Move Set

What is the correct sequence of events, and how can I make the switchover faster?

Best Answer

The Slony docs give this third sequence as an example of a structured switchover [http://slony.info/documentation/failover.html#AEN839]:

lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on=1);

That said, I have always used the altperl tools with the following, probably well-known, syntax. This moves the origin of set2 from node 1 to node 2, i.e. node 2 becomes the provider.

/usr/local/slony/bin/slonik_move_set set2 1 2 | /usr/local/pgsql/bin/slonik

As you state, the script uses this sequence of commands: lock set, sync, wait for event, move set. This has worked flawlessly for me.

What can take time is the locking and the sync. The lock may have to wait for long-running transactions to finish. If replication is lagging, all nodes have to catch up before slonik_move_set can complete.
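One way to see whether lag is the cause is to look at Slony's sl_status view on the origin before starting the switchover. The following is only a sketch: the function name lag_query and the cluster name mycluster are placeholders of mine, not something from the question. It prints the query rather than running it, so it can be piped to psql in the same way the altperl tools pipe to slonik.

```shell
#!/bin/sh
# Sketch: print a query against Slony's sl_status view to check replication
# lag before a switchover. "mycluster" is a placeholder cluster name
# (assumption); pass your own as the first argument.

lag_query() {
    cluster="${1:-mycluster}"
    # Slony keeps its catalog in a schema named after the cluster,
    # prefixed with an underscore.
    cat <<EOF
SELECT st_origin, st_received, st_lag_num_events, st_lag_time
FROM "_${cluster}".sl_status;
EOF
}
```

Usage, assuming a database called mydb on the origin node: lag_query mycluster | psql -d mydb. A large st_lag_num_events or st_lag_time for the subscriber suggests the wait for event step will be slow.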

It seems reasonable to do a sync after the lock as a way to confirm that the nodes are in sync. You could add the sync to your slonik script, i.e.:

lock set (id = 1, origin = 1);
sync (id = 1);
wait for event (origin = 1, confirmed = 2, wait on=1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on=1);

Also note the ID of the confirming node, and that you can specify a timeout for the wait for event command; the default is 600 seconds [http://slony.info/documentation/stmtwaitevent.html].
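To make the timeout explicit, the sequence above could be generated with a small shell function and piped to slonik, much like slonik_move_set does. This is a minimal sketch under assumptions: the function name gen_switchover, the cluster name and the conninfo strings are placeholders I made up, so replace them with your own preamble.

```shell
#!/bin/sh
# Sketch: print a slonik switchover script with an explicit timeout on each
# "wait for event". Cluster name and conninfo strings are placeholders
# (assumptions); override them via CLUSTER, OLD_CONNINFO and NEW_CONNINFO.

gen_switchover() {
    set_id="$1"
    old_origin="$2"
    new_origin="$3"
    timeout="${4:-600}"   # 600 seconds is slonik's documented default

    cluster="${CLUSTER:-mycluster}"
    old_conninfo="${OLD_CONNINFO:-dbname=mydb host=node1}"
    new_conninfo="${NEW_CONNINFO:-dbname=mydb host=node2}"

    cat <<EOF
cluster name = ${cluster};
node ${old_origin} admin conninfo = '${old_conninfo}';
node ${new_origin} admin conninfo = '${new_conninfo}';

lock set (id = ${set_id}, origin = ${old_origin});
sync (id = ${old_origin});
wait for event (origin = ${old_origin}, confirmed = ${new_origin}, wait on = ${old_origin}, timeout = ${timeout});
move set (id = ${set_id}, old origin = ${old_origin}, new origin = ${new_origin});
wait for event (origin = ${old_origin}, confirmed = ${new_origin}, wait on = ${old_origin}, timeout = ${timeout});
EOF
}
```

Usage: gen_switchover 1 1 2 300 | /usr/local/pgsql/bin/slonik moves set 1 from node 1 to node 2 with a 300 second timeout on each wait. Running gen_switchover 1 1 2 300 on its own prints the script so you can review it first.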