Is anyone else experiencing consistency issues with postgres-xl?

postgresql

Just like the title says. I have been playing with postgres-xl for over a month and have been seeing strange behavior. I am using the Docker container referenced from the postgres-xl.org site and have managed to build some working clusters, though most of them are broken from the beginning even with the same Docker images and config template. I can't even create databases from a fresh install. From one build to the next, changing only the IP addresses used in the cluster, I get mixed results. Most of the time this happens when trying to set up the slave nodes, and I have not once been able to get the GTM proxy going with any sort of HA nodes in place. But even with a plain deployment, i.e. just a GTM and a couple of coordinators and datanodes, I still get mixed results.

Also, is there a forum for postgres-xl anywhere? I have been searching and can't find anything except their site and some documentation.

Here are a few of the error messages I get:

PGXC Createdb asset_system
Selected COORD0.
createdb: could not connect to database template1: FATAL: GTM generated global XID not available
HINT: Check if GTM/GTM-proxy is running @ 172.17.1.223:20001 and reachable from this host. Your firewall could also block access to a host/port

(Yes the GTM is running on port 20001 and is not being blocked)

This happened during an init all:

CREATE NODE PGXL1 WITH (TYPE='datanode', HOST='172.17.1.226', PORT=21001);
CREATE NODE

SELECT pgxc_pool_reload();
WARNING: can not connect to GTM: Connection reset by peer
ERROR: GTM generated global XID not available
HINT: Check if GTM/GTM-proxy is running @ 172.17.1.223:20001 and reachable from this host. Your firewall could also block access to a host/port

CREATE NODE COORD0 WITH (TYPE='coordinator', HOST='172.17.1.225', PORT=5432);
CREATE NODE

ALTER NODE COORD1 WITH (HOST='172.17.1.226', PORT=5432);
ALTER NODE

CREATE NODE PGXL0 WITH (TYPE='datanode', HOST='172.17.1.225', PORT=21000);
CREATE NODE

CREATE NODE PGXL1 WITH (TYPE='datanode', HOST='172.17.1.226', PORT=21001, PREFERRED);
CREATE NODE
SELECT pgxc_pool_reload();
ERROR: GTM error, could not obtain snapshot. Current XID = 3030, Autovac = 0
Done.

EXECUTE DIRECT ON (PGXL0) 'CREATE NODE COORD0 WITH (TYPE=''coordinator'', HOST=''172.17.1.225'', PORT=5432)';
EXECUTE DIRECT

EXECUTE DIRECT ON (PGXL0) 'CREATE NODE COORD1 WITH (TYPE=''coordinator'', HOST=''172.17.1.226'', PORT=5432)';
WARNING: can not connect to GTM: Connection reset by peer

(Yeah, it first can execute, then it can't?)

Then on top of this, the tool recommended by the postgres-xl docs, pgxc_ctl, reports faulty results when running the monitor command. It will tell me the slaves (coordinators and/or datanodes and/or the GTM slave; this changes too) are not running, but I can ssh to the machine and clearly see the process running and the ports open.

I like the idea of postgres-xl being able to scale out read and write, but if it doesn't work, it doesn't work. I am not sure if this is an issue with postgres-xl or maybe some limitation I am hitting with docker and getting some resource clashes. Any help and advice is much appreciated.

Docker version:

Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef

Postgres Version:

psql (Postgres-XL) 9.2.0
(based on PostgreSQL) 9.2.4 (Postgres-XL 9.2.0)

Host uname:

Linux ubuntu-14 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I know this is tagged as postgresql, which is not truly accurate; however, I don't have enough reputation points to create a new tag for postgres-xl.

EDIT: This question has nothing to do with helping me install or configure postgres-xl; I have done this several hundred times. This question is about stability, documentation availability and completeness, and the overall fitness of postgres-xl for a production environment. I know how to install a database, and I know how to configure a firewall.
@LotsOfData I appreciate that one of the original developers of postgres-xl, and the precursor postgres-xc, would take time to "answer" my question, but it was no help whatsoever. I clearly stated that I had the ports and IPs configured correctly, and that using the same template and only changing the IPs for different scale-out attempts I get mixed results. The only total failure I have seen is not being able to run a gtm-proxy at the same time as slave nodes for the coordinators, the datanodes, or both. I have checked and rechecked the configs many times. I have read the sparse documentation on the project's website. I also take it you didn't read this question, as your little pgxc_ctl tool is buggy at best. Running the monitor command will report that a node is not running when you can ssh to the machine and see it. Wait, I am repeating myself now. PLEASE READ THE WHOLE QUESTION BEFORE "ANSWERING" IT. Better yet, go fix your code.

Best Answer

I have only seen the above message when there is a firewall in the way or the GTM port is specified incorrectly.
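
To rule those two causes out from the coordinator's side, a quick TCP check run on each coordinator and datanode host can help. The following is only a minimal sketch: the function names `port_reachable` and `check_endpoints` are made up for illustration, and the addresses in the usage comment are copied from the question. A successful connect only shows that something accepts TCP connections on that port; it does not prove the GTM is healthy, and it would not by itself explain a "Connection reset by peer" that occurs after the connection is established.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each label to True/False depending on whether its port accepts connections.

    endpoints is a dict of label -> (host, port); the labels are arbitrary.
    """
    return {
        label: port_reachable(host, port, timeout)
        for label, (host, port) in endpoints.items()
    }


# Example usage (addresses taken from the question; adjust for your cluster):
#
#   results = check_endpoints({
#       "GTM": ("172.17.1.223", 20001),
#       "COORD0": ("172.17.1.225", 5432),
#       "PGXL0": ("172.17.1.225", 21000),
#       "PGXL1": ("172.17.1.226", 21001),
#   })
#   for label, ok in results.items():
#       print(label, "reachable" if ok else "NOT reachable")
```

If every port is reachable from every node and the errors still appear intermittently, a firewall or wrong port is an unlikely explanation, and the GTM log is the next place to look.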

I always recommend first setting up a simple local configuration, with 1 GTM, 1 coordinator, and 2 datanodes running on the same server. Once you get that working, try more complex architectures; it will be easier moving forward.

Also, please use the pgxc_ctl utility to configure the cluster; it will save you a lot of trouble and will check for port conflicts and such.

Good luck!