Db2 – ibm_db on HADR primary database

db2db2-10.5db2-luwpython

Yesterday I discovered a peculiar thing, it looks like a bug but I'll post it here in case someone seen it before. Testscript:

#!/usr/bin/python

import sys
import getopt
import re
import os
import traceback

import ibm_db

"""
Test of ibm_db in a HADR env
"""

def f1(dbname, username, pwd):
    print "start"
    cfg = (dbname, username, pwd)
    conn_str = "DATABASE=%s;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=%s;PWD=%s" % cfg
    conn1 = ibm_db.connect(conn_str, "", "")
    print "connect done"
    ibm_db.close(conn1)
    conn1 = None
    print "end"


def main():
    dbname = None
    username = None
    pwd = None
    try:
        opts, args = getopt.getopt(sys.argv[1:], "d:u:p:")
    except getopt.GetoptError:
        usage()
    for o, a in opts:
            if (o == "-d"):
                dbname = a
            if (o == "-u"):
                username = a
            if (o == "-p"):
                pwd = a

    f1(dbname, username, pwd)
    f1(dbname, username, pwd)

if __name__ == "__main__":
    try:
        main()
        print('')
    except Exception, e:
        print e
        sys.exit(1)
    sys.exit(0)

Running the script on a standalone database works as expected:

./test_hadr.py -d db1 -u user -p pwd
start
connect done
end
start
connect done
end

Running the script against a HADR PRIMARY hangs on second connection

./test_hadr.py -d db1 -u user -p pwd
start
connect done
end
start
# waits for a couple of minutes

I straced it and it appears to connect to failover on the second attempt, and after a couple of minutes:

write(1, "[IBM][CLI Driver] SQL1776N  The "..., 111[IBM][CLI Driver] SQL1776N  The command cannot be issued on an HADR database. Reason code = "1". SQLCODE=-1776
) = 111

Any clues why this happens, and how it can be avoided? Using the same connection several times appears to be working, but using two different connection hangs.

Best Answer

Looks like, when the alternate server is configured, the CLI driver prefers its proper hostname over "localhost". If you specify the machine hostname or IP address in the connection string instead of "localhost" the script works as expected.

The test script can be modified as:

[...]
import traceback
import socket
[...]

def f1(dbname, username, pwd):
    print "start"
    ip = socket.gethostbyname(socket.gethostname())
    cfg = (dbname, ip, username, pwd)
    conn_str = "DATABASE=%s;HOSTNAME=%s;PORT=50000;PROTOCOL=TCPIP;UID=%s;PWD=%s" % cfg
    [...]

It is recommended to explicitly bind Db2 to an IP address via nicbinding.cfg, in which case "localhost" wouldn't work anyway.