DB2 – Fix LDAP Security Plugin Error

db2, db2-luw, openldap

I'm experimenting with the LDAP security plugin. After a certain number of grants, I get an error saying that the LDAP server can't be reached:

SQL30082N  Security processing failed with reason "26" ("UNEXPECTED SERVER 
ERROR").  SQLSTATE=08001

Related entries from db2diag.log:

2018-08-16-08.06.30.095156+120 E650341472E720        LEVEL: Severe
PID     : 87873                TID : 140137426052864 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000            DB   : WT000N11
APPHDL  : 0-1814               APPID: *LOCAL.db2inst1.180816060533
AUTHID  : DB2INST1             HOSTNAME: nya-50
EDUID   : 581                  EDUNAME: db2agent (NYA) 0
FUNCTION: DB2 UDB, bsu security, sqlex_get_authid_type, probe:300
MESSAGE : ADM13001E  Plug-in "IBMLDAPgroups" received error code "-16" from the 
          DB2 security plug-in API "db2secDoesGroupExist" with the error 
          message "InitLDAP: bind failed rc=81 (Can't contact LDAP server) 
          SearchDN='cn=admin,dc=its,dc=se'".

2018-08-16-08.06.30.095818+120 I650342193E845        LEVEL: Info
PID     : 87873                TID : 140137426052864 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000            DB   : WT000N11
APPHDL  : 0-1814               APPID: *LOCAL.db2inst1.180816060533
AUTHID  : DB2INST1             HOSTNAME: nya-50
EDUID   : 581                  EDUNAME: db2agent (NYA) 0
FUNCTION: DB2 UDB, SW- common services, sqlnn_cmpl, probe:670
MESSAGE : ZRC=0x805C0177=-2141453961=SQLEX_PLGN_SRV_CON_UNEXPECTED_ERROR
          "The server security plugin encountered an unexpected error"
DATA #1 : String, 51 bytes
An error was detected during statement compilation.
DATA #2 : String, 156 bytes
Compiler error stack for rc = -2141453961:
sqlnn_cmpl[300]
sqlnp_main[250]
sqlnp_parser[510]
sqlnp_smactn[100]
sqlnq_auth_stmt[110]
sqlnq_auth_stmt_end[10]

A simple sh script to reproduce the problem:

#!/bin/sh

n=0
db2 connect to db
e=$(db2 -x "with t(n) as ( values 1 union all select n+1 from t where n<300 ) select listagg('A' || n,', ') from t")

for t in $(db2 -x "select rtrim(tabschema)||'.'||rtrim(tabname) from syscat.tables where tabschema not like 'SYS%'"); do
        db2 -v "grant select on table $t to group $e"
        if [ $? -ne 0 ]; then
                exit 1
        fi
        n=$(expr $n + 1)
done
exit 0

There's a strong correlation between the number of groups in the grant and the number of iterations before the crash (the number is exactly the same on every run), so I suspect that some resource gets exhausted.

grant select on table Ti to group A1, A2, ..., An

+----------------------+-----------------------------+
| Number of groups (n) | Iterations before crash (i) |
+----------------------+-----------------------------+
|                  100 |                         141 |
|                  200 |                          69 |
|                  300 |                          46 |
|                  600 |                          22 |
+----------------------+-----------------------------+
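
The figures in the table can be checked for a common budget by multiplying the two columns. The sketch below does that, assuming (this is an inference, not something the diagnostics confirm) that each group named in a grant costs one lookup against the LDAP server; the helper name `lookups` is made up for illustration.

```shell
#!/bin/sh
# Lookups performed before the failure, ASSUMING one lookup per group per grant.
lookups() {
    echo $(( $1 * $2 ))
}

# (groups per grant, successful iterations) pairs taken from the table above
for pair in "100 141" "200 69" "300 46" "600 22"; do
    set -- $pair
    printf '%4d groups x %3d iterations = %d lookups\n' "$1" "$2" "$(lookups "$1" "$2")"
done
```

The products are 14100, 13800, 13800 and 13200, so the failure shows up after a roughly constant number of group lookups regardless of how they are split across grants, which fits the idea of a fixed resource being used up.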

I can't find any errors on the LDAP side.

I can run the test repeatedly without restarting anything in between, and the crash appears at the same iteration every time.

If I configure Db2 to use two LDAP servers:

LDAP_HOST = host1 host2

it switches to the other LDAP server when the crash appears, and the test script continues.

* Edit: Additional observations *

Adding a sleep between grants increases the number of iterations before the crash. For 600 groups:

+---------------+--------------------------+
| Sleep seconds | Iterations before crash  |
+---------------+--------------------------+
|             0 |                      22  |
|             1 |                      28  |
|             2 |                      31  |
|             3 |                   *1166* |
+---------------+--------------------------+

With the 3-second sleep, the script ran through all 1166 tables successfully. Unfortunately, such a sleep isn't a very practical solution.

Any clues, anyone?

* Edit: Additional observations 2 *

Running several test scripts in parallel makes the problem appear very quickly. I now suspect that it is a problem with LDAP rather than with Db2. I do notice that Db2 keeps a connection open for some time, so it may help to increase some limit in LDAP. Will check tomorrow.
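
One way to see how many short-lived connections the test leaves behind is to count the sockets towards the LDAP port that are in TIME-WAIT on the Db2 host. This is a minimal sketch, assuming LDAP listens on port 389 and that `ss` from iproute2 is installed; the function name `count_ldap_time_wait` is hypothetical.

```shell
#!/bin/sh
# Count sockets in TIME-WAIT whose peer port is the LDAP port.
# Reads `ss -tan` output on stdin; the peer address:port is the fifth column.
count_ldap_time_wait() {
    port=${1:-389}   # 389 is an assumed default, adjust to your LDAP_HOST setup
    awk -v p=":$port" '$1 == "TIME-WAIT" && $5 ~ (p "$") { c++ } END { print c + 0 }'
}

# Usage on the Db2 server while the test script is running:
#   ss -tan | count_ldap_time_wait 389
```

If the count climbs into the thousands while the grants run and the error appears around the same point, the limit being hit is more likely on the client side than in the LDAP server.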

Best Answer

I never did open a PMR, since increasing the number of local ports available for LDAP connections seemed to solve the problem:

sysctl -w net.ipv4.ip_local_port_range="30000 65535"

You could also decrease the timeout value for ports:

sysctl -w net.ipv4.tcp_fin_timeout=30
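
To judge how much headroom the port-range setting gives, the number of ports in a range can be computed directly. A small sketch follows; `port_count` is a hypothetical helper, and the commented usage assumes a Linux host where the current range is exposed under /proc.

```shell
#!/bin/sh
# Number of local ports in an inclusive range "low high".
port_count() {
    echo $(( $2 - $1 + 1 ))
}

# Range from the sysctl command above
port_count 30000 65535

# Current range on this host (Linux only):
#   read -r low high < /proc/sys/net/ipv4/ip_local_port_range
#   port_count "$low" "$high"
```

For the range 30000 to 65535 this prints 35536. Comparing that with the value computed from the host's previous setting shows how many additional outgoing connections can be in flight at once.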

Following P. Vernon's suggestion also decreased both the run time and the overall resource usage. That is, instead of:

for t in ${tables}; do
    db2 "grant select on table $t to group g1,g2,...gn"
done

the following can be used:

db2 "create role ${ROLE}"
db2 "grant role ${ROLE} to group g1,g2,...gn"
for t in ${tables}; do
    db2 "grant select on table $t to role ${ROLE}"
done
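
Put together with the reproduction script from the question, the role-based variant might look like the sketch below. It assumes an existing connection and the `A1 ... An` group names used in the test; the function names and the role name `TEST_ROLE` are placeholders, and the group list is built in the shell rather than with the `listagg` query.

```shell
#!/bin/sh
# Build "A1, A2, ..., An" locally (replaces the listagg query in the test script).
build_group_list() {
    i=1
    list=""
    while [ "$i" -le "$1" ]; do
        list="${list:+$list, }A$i"
        i=$((i + 1))
    done
    echo "$list"
}

# Grant SELECT on every non-system table through a single role, so the long
# group list is sent to Db2 once instead of once per table.
grant_via_role() {
    role=$1
    grp=$(build_group_list "$2")
    db2 "create role $role" || return 1
    db2 "grant role $role to group $grp" || return 1
    for t in $(db2 -x "select rtrim(tabschema)||'.'||rtrim(tabname) from syscat.tables where tabschema not like 'SYS%'"); do
        db2 -v "grant select on table $t to role $role" || return 1
    done
}

# Usage (role name and group count are placeholders):
#   db2 connect to db
#   grant_via_role TEST_ROLE 300
```

With this layout each table grant names only the role, so the number of group names Db2 has to resolve no longer grows with the number of tables.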