QNAP TS-809U: Domain Users / Groups disappear and the server has to be rejoined to the AD Domain

active-directory nas qnap samba

We have a TS-809U joined to our domain. Shares and access rights work as they should for domain users. But after a couple of weeks to a month, the domain users and groups disappear from the TS-809U, and I have to manually rejoin the domain. After rejoining, the problem recurs within the same timeframe, and I have to rejoin yet again.

There are no errors in the logs in the web interface, and it shows the NAS joining the domain successfully. I updated the TS-809U to the latest firmware, 4.0.3 (from 3.x), hoping this would solve it, but the problem persists.

Has anyone encountered this before and knows what the issue could be, or how to troubleshoot it further?

The only message I've been able to find in the event viewer that references the NAS is a 5722 that might point in the direction of the comment below:

The session setup from the computer NASC473CD failed to authenticate. The name(s) of the account(s) referenced in the security database is NASC473CD$.
The following error occurred:
Access is denied.

The timing between when the entries disappeared and then re-appeared seems to be 14 days. Our domain is (still) based on Windows Server 2003.

Update

The problem has surfaced again, but the logs didn't show anything interesting. wbinfo -t (testing the trust secret) did not work and, unsurprisingly, neither did wbinfo -c (changing the trust secret). I did discover that the current Kerberos 5 ticket store hadn't been refreshed and the tickets had expired, which might be connected. I've now added /sbin/update_krb5_ticket to the crontab to see if that helps (the store is now refreshed every hour).
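To pin down exactly when the trust secret stops validating, a check like the following could be run from cron alongside the ticket refresh. This is only a sketch: check_trust and the LOGFILE path are names made up for illustration, and it assumes wbinfo is on the PATH.

```shell
#!/bin/sh
# Sketch: record whether the machine trust secret still validates.
# LOGFILE is a hypothetical path; use any writable location on the NAS.
LOGFILE="${LOGFILE:-/tmp/trust-check.log}"

check_trust() {
    # wbinfo -t verifies the trust account secret against the DC
    if wbinfo -t >/dev/null 2>&1; then
        status="ok"
    else
        status="FAILED"
    fi
    # Append a timestamped result so the first failure can be dated
    echo "$(date '+%Y-%m-%d %H:%M:%S') trust secret: $status" >> "$LOGFILE"
    [ "$status" = "ok" ]
}
```

Calling check_trust hourly would leave a log whose first FAILED line can be compared against the timestamps of the 5722 events on the DC.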

Update 2014-02-25

Still no success. log.wb-DOMAINNAME shows that we're apparently being refused access, probably because of timed-out credentials or invalid secrets. Not sure how to progress, as the Kerberos ticket list (klist) showed a valid ticket when the failure occurred.

log.wb-DOMAINNAME shows:

[2014/02/25 03:05:20.545176,  3] winbindd/winbindd_pam.c:1902(winbindd_dual_pam_auth_crap)
  could not open handle to NETLOGON pipe (error: NT_STATUS_ACCESS_DENIED)
[2014/02/25 03:05:20.545198,  2] winbindd/winbindd_pam.c:2003(winbindd_dual_pam_auth_crap)
  NTLM CRAP authentication for user [DOMAINNAME]\[MACHINE$] returned NT_STATUS_ACCESS_DENIED (PAM: 4)
[2014/02/25 03:05:20.548424,  3] winbindd/winbindd_pam.c:1841(winbindd_dual_pam_auth_crap)
  [20497]: pam auth crap domain: DOMAINNAME user: MACHINE$

(The same error messages occur for regular users.) As far as I understand, the issue is that the server responds with ACCESS_DENIED when Samba tries to open the NETLOGON pipe. I did however discover that one of the DNS servers on the TS-809U was set to an external server rather than a server in the domain. I've now pointed both DNS servers at our AD DCs to see if that could be the cause (if the NAS falls over to the external server, it gets "host not found" instead of a timeout for internal, domain-based hosts).
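A quick way to verify the DNS theory is to ask each configured resolver for the domain name directly. The sketch below is illustrative only: AD_DOMAIN is a placeholder, the function names are made up here, and it assumes the nslookup on the NAS accepts "nslookup HOST SERVER" and exits non-zero when the lookup fails.

```shell
#!/bin/sh
# Sketch: confirm every configured resolver can answer for the AD domain.
# AD_DOMAIN is a placeholder; substitute the real DNS name of the domain.
AD_DOMAIN="${AD_DOMAIN:-example.local}"

# Print the nameserver addresses listed in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query each resolver in turn; returns non-zero if any of them fails.
check_resolvers() {
    rc=0
    for ns in $(list_nameservers "$1"); do
        if nslookup "$AD_DOMAIN" "$ns" >/dev/null 2>&1; then
            echo "$ns: resolves $AD_DOMAIN"
        else
            echo "$ns: cannot resolve $AD_DOMAIN"
            rc=1
        fi
    done
    return $rc
}
```

Running check_resolvers with no argument reads /etc/resolv.conf. Any resolver reported as failing is one the NAS should not be falling back to.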

Update 2015-03-04. Automated rejoin script deployed as a workaround.

We're still no closer to a lasting solution, and we're currently seeing timeouts every week. This seems to match the lifetime of a valid Kerberos ticket, but I've been unable to find any setting that changes it.

I have however created a small script that checks whether we've lost the user list from the domain and rejoins the server if needed, using Samba's net rpc join command. "username" is a domain user that has permission to join computers to the domain (we created a user for the QNAP for this purpose only):

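# Count the domain groups winbind can currently see
COUNT=$(wbinfo -g | grep -c DOMAINNAME)

# No groups from the domain means the membership is lost, so rejoin
if [ "$COUNT" -lt 1 ]
then
    /usr/local/samba/bin/net rpc join -Uusername%password
fi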

This script is run on the QNAP with cron (search for "qnap cron" on Google for how to set up cron properly). It has worked decently for the last few months.
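Since the exact interval between failures is still unclear, a variant that logs each rejoin may help establish the timing. This is a sketch under assumptions, not the script we actually run: rejoin_if_needed, LOGFILE, and the variable defaults are placeholders introduced here.

```shell
#!/bin/sh
# Sketch: same check as the script above, plus a log line per rejoin.
# DOMAIN, JOIN_CREDS and LOGFILE are placeholders, not real values.
DOMAIN="${DOMAIN:-DOMAINNAME}"
JOIN_CREDS="${JOIN_CREDS:-username%password}"
LOGFILE="${LOGFILE:-/tmp/ad-rejoin.log}"
NET="${NET:-/usr/local/samba/bin/net}"

rejoin_if_needed() {
    # Count the domain groups winbind can currently enumerate
    count=$(wbinfo -g 2>/dev/null | grep -c "$DOMAIN")
    if [ "$count" -lt 1 ]; then
        # Timestamp the event, then capture the output of the join itself
        echo "$(date '+%Y-%m-%d %H:%M:%S') no $DOMAIN groups visible, rejoining" >> "$LOGFILE"
        "$NET" rpc join -U"$JOIN_CREDS" >> "$LOGFILE" 2>&1
    fi
}
```

The timestamps in the log show how far apart the rejoins are. Because the join password sits in the script in plain text, the file should be readable by root only.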

Best Answer

This looks like a problem with the machine account password to me. By design, members of a 2k3 domain change their machine account password every 30 days, but the client can trigger the change at any time.

Normally, the member first generates the new password locally and then pushes it to the DC.

For whatever reason, it seems that your QNAP is generating a new password after two weeks but is then unable to push it to the DC because of a broken secure channel.

I don't know what features QNAP offers. Can you log on via SSH? I think it's a Unix-based system. Maybe there is an option to disable machine account password changes. The trust won't stop working after 30 days if the password is never changed.
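On the Samba side, the relevant smb.conf setting is machine password timeout, which defaults to 604800 seconds (one week). If you can get a shell, something like this sketch would show the effective value. The function name is made up here, and it assumes testparm is on the PATH and reads the same smb.conf that the QNAP's Samba uses.

```shell
#!/bin/sh
# Sketch: show how often Samba rotates the machine account password.
# testparm prints the effective value, including the built-in default.
show_machine_pw_timeout() {
    testparm -s --parameter-name="machine password timeout" 2>/dev/null
}

# To stop the rotation, one would add the following line to the [global]
# section of smb.conf and restart Samba. Assumption: a value of 0 means
# the password is never changed; check the smb.conf man page for the
# Samba version on the NAS before relying on it.
#     machine password timeout = 0
```

If the printed value lines up with the interval at which the domain users disappear, that would support the machine password theory.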

