Ubuntu – the way to submit a patch to fix all the damage that LP: #600941 causes

development patch

What is the best way to submit a patch to fix all the damage that LP: #600941 causes?

I ask because the change for LP: #600941 was pushed to every Ubuntu release that is still supported. Should I pick a particular release and run ubuntu-bug on it? Should that release be the LTS, Oneiric, or Precise? (And how can I get Precise if I need it?)

The story is that after it was pushed out, all of our systems started experiencing Nagios nrpe restart failures.

Commands like /etc/init.d/nagios-nrpe-server restart

would cause nrpe to stop but not restart.

I tracked this down to the way the /etc/init.d/nagios-nrpe-server script calls start-stop-daemon.

The issue is that the "stop" stanza in the /etc/init.d/nagios-nrpe-server script first calls start-stop-daemon, which sends SIGTERM to nrpe, and then waits only one second.

If nrpe has not exited by that time, the pid file will still exist and the /etc/init.d/nagios-nrpe-server script will remove it.

Worse, if /etc/init.d/nagios-nrpe-server restart is used, not only will the pid file be removed, but the attempt to restart nrpe will also fail if the old nrpe daemon is still shutting down.

The attempt to start under those circumstances will fail because nrpe will still be bound to a socket and the second attempt at binding will cause the nrpe startup to abort.

They should have wondered why there was a comment about "sometimes the pid file does not get removed".

They should have tested on systems that have a heavy load and therefore slow nrpe response times.

The fix is to add --retry 10 or such to the invocation of start-stop-daemon ... --stop ...
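As a rough illustration of that fix, here is a minimal sketch of a "stop" function that lets start-stop-daemon do the waiting. It is not the packaged init script: the DAEMON and PIDFILE paths are assumptions to check against the real script, and the stop_nrpe name and the SSD variable are hypothetical, added only so the function can be tried in isolation.

```shell
#!/bin/sh
# Sketch of a "stop" action that waits for nrpe to exit before cleaning up.
# DAEMON and PIDFILE are assumed paths; verify them against the packaged script.
DAEMON=/usr/sbin/nrpe
PIDFILE=/var/run/nagios/nrpe.pid
# Hypothetical override hook so the function can be exercised without the real tool.
SSD=${SSD:-start-stop-daemon}

stop_nrpe() {
    # --retry 10: send SIGTERM and wait up to 10 seconds for the process to exit
    # (start-stop-daemon then escalates to SIGKILL and waits again).
    # --oknodo: exit 0 if the daemon was already stopped.
    "$SSD" --stop --quiet --oknodo --retry 10 \
        --pidfile "$PIDFILE" --exec "$DAEMON"
    status=$?

    # Remove the pid file only when the daemon is confirmed gone, so a
    # following "start" never races with a process still bound to the socket.
    if [ "$status" -eq 0 ]; then
        rm -f "$PIDFILE"
    fi
    return "$status"
}
```

With this shape, "restart" can simply call the stop function and then start, because the stop step does not return until nrpe has exited or the retry schedule has run out.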

Thanks

Best Answer

First, thanks for all the bug work you've done up until now. It's great that you'd like to get involved with fixing this bug!

The best way is to report a new bug against precise, and make it clear that it is a regression caused by LP:#600941. Give it the tag 'regression-updates'. It would also be good to mention it in the comments of LP:#600941, so that other users who hit the regression will find it when they investigate. The regression-updates tag will ensure that your bug is triaged and responded to quickly. So yes, first start with this:

ubuntu-bug nagios-nrpe-server

Since it affects all releases, it doesn't matter where you do this (better that you do it on a platform you can leave alone so you can verify fixes).

Right now the precise ISOs probably aren't installable, but you can try them here:

http://cdimage.ubuntu.com/daily/current/

You can also upgrade an oneiric machine to precise by editing the sources in /etc/apt/sources.list* and changing oneiric to precise, then running apt-get update && apt-get dist-upgrade. There are transitions and big changes going on though, so don't do this on a production system!
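The sources edit can be scripted along these lines. This is only a sketch: switch_release is a hypothetical helper, not part of apt, and it replaces every occurrence of the old release name in the files it is given, so review the result before upgrading.

```shell
#!/bin/sh
# switch_release FROM TO FILE...
# Hypothetical helper: rewrites the release name in the given apt source
# files in place, leaving a FILE.bak copy of each original next to it.
switch_release() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        # Also rewrites suffixed suites such as oneiric-updates.
        sed -i.bak "s/$from/$to/g" "$f"
    done
}

# Example, as root on a machine you can afford to break:
#   switch_release oneiric precise /etc/apt/sources.list
#   apt-get update && apt-get dist-upgrade
```

Any additional source files can be passed as extra arguments, and the .bak copies make it easy to compare the change with diff or to roll it back.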

To submit the fix, the best way is to use Ubuntu Distributed Development. Assign the bug to yourself, and then use these steps:

bzr branch lp:ubuntu/nagios-nrpe
cd nagios-nrpe
<edit files that need editing>
dch -D precise -i 'Fixing regression caused by bug 600941. (LP: #XXXXXX)'
debcommit
bzr push lp:~nutznboltz/ubuntu/precise/nagios-nrpe/fix-lpXXXXXX
bzr lp-propose

Replace XXXXXX with the number of your new bug.

You can find more about how to do this at https://wiki.ubuntu.com/DistributedDevelopment

Please don't hesitate to come ask in #ubuntu-devel and/or #ubuntu-server on Freenode as well.