Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting CTDB daemon by running ctdbd directly should not remove
  existing unix socket unconditionally.

* ctdbd once again successfully kills client processes on releasing
  public IPs.  It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* Always use Jenkins hash when creating volatile databases.  There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs in the new code
  which led to header corruption for empty records.  This resulted
  in inconsistent headers on two nodes and a request for such a record
  keeps bouncing between nodes indefinitely and logs "High hopcount"
  messages in the log. This also caused performance degradation.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on single FD before
  polling other FDs.  Use fixed size queue buffers to allow fair
  scheduling across multiple FDs.

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* Recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to failover quickly
  instead of trying to repair a node first by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Make sure lower level tdb messages are logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* 2 new configuration variables for 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed DeadlockTimeout tunable.  To enable debug of locking issues set

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have been
  updated to use following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* Initscript is now simplified with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".

Important bug fixes
-------------------

* ctdb tool should not exit from a retry loop if a control times out
  (e.g. under high load).  This simple fix will stop an exit from the
  retry loop on any error.

* When updating flags on all nodes, use the correct updated flags.  This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with incorrect
  argument) and test binaries (called when ctdbd not running).

Important internal changes
--------------------------

* The recovery daemon on stopped or banned node will stop participating in any
  cluster activity.

* Improve cluster wide database traverse by sending the records directly from
  traverse child process to requesting node.

* TDB checking and dropping of all IPs moved from initscript to "init"
  event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by initscript

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds.  This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fix a severe recovery bug that can lead to data corruption for SMB clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed.  This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation.  They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code.  One improvement is to use a
  standalone locking helper executable - the avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.
