High Availability Installation

1.1. Description

The ProvHA module provides failover functionality for the following services:

  • DHCP

  • TFTP

  • time

  • DNS

Typically the load is balanced: master and slave share the provisioning load. If one instance goes down, all work is done by the other machine, keeping your network alive – your customers do not notice the incident.

The configuration files needed for provisioning are updated as follows:

  • master: same as in standalone installation (every time a relevant database value has changed)

  • slave:

    • rebuilt every n seconds (configurable via the master GUI: Global config ⇒ ProvHA)

    • rebuilt on changes to several database tables (cmts, ippool)

1.1.1. DHCP

We use the failover functionality of ISC DHCP, which supports a setup with one master (primary) and one slave (secondary) instance, called peers. By default each server handles 50% of each IP pool; the load balance is configurable. The servers inform each other about leases – if one instance goes down, the failover peer takes over the complete pools. Once both servers are active again, the pools are rebalanced automatically.

Configuration is done in /etc/dhcp-nmsprime/failover.conf; the pools in /etc/dhcp-nmsprime/cmts_gws/*.conf are configured with a failover statement.
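For orientation, the two pieces could look like the following sketch. The peer name, IP addresses, and timing values are placeholders, not the values NMSPrime generates; the statements themselves are standard ISC dhcpd failover syntax:

```
# failover.conf (sketch; peer name and IPs are placeholders)
failover peer "nmsprime-failover" {
  primary;                     # use "secondary;" on the slave
  address 10.0.0.1;            # this server's IP
  port 647;
  peer address 10.0.0.2;       # the failover peer's IP
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 3600;                   # primary only
  split 128;                   # primary only: 128/256 = 50/50 load balance
  load balance max seconds 3;
}

# cmts_gws/*.conf (sketch): each pool references the failover peer
pool {
  failover peer "nmsprime-failover";
  range 10.1.0.10 10.1.0.250;
}
```

The split value (0–256) is what makes the load balance configurable: 128 means each peer serves half of each pool.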

1.1.2. TFTP

1.1.2.1. Theory

For TFTP we have to distinguish between DOCSIS versions:

  • For DOCSIS versions below 3.0, only one TFTP server can be provided, realized via the next-server statement in global.conf. In our setup each of the two DHCP servers sets this to its own IP address.

  • For DOCSIS 3.0 and higher, the value of next-server can be overridden using option vivso (sub-option 125.2: CL_V4OPTION_TFTPSERVERS), which allows multiple TFTP servers to be configured. In our setup each DHCP server provides its own IP address first and the peer's IP address second.
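In ISC dhcpd the vivso sub-option has to be declared before it can be set. The following is a commonly cited recipe, not the exact NMSPrime configuration; the addresses are placeholders and the declarations should be verified against your dhcpd version:

```
# Declare option 125 (vivso) with the CableLabs enterprise number 4491
option space docsis code width 1 length width 1;
option docsis.tftp-servers code 2 = array of ip-address;
option space vivso code width 4 length width 1;
option vivso.docsis code 4491 = encapsulate docsis;

# DOCSIS < 3.0: single TFTP server via next-server
next-server 10.0.0.1;                            # this DHCP server's IP

# DOCSIS >= 3.0: own IP first, peer's IP second
option docsis.tftp-servers 10.0.0.1, 10.0.0.2;
```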

1.1.2.2. Practice

It looks like the CMTS – at least a Cisco 7225 – is not able to make use of this information. Attached are two tcpdumps of the same DHCP ACK: one sent from the slave to the CMTS, the other sent from the CMTS to the modem: dhcp_ack_slave_to_cmts.txt dhcp_ack_slave_to_cmts.txt

A diff of the two dumps shows:

  • the CMTS replaces the IPs of “next server” and “tftp server” with its own

  • no matter in which order the IPs in option 125.2 are given, every single TFTP request of a modem is directed to the “Next server IP address”

  • that means:

    • if the DHCP server on an HA NMS instance is working, all ACKs of this server contain its own IP address

    • if the TFTP server on this instance is dead, or a config file is missing, the CM will be stuck in init(o) and reboot endlessly



1.1.3. Time

The option time-servers statement accepts a comma-separated list of IP addresses. Each DHCP server provides its own IP address first and the peer's IP address second.

1.1.4. DNS

The option domain-name-servers statement accepts a comma-separated list of IP addresses. Each DHCP server provides its own IP address first and the peer's IP address second.
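Both the time-servers and domain-name-servers statements follow the same pattern; a sketch with placeholder addresses, as it would appear in the dhcpd configuration of the server 10.0.0.1:

```
# own IP first, failover peer's IP second (addresses are placeholders)
option time-servers 10.0.0.1, 10.0.0.2;
option domain-name-servers 10.0.0.1, 10.0.0.2;
```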

Open question: what about zone sections in global.conf? Should the peer IP be given as secondary?



Open question: does the DNS server configuration need to be changed, too?



1.2. Preparation

You will need two NMSPrime installations to provide failover functionality.

1.2.1. Master

One instance is defined as the MASTER and is comparable to an installation without failover (a “classical” NMSPrime). This instance is the only one with write access to the databases – all tasks that change the database are done here:

  • GUI actions (adding/changing/deleting elements such as contracts, modems, CMTS, etc.)

  • API

  • cacti polling

  • communication with external APIs such as “envia TEL” (module ProvVoipEnvia)

1.2.2. Slave

The other installation (currently only one slave is supported, due to restrictions of ISC DHCPd) has read-only access to the database:

  • the database cannot be changed at the slave

  • provisioning of CM/MTA/CPE will be done completely by the slave if the master fails

1.2.3. Database

The database is set up as a Galera cluster; this way all data is replicated and stored on both the master and the slave machine.
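A Galera node configuration could look roughly as follows. This is a generic MariaDB/Galera sketch, not the file the installation scripts generate; cluster name, node addresses, and the provider path are placeholders that vary by distribution:

```
# /etc/my.cnf.d/galera.cnf (sketch; names and IPs are placeholders)
[galera]
wsrep_on = ON
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name = "nmsprime"
wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2"   # master and slave
wsrep_node_address = "10.0.0.1"                       # this node's IP

# settings required by Galera replication
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
```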

1.2.4. CMTSs

CMTS configs need to be extended: for each of the two instances we need a

  • cable helper-address statement (interface Bundle x)

  • ntp server statement
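On a Cisco CMTS the two statements could look as follows; the bundle interface number and the IP addresses are placeholders for your master and slave instances:

```
! sketch for a Cisco CMTS (interface and IPs are placeholders)
interface Bundle1
 cable helper-address 10.0.0.1
 cable helper-address 10.0.0.2
!
ntp server 10.0.0.1
ntp server 10.0.0.2
```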

1.3. Installation

Most of the work is done by the installation scripts deployed with the ProvHA module. The following details are given to help you understand the whole process and to support later configuration changes.