
Info

Review and discussion welcome!

Overview

First of all, we distinguish between critical and non-critical services. Everything that affects customers is "critical". Everything that affects only service staff is "non-critical" at this time.

Critical Services

Failover possible.

Non-Critical Services

All GUI-related components:

  • NMS PRIME GUI
  • Apache
  • Monitoring (Cacti)
  • Icinga / Nagios

No failover considerations at this time.

Failover Layers

1. NMS Prime GUI

No failover at this time.

2. Apache

No failover at this time.

3. Database

A standard MySQL / MariaDB failover cluster with N nodes. Possible solutions:

  1. MaxScale
  2. MariaDB Galera Cluster (not tested)

4. NMS Prime Lower Layers

We distinguish between the master and N slave NMS PRIME instances. The master instance also runs the NMS PRIME GUI. Any change in the GUI triggers real-time changes in the master's config(s), such as DHCP and TFTP. This is done via Laravel Observers or Jobs (e.g. Modem Observer).

The slaves run on separate machines without a GUI. They rebuild the DHCP, BIND, and TFTP config files at a regular interval (e.g. every hour), e.g. via cronjob. The slaves are independent of the master; they are connected only to the MariaDB SQL cluster, via a read-only SQL connection. Any change on the master is therefore written directly to the SQL cluster and later fetched automatically by the slaves.

This concept offers:

  1. a master with real-time changes to all critical configs
  2. redundant slaves that are independent of the master
  3. a redundant database with the possibility of load sharing
  4. load sharing for DHCP, DNS, and TFTP for all modems

5. Critical Services

ISC-DHCP

Standard ISC-DHCP failover with a master-slave concept.

Slaves rebuild their DHCP configs by themselves at a defined interval (see above).
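As an illustration of what standard ISC-DHCP failover needs on each node, the following sketch prints a failover peer declaration for dhcpd.conf. It is only a sketch under assumptions: the function name failover_stanza, the peer name "nmsprime-failover", the port, and the timer values are hypothetical and are not taken from NMS PRIME. How such a declaration would be merged into the generated DHCP configs is still open.

```shell
#!/bin/sh
# failover_stanza ROLE LOCAL_ADDR PEER_ADDR
# Prints an ISC dhcpd "failover peer" declaration for one node.
# Peer name, port and timer values are assumptions for illustration only.
failover_stanza() {
    role="$1"
    local_addr="$2"
    peer_addr="$3"

    case "$role" in
        primary|secondary) ;;
        *) echo "role must be primary or secondary" >&2; return 1 ;;
    esac

    echo 'failover peer "nmsprime-failover" {'
    echo "  $role;"
    echo "  address $local_addr;"
    echo "  port 647;"
    echo "  peer address $peer_addr;"
    echo "  peer port 647;"
    echo "  max-response-delay 60;"
    echo "  max-unacked-updates 10;"
    echo "  load balance max seconds 3;"
    # mclt and split may only be set on the primary
    if [ "$role" = "primary" ]; then
        echo "  mclt 3600;"
        echo "  split 128;"
    fi
    echo "}"
}
```

For example, failover_stanza primary 10.0.0.1 10.0.0.2 would print the declaration for the master, and failover_stanza secondary 10.0.0.2 10.0.0.1 the one for a slave (the addresses are placeholders). Each address pool that takes part in failover must additionally reference the peer by name.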

DNS / BIND

Slaves rebuild their configs by themselves at a defined interval (see above).

More research required, but a good starting point could be here:

TFTP

A cronjob on each slave rebuilds all config files on a recurring basis (e.g. every hour). In NMS PRIME this could simply be done by running an artisan command. See below.


Possible Cronjob(s) for Slaves

e.g. possible cronjob:
php artisan nms:dhcp && systemctl restart dhcp

e.g. possible cronjob:
php artisan nms:configfile
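If a rebuild can take longer than the cron interval, two runs may overlap. At the cron level this could be guarded against with flock(1) from util-linux. The following is a minimal sketch, not the NMS PRIME implementation; the function name run_exclusive and the lock file path are assumptions.

```shell
#!/bin/sh
# Hypothetical lock file path; can be overridden through the environment.
LOCKFILE="${LOCKFILE:-/tmp/nmsprime-rebuild.lock}"

# run_exclusive CMD [ARGS...]
# Runs CMD only if no other run currently holds the lock.
run_exclusive() {
    # -n: give up immediately (non-zero exit) instead of waiting for the lock
    flock -n "$LOCKFILE" "$@"
}
```

A cron entry on a slave could then call, for example, run_exclusive php artisan nms:configfile. If the previous run is still busy, the new run is skipped rather than queued.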


GitHub TODO: #687

Info

Implementing this in the Laravel scheduling framework (for slaves only!) would be an advantage, especially if building all config files could take longer than the rebuild interval, since overlapping runs can easily be avoided using ->withoutOverlapping():

See: https://github.com/nmsprime/nmsprime/blob/dev/app/Console/Kernel.php#L35

I would love to see an /etc/nmsprime/env setting for a possible slave configuration, such as:

SLAVE_CONFIG_REBUILD_INTERVALL=3600 # time in seconds
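To show how such a setting could be consumed on a slave, here is a small sketch that reads the interval from an env file and falls back to 3600 seconds. The function name read_interval and the fallback behaviour are assumptions; the setting itself is only a proposal.

```shell
#!/bin/sh
# read_interval ENV_FILE
# Prints the value of SLAVE_CONFIG_REBUILD_INTERVALL (in seconds) found in
# ENV_FILE, or 3600 if the file is unreadable or the setting is missing.
read_interval() {
    env_file="$1"
    value=""
    if [ -r "$env_file" ]; then
        # Keep only the digits after "="; a trailing "# comment" is dropped.
        # If the setting occurs more than once, the last occurrence wins.
        value="$(sed -n 's/^SLAVE_CONFIG_REBUILD_INTERVALL=\([0-9][0-9]*\).*$/\1/p' "$env_file" | tail -n 1)"
    fi
    echo "${value:-3600}"
}
```

A rebuild loop on a slave could then sleep for "$(read_interval FILE)" seconds between runs, where FILE is whichever env file ends up carrying the setting.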


There is now an umbrella ticket: #771


Workflow

[Diagram: failover]



Considerations on Failover from 22.5.2019

Ole Ernst

Torsten Schmidt

(Christian Schramm)