
Info

Review and discussion wanted!

Overview

First of all, we distinguish between critical and non-critical services: everything related to customers is "critical"; everything related to service staff is "non-critical" for the time being.

Critical Services:

  • Failover possible

Non-Critical Services (all GUI related stuff):

  • NMS PRIME GUI
  • Apache
  • Monitoring (Cacti)
  • Icinga / Nagios
  • No failover considerations at this time

Failover Layers

1. NMS Prime GUI

No failover at this time.

2. Apache

No failover at this time.

3. Database

A normal MySQL / MariaDB failover cluster with N nodes. Possible solutions:

  1. MaxScale
  2. MariaDB Galera Cluster (not tested)
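
As a sketch of the (untested) Galera option, a minimal per-node server config could look like the following; the node names, cluster name, and provider path are assumptions:

```
# /etc/my.cnf.d/galera.cnf (sketch, untested; node names are placeholders)
[galera]
wsrep_on                 = ON
wsrep_provider           = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name       = "nmsprime"
wsrep_cluster_address    = "gcomm://db1,db2,db3"
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
```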

4. NMS Prime Lower Layers

We distinguish between the master and N slave NMS PRIME instances. The master instance also runs the NMS PRIME GUI. Any change in the GUI triggers real-time changes to the master's configs, e.g. DHCP and TFTP. This is done via Laravel Observers or Jobs (e.g. the Modem Observer).

The slaves run on separate machines without a GUI. They rebuild the DHCP, BIND, and TFTP config files on a regular basis (e.g. every hour), e.g. via a cronjob. The slaves are independent of the master and are only connected to the MariaDB SQL cluster via a read-only SQL connection. Thus any change on the master is directly propagated to the SQL cluster and later fetched automatically by the slaves.

This concept offers:

  1. a master with real-time changes to all critical configs
  2. redundant slaves that are independent of the master
  3. a redundant database with load-sharing capability
  4. load sharing for DHCP, DNS, and TFTP across all modems
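
The read-only SQL connection for the slaves could be provisioned roughly like this; the user name, password, and database name are placeholders, not the actual NMS PRIME defaults:

```
-- sketch: create a read-only account for the slaves (names are placeholders)
CREATE USER 'nmsprime_ro'@'%' IDENTIFIED BY 'changeme';
GRANT SELECT ON nmsprime.* TO 'nmsprime_ro'@'%';
FLUSH PRIVILEGES;
```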

5. Critical Services

ISC-DHCP

Normal ISC-DHCP failover with Master-Slave Concept:

The slaves rebuild their DHCP configs by themselves after a defined time (see above).
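
A sketch of what the ISC-DHCP failover pair could look like in dhcpd.conf; the peer name, addresses, subnet, and tuning values are assumptions, not a tested configuration:

```
# sketch of an ISC-DHCP failover pair (addresses and timers are assumptions)
failover peer "nmsprime-failover" {
    primary;                       # on the secondary, use "secondary;" instead
    address 10.0.0.1;
    port 647;
    peer address 10.0.0.2;
    peer port 647;
    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;
    mclt 300;                      # primary only
    split 128;                     # primary only
}

subnet 10.1.0.0 netmask 255.255.0.0 {
    pool {
        failover peer "nmsprime-failover";
        range 10.1.0.10 10.1.255.250;
    }
}
```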

DNS / BIND

The slaves rebuild their configs by themselves after a defined time (see above).

More research is required, but a good starting point could be here:

TFTP

A cronjob on the slave will rebuild all config files on a recurring basis (e.g. every hour). In NMS Prime this could simply be done by running:


Possible Cronjob(s) for Slaves

e.g. possible cronjob:

    php artisan nms:dhcp && systemctl restart dhcp

e.g. possible cronjob:

    php artisan nms:configfile
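
Installed as a cron entry, the commands above could look like the following; the working directory, user, and the hourly interval are assumptions:

```
# /etc/cron.d/nmsprime-slave (sketch; interval and working directory are assumptions)
0  * * * * root cd /var/www/nmsprime && php artisan nms:dhcp && systemctl restart dhcp
30 * * * * root cd /var/www/nmsprime && php artisan nms:configfile
```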


Github TODO: #687

Info

Implementing this in the Laravel scheduling framework (for slaves only!) would be an advantage, especially if building all config files can take longer than the rebuild interval, since overlapping runs can easily be avoided using ->withoutOverlapping():

See: https://github.com/nmsprime/nmsprime/blob/dev/app/Console/Kernel.php#L35

I would love to see an /etc/nmsprime/env statement for a possible slave configuration, like

SLAVE_CONFIG_REBUILD_INTERVALL=3600 # time in seconds
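
A sketch of how this could look in app/Console/Kernel.php; the IS_SLAVE flag is a hypothetical env variable, and the command names are taken from the cronjob examples above:

```php
// sketch only: scheduling for slave instances, assuming a hypothetical
// IS_SLAVE env flag; an interval of 3600 s corresponds to hourly()
protected function schedule(Schedule $schedule)
{
    if (env('IS_SLAVE', false)) {
        $schedule->command('nms:dhcp')
            ->hourly()
            ->withoutOverlapping();   // skip a run if the previous one is still active

        $schedule->command('nms:configfile')
            ->hourly()
            ->withoutOverlapping();
    }
}
```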



Workflow

[Draw.io diagram: "failover"]



Considerations on Failover from 22.5.2019

Ole Ernst

Torsten Schmidt

(Christian Schramm )