Info |
---|
Review and discussion welcome! |
Overview
First of all, we distinguish between critical and non-critical services. Everything that has to do with customers is "critical". Everything that has to do with service staff is "non-critical" for now.
Critical Services | Non-Critical Services |
---|---|
| All GUI-related stuff |
Failover possible | No failover considerations at this time |
Failover Layers
1. NMS Prime GUI
No failover at this time.
2. Apache
No failover at this time.
3. Database
Normal MySQL / MariaDB failover cluster with N nodes. Possible Solutions:
- MaxScale
- MariaDB Galera Cluster (not tested)
4. NMS Prime Lower Layers
We distinguish between the master and N slave NMS PRIME instances. The master instance also runs the NMS PRIME GUI. Any change in the GUI triggers real-time changes in the master config(s), such as DHCP and TFTP, among others. This is done via Laravel Observers or Jobs (e.g. Modem Observer).
The slaves run on separate machines without a GUI. They rebuild the DHCP, BIND, and TFTP config files on a regular basis (e.g. every hour), for example via cronjob. The slaves are independent of the master; their only connection is a read-only SQL connection to the MariaDB cluster. So any change on the master is written directly to the SQL cluster and later fetched automatically by the slaves.
This concept offers:
- a master with real-time changes to all critical configs
- redundant slaves that are independent of the master
- a redundant database with the possibility of load sharing
- load sharing for DHCP, DNS, and TFTP across all modems
5. Critical Services
ISC-DHCP
Normal ISC-DHCP failover with Master-Slave Concept:
Slaves rebuild their DHCP configs by themselves after a defined time (see above).
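Since a slave rebuilds and reloads its DHCP config unattended, it may be worth checking the generated file before restarting the service. The following is only a sketch under assumptions: `restart_if_valid` is a hypothetical helper, and the config path and the service name `dhcpd` depend on the distribution. `dhcpd -t -cf <file>` is the syntax test mode of ISC dhcpd.

```shell
#!/bin/sh
# Sketch: restart a service only when its freshly rebuilt config passes a syntax check.
# restart_if_valid is a hypothetical helper, not part of NMS Prime.
restart_if_valid() {
    check_cmd=$1     # e.g. "dhcpd -t -cf /etc/dhcp/dhcpd.conf" (path is an assumption)
    restart_cmd=$2   # e.g. "systemctl restart dhcpd" (service name is an assumption)
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        # Check passed: apply the new config.
        sh -c "$restart_cmd"
    else
        # Check failed: keep the service running with its old config.
        echo "config check failed, leaving the running service untouched" >&2
        return 1
    fi
}

# Possible use on a slave after the rebuild:
#   restart_if_valid "dhcpd -t -cf /etc/dhcp/dhcpd.conf" "systemctl restart dhcpd"
```

With this approach a broken rebuild leaves the slave serving leases from its last good config instead of taking the DHCP service down.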
DNS / BIND
Slaves rebuild their configs by themselves after a defined time (see above).
More research required, but a good starting point could be here:
- https://www.lisenet.com/2018/configure-bind-dns-servers-with-failover-and-dynamic-updates-on-centos-7/
- https://serverfault.com/questions/236096/isc-dhcpbind-with-failover-and-dynamic-updates-can-the-secondary-bind-update-d
TFTP
A cronjob on each slave rebuilds all config files on a recurring basis (e.g. every hour). In NMS Prime this could simply be done by running the artisan commands listed below.
Possible Cronjob(s) for Slaves
Code Block | ||
---|---|---|
| ||
php artisan nms:dhcp && systemctl restart dhcp |
Code Block | ||
---|---|---|
| ||
php artisan nms:configfile |
Github TODO: #687
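If the rebuild is driven by plain cron, a run that takes longer than the cron interval could overlap with the next one. One way to guard against that, sketched here under assumptions, is a small lock wrapper. `run_exclusive` and the `LOCK_DIR` path are hypothetical names chosen for illustration; the lock relies on `mkdir` being atomic.

```shell
#!/bin/sh
# Sketch: allow only one rebuild at a time on a slave.
# LOCK_DIR is a hypothetical path; mkdir is atomic, so the directory acts as a lock.
LOCK_DIR="${LOCK_DIR:-/tmp/nmsprime-rebuild.lock}"

run_exclusive() {
    # If the lock already exists, a previous rebuild is still running: skip this run.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous rebuild still running, skipping" >&2
        return 0
    fi
    # Run the command passed as arguments and remember its exit status.
    "$@"
    status=$?
    # Release the lock so the next cron run can start.
    rmdir "$LOCK_DIR"
    return "$status"
}

# Possible use from a cron-driven script on a slave:
#   run_exclusive php artisan nms:configfile
```

Note that a rebuild killed mid-run would leave the lock directory behind, so a real deployment would need some stale-lock cleanup.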
Info |
---|
Implementing this in the Laravel scheduling framework (for slaves only!) would be an improvement, especially if building all config files can take longer than the rebuild interval, since overlapping runs can easily be avoided with ->withoutOverlapping(). See: https://github.com/nmsprime/nmsprime/blob/dev/app/Console/Kernel.php#L35 I would love to see a setting in /etc/nmsprime/env for a possible slave configuration, such as SLAVE_CONFIG_REBUILD_INTERVALL=3600 # time in seconds |
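Until such a setting exists, the idea of a configurable interval can be illustrated with a small shell helper. This is a sketch only: `SLAVE_CONFIG_REBUILD_INTERVALL` is the proposed (not yet existing) setting, `read_rebuild_interval` is a hypothetical helper, and the env file passed to it is an assumption.

```shell
#!/bin/sh
# Sketch: read the rebuild interval (in seconds) from an env-style file.
# SLAVE_CONFIG_REBUILD_INTERVALL is only a proposal; it is not an existing NMS Prime setting.
read_rebuild_interval() {
    env_file=$1
    default=3600   # fall back to one hour
    value=""
    if [ -r "$env_file" ]; then
        # Keep only the leading digits of the value; a trailing comment is ignored.
        # If the key appears more than once, the last occurrence wins.
        value=$(sed -n 's/^SLAVE_CONFIG_REBUILD_INTERVALL=\([0-9][0-9]*\).*$/\1/p' "$env_file" | tail -n 1)
    fi
    echo "${value:-$default}"
}

# Possible use in a simple rebuild loop on a slave (file name is hypothetical):
#   while true; do
#       php artisan nms:configfile
#       sleep "$(read_rebuild_interval /etc/nmsprime/env/slave.env)"
#   done
```

A missing file or a non-numeric value falls back to the default, so a misconfigured slave still rebuilds hourly.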
Workflow
[Draw.io diagram: workflow]
Considerations on Failover from 22.5.2019