Shinken makes it easy to set up a high availability (HA) architecture, just as easily as the load balancing feature described in distributed shinken.
Shinken is business friendly when it comes to meeting availability requirements.
You learned how to add new poller satellites in distributed shinken. For HA the process is the same: add new satellites in the same way, then define them as “spares”.
You can (and should) do the same for every satellite type to get a complete HA architecture.
We keep the load balancing from the previous installation and add a new server (if you do not need load balancing, just reuse the previous server). This new HA server will be server3 (server2 was used for poller load balancing).
As in the previous case, install the daemons on server3 but do not launch them yet. See the 10 min start tutorial for how to install them.
The daemons on server1 need to know where their spares are. This is all configured in the /etc/shinken directory, where each daemon type has its own subdirectory.
Add the following definitions, each in the file for its daemon type (e.g. schedulers/scheduler-spare.cfg):
define scheduler{
    scheduler_name      scheduler-spare
    address             server3
    port                7768
    spare               1
}

define poller{
    poller_name         poller-spare
    address             server3
    port                7771
    spare               1
}

define reactionner{
    reactionner_name    reactionner-spare
    address             server3
    port                7769
    spare               1
}

define receiver{
    receiver_name       receiver-spare
    address             server3
    port                7773
    spare               1
}

define broker{
    broker_name         broker-spare
    address             server3
    port                7772
    spare               1
    modules             Simple-log,Livestatus
}

define arbiter{
    arbiter_name        arbiter-spare
    address             server3
    host_name           server3
    port                7770
    spare               1
}
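The spare definitions above all follow one pattern: a name ending in -spare, address server3, the daemon's port, and spare 1, plus a few type-specific directives (modules for the broker, host_name for the arbiter). The sketch below is a hypothetical helper, not part of Shinken, that renders those blocks from a single table; the ports and extra directives are copied from the example, and writing the output into the right files under /etc/shinken is left to you.

```python
# Hypothetical helper (not part of Shinken): renders the spare definitions
# from one table so every satellite type gets a matching "spare 1" entry.
SPARE_HOST = "server3"

# daemon type -> (port, extra directives); values copied from the example.
SPARES: dict[str, tuple[int, dict[str, str]]] = {
    "scheduler": (7768, {}),
    "poller": (7771, {}),
    "reactionner": (7769, {}),
    "receiver": (7773, {}),
    "broker": (7772, {"modules": "Simple-log,Livestatus"}),
    "arbiter": (7770, {"host_name": SPARE_HOST}),
}


def render_spare(daemon_type: str, port: int, extra: dict[str, str],
                 host: str = SPARE_HOST) -> str:
    """Return one 'define <type>{ ... }' block flagged as a spare."""
    directives = {
        f"{daemon_type}_name": f"{daemon_type}-spare",
        "address": host,
        "port": str(port),
        "spare": "1",
        **extra,  # type-specific directives go last
    }
    body = "\n".join(f"    {key} {value}" for key, value in directives.items())
    return f"define {daemon_type}{{\n{body}\n}}"


def render_all(spares: dict[str, tuple[int, dict[str, str]]] = SPARES) -> str:
    """Render every spare definition, separated by blank lines."""
    return "\n\n".join(
        render_spare(daemon_type, port, extra)
        for daemon_type, (port, extra) in spares.items()
    )


if __name__ == "__main__":
    print(render_all())
```

Keeping the ports in one table makes it harder to give a spare the wrong port, and adding a spare for another satellite type becomes a one-line change.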
That's it: configuring HA simply means defining new daemons on server3 with “spare 1”.
Important
It’s very important that both arbiter daemons have the same /etc/shinken directory. The whole configuration should also be rsync’ed or copied once a day so that the spare arbiter can take over if the active arbiter suffers a major failure.
So copy /etc/shinken to the same location on server3, overwriting the existing one.
You do not need to keep every host and service configuration file in sync on the spare for normal operation: when the master starts, it synchronizes with the spare. But beware: if server1 dies and you must start from scratch on server3, you will not have the full configuration! So synchronizing the whole configuration once a day, using rsync or a similar method, is a requirement.
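A daily sync could look like the sketch below, run on server1. It assumes rsync is installed on both servers and that server1 can reach server3 over SSH as a user allowed to write /etc/shinken; the function names are hypothetical. It only builds and prints the command by default, so nothing is copied until you call sync_config(dry_run=False).

```python
import shlex
import subprocess

# Hypothetical sync script for server1; paths and host name follow the
# example in this guide. Assumes SSH access from server1 to server3.


def build_rsync_cmd(src: str = "/etc/shinken/", dest_host: str = "server3",
                    dest: str = "/etc/shinken/") -> list[str]:
    """Build the rsync command that mirrors the Shinken configuration."""
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, symlinks
        "-z",        # compress data during the transfer
        "--delete",  # remove files on server3 that were deleted on server1
        src,         # trailing slash: copy the directory's contents
        f"{dest_host}:{dest}",
    ]


def sync_config(dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it, raising if rsync fails."""
    cmd = build_rsync_cmd()
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    sync_config(dry_run=True)
```

Scheduling it once a day (for example from cron on server1) covers the requirement above. The --delete flag is a design choice: it makes server3 an exact mirror, so drop it if server3 holds configuration files you want to keep.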
Everything is ready. All you need to do now is start all the daemons:
$server1: sudo /etc/init.d/shinken start
$server3: sudo /etc/init.d/shinken start
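After starting, you can quickly confirm that each daemon is listening on its port. The sketch below uses the ports from the spare definitions and assumes the master daemons on server1 use the same ports; the helper names are hypothetical. An open port only shows that a process accepts connections, not whether it is currently active or idle as a spare.

```python
import socket

# Ports taken from the spare definitions in this guide (assumed to be the
# same for the master daemons on server1).
DAEMON_PORTS = {
    "scheduler": 7768,
    "reactionner": 7769,
    "arbiter": 7770,
    "poller": 7771,
    "broker": 7772,
    "receiver": 7773,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_host(host: str, ports: dict[str, int] = DAEMON_PORTS,
               timeout: float = 2.0) -> dict[str, bool]:
    """Map each daemon name to whether its port answers on host."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

For example, check_host("server3") returns one True/False entry per daemon; any False entry points to a daemon that did not start or a port blocked by a firewall.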
If an active daemon dies, its spare takes over. The failure is detected within a minute or two; you can tune this delay for each daemon in its spare definition file (e.g. schedulers/scheduler-spare.cfg).
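To reason about that delay, a rough worst-case estimate helps. The sketch below assumes a simple model that is not taken from Shinken's source: the arbiter pings each daemon every check_interval seconds, each ping waits at most timeout seconds, and a daemon is declared dead after max_check_attempts consecutive failures. We believe Shinken daemon definitions accept directives with these names, but verify them against your version's sample configuration files.

```python
def worst_case_detection_s(check_interval: float, max_check_attempts: int,
                           timeout: float) -> float:
    """Rough upper bound on how long a dead daemon goes unnoticed.

    Assumed model: up to one full interval may pass before the first failed
    ping, then (max_check_attempts - 1) further intervals, and the last ping
    waits up to `timeout` seconds before it counts as failed.
    """
    return max_check_attempts * check_interval + timeout
```

With example values check_interval=30, max_check_attempts=3 and timeout=3, the estimate is 93 seconds. Shorter intervals speed up fail-over at the cost of more pings and a higher risk of false alarms on a busy network.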
Note
For stateful fail-over of a scheduler, link one of the distributed retention modules, such as memcache or redis, to your schedulers. This avoids losing the current state of the checks handled by a failed scheduler. Without a retention module, a spare scheduler that takes over must reschedule all checks, and check states remain PENDING until this has completed.
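The difference a retention module makes can be illustrated with a toy model. This is not Shinken's retention API: a plain dict stands in for the shared memcache or redis store, and ToyScheduler and the check names are hypothetical. A spare with access to the store picks up the saved states; a spare without it starts every check as PENDING.

```python
from typing import Optional

# Toy model only: a dict stands in for the shared memcache/redis store.
PENDING = "PENDING"


class ToyScheduler:
    """Minimal stand-in for a scheduler that owns some checks."""

    def __init__(self, name: str, retention: Optional[dict] = None):
        self.name = name
        self.retention = retention  # shared store, or None without retention
        self.states: dict[str, str] = {}

    def save_retention(self) -> None:
        """Copy current check states into the shared store, if there is one."""
        if self.retention is not None:
            self.retention.update(self.states)

    def take_over(self, checks: list[str]) -> None:
        """Load states for a failed scheduler's checks; unknown ones are PENDING."""
        stored = self.retention if self.retention is not None else {}
        self.states = {check: stored.get(check, PENDING) for check in checks}
```

In this model the spare only sees what was last saved, so any state change after the last save_retention() call would be lost.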
Note
You now have a high availability architecture.