[Freeswitch-users] High Availability Cluster Module for FreeSWITCH

Eliot Gable egable+freeswitch at gmail.com
Mon Feb 11 04:07:14 MSK 2013


On Sun, Feb 10, 2013 at 4:32 PM, Steven Ayre <steveayre at gmail.com> wrote:

> One is that mod_ha_cluster is an N+1 cluster, not an active/passive pair.
>> That way if any single node fails, there's something to pick up the slack.
>
>
> Corosync clusters are not limited to active/passive pairs. It's just a
> very common setup.
>
> For example you could have resource agents 1) to keep FS running on all
> nodes 2) for virtual IPs 3) for IP:port Sofia profiles. You can then define
> dependencies between them. That should let you keep FS running at all times
> and move an IP and the associated Sofia profiles to a new node that's
> already running FS when the original node fails. For maintenance you can
> simply trigger that from the CRM.


This is true; however, it would require a very complicated resource agent to
manage FreeSWITCH in a configuration similar to what mod_ha_cluster is
designed to do. In addition, the functionality simply does not exist in
FreeSWITCH right now to tell it to take over for a specific failed master
and recover those specific calls. So, right now, using Pacemaker and
Corosync, there is absolutely no way to run an N + x FreeSWITCH cluster.
Also, the response time of a Pacemaker + Corosync cluster for failure
detection and recovery is measured in seconds, which is not ideal for a
real-time communications platform. Obviously, there is nothing at all
preventing you from running Pacemaker and Corosync in addition to
mod_ha_cluster. In fact, I was even considering providing some CLI
arguments to allow FreeSWITCH (with mod_ha_cluster enabled) to be commanded
from Pacemaker and act as its own resource agent. If you think that would
be an interesting feature, I can look into what it would take to work that
out. I previously wrote a resource agent for Broadvox when I worked for
them, and I wrote another one after leaving them. Both were intended to
manage FreeSWITCH as a master/slave pair, so I have some idea of how to do
it, just not necessarily the specifics of making it do N + x the way I
intend mod_ha_cluster to operate.


> Secondly, in order to recover live calls, you need a list of the calls.
>> That currently requires some sort of odbc (or postgres) with replication.
>> Again, that's abstracted as part of mod_ha_cluster.
>> Third: The docs mention a similar sort of pooling for registration, that you
>> can register to one server and you're regged on them all without needing a
>> DB to sync everything.
>
>
> Which can also be done using Corosync's IPC messaging API.
>
> (Personally I prefer using MySQL Cluster via ODBC - which is
> HA, synchronous, and offloads all database load from the FS nodes, but
> that's off-topic).
>
>
I have used MySQL Cluster and Postgres with my own replication daemon.
Neither is an ideal solution, for a multitude of reasons. Using
something like Corosync's IPC messaging is not ideal either. Besides that,
the FreeSWITCH core has a sufficiently robust API for doing network
messaging, and also its own event system which happens to be perfectly
suited for exactly the kind of messaging I need to accomplish. Using the
Corosync IPC messaging API would be like trying to shove a large round peg
through a small square hole.


> Fourth, according to the docs: single configuration for all FS instances,
>> rather than manually ensuring each one has the same config.
>
>
> This could be achieved with the Corosync API.
>
> Fifth: Voicemail clustering? Or we'll have to wait for mod_voicemail's
>> APIs to be rewritten for that, perhaps...
>
>
> That's not going to happen without a storage API added to the FS core -
> you're always going to need some 3rd party solution such as NFS, CIFS,
> DRBD, etc. mod_voicemail is hardcoded to use the ODBC interface, but that's
> only for the index, not the recordings.
>
> Corosync would allow you to make FS depend on the NFS/whatever service
> running so if the storage backend has failed FS would move to a node where
> it is available - not necessarily possible from within mod_ha_cluster
> itself.
>

Well, I *could* broadcast an entire voicemail to all nodes, but that seems
like a bit of overkill, and I don't see any real reason to code a storage
system into FreeSWITCH. That really would be reinventing the wheel. I don't
think the majority of users would find it all that difficult to set up a
shared NFS space somewhere, or even use something like MooseFS. In fact,
MooseFS is so easy to use, I went from reading about it for the first time
to having a fully functional 6-node shared storage cluster in about 45
minutes. This is one place where a 3rd party solution is really the optimal
approach.


> There's certainly *something* special possible with mod_ha_cluster that
>> can't be done with existing solutions cleanly, if at all...
>
>
> Don't misunderstand me, I think it's a great idea to have a module aimed
> at HA and sharing state across a cluster while being able to detect new
> failure conditions.
>
> I just think that in the specific area of node monitoring and messaging
> across the cluster it would be better to use a well tested and proven
> solution such as Corosync which is based on a large number of papers,
> algorithms, and generally decades of work. Every time I've seen/heard of an
> attempt to redo that from scratch it's been unreliable especially in
> unexpected failure conditions. The fact that it's a dependency on another
> program isn't a good enough reason not to use it - you're already depending
> on various other programs (Linux, sysvinit, monit, cron, syslog, etc.)
> anyway. By not using it you're just adding extra work, adding unnecessary
> complexity, and increasing the risk of bugs. Corosync would also have
> advantages because the tools to migrate services to another node
> for maintenance, detect and restart resources on failure etc already exist.
>

I think you greatly overestimate the complexity of the messaging and node
monitoring required to make FS run as a multi-master, multi-slave cluster,
detect a node failure, and have it automatically recover calls on a slave
system. You seem to be thinking of the task as if it were a general purpose
HA solution which needs to work for any random piece of software across
every imaginable network configuration and software deployment scenario.

The messaging I need to do is simple:

1) I need to send events from the local FS node to all other slave nodes in
the cluster so they can synchronize call state, registration information,
call limiting information, etc.
2) I need to send heartbeats from all nodes to all nodes out all configured
NICs so they can keep track of who is in which state and which is the
designated slave in case of failure.
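To make the second point concrete, here is a minimal sketch of what such a heartbeat payload might look like. The field names, JSON encoding, and helper names are my own illustration, not mod_ha_cluster's actual wire format; the only point carried over from the description above is that each node sends the same small message out every configured NIC, tagged with its role:

```python
import json
import time

def build_heartbeat(node_id, role, interface, epoch=None):
    """Build one heartbeat message. A copy is sent out every configured
    NIC so peers can track node state over multiple physical networks."""
    return {
        "node": node_id,    # unique node identifier
        "role": role,       # e.g. "master" or "slave"
        "iface": interface, # which NIC this copy was sent from
        "ts": epoch if epoch is not None else time.time(),
    }

def encode(msg):
    """Serialize a heartbeat for the wire (illustrative encoding)."""
    return json.dumps(msg).encode("utf-8")

def decode(data):
    """Parse a heartbeat received from a peer."""
    return json.loads(data.decode("utf-8"))
```

In practice each node would broadcast `encode(build_heartbeat(...))` on a short timer, once per interface, and feed every datagram it receives through `decode()` into its view of the cluster.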

These two tasks accomplish everything needed to make this system work. The
heartbeats let each node calculate the state of the cluster. There is no
"single brain" in the cluster. There is a very specific and well-defined
set of rules by which the state of the cluster is calculated by each node,
and all nodes will always arrive at the same conclusion so long as you have
multiple physical networks to ensure communication is never broken between
sets of nodes.
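The "every node computes the same answer" property can be sketched as a pure function over the received heartbeats. The rule below (a node is up if any interface heard from it within a timeout) is an assumption of mine, not mod_ha_cluster's documented rule, but it shows why multiple physical networks matter: a node is only declared down when it has gone silent on all of them, and since the function is deterministic, every node that has seen the same heartbeats arrives at the same view:

```python
def cluster_view(last_seen, now, timeout):
    """Compute each node's up/down state from heartbeat arrival times.

    last_seen: {node_id: {iface: last_heartbeat_timestamp}}
    A node counts as 'up' if a heartbeat arrived on ANY interface
    within `timeout` seconds; requiring silence on every physical
    network before declaring 'down' is what keeps one broken link
    from splitting the cluster's view.
    """
    view = {}
    for node, ifaces in last_seen.items():
        alive = any(now - ts <= timeout for ts in ifaces.values())
        view[node] = "up" if alive else "down"
    return view
```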

The decision making happens on the slave nodes only. The masters are
already in their role and will stay that way unless something catastrophic
occurs. They have ways to detect internal failures, and I will have a way
set up for them to detect and deal with a segfault of FS as well. If such
a failure occurs, their single purpose task is to shut down as cleanly as
possible and get themselves back into a stable state. The slaves also have
only one goal: to recover the calls of the failed master and become the new
master. One slave is always designated as the primary candidate, and all
slaves know which it is at all times (once they exchange their first
heartbeats). They also know which is the secondary, tertiary, etc. It is
all pre-determined up-front and only changes if a failure, reconfiguration,
or maintenance event occurs. When the primary sees a master go down, it
will perform a sanity check on itself first (to make sure it didn't have an
issue) and then take over for the first master it saw go down. Once it has
started that process, it broadcasts that it is switching to master and
which master it is taking over for. At this point, the secondary
immediately becomes primary to all remaining slaves and it is immediately
available to take over for any other master nodes which might also have
failed simultaneously.
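The primary/secondary/tertiary hand-off described above can be sketched in a few lines. Sorting by node id is my stand-in for whatever fixed ranking rule mod_ha_cluster actually uses; the point is only that the rule is deterministic, so every slave independently agrees on who is primary, and on the next primary the moment the current one starts taking over:

```python
def promotion_order(slaves):
    """Deterministic candidate ranking. Because every node sorts the
    slave list by the same fixed rule, all nodes agree on the primary,
    secondary, tertiary, etc. without any coordinator. (Sorting by id
    is an illustrative rule, not the module's actual one.)"""
    return sorted(slaves)

def on_master_failure(slaves, failed_master):
    """The primary slave takes over the failed master; the next slave
    in line immediately becomes the new primary, ready for any other
    simultaneous master failure."""
    order = promotion_order(slaves)
    primary, remaining = order[0], order[1:]
    return {
        "new_master_for": failed_master,
        "promoted": primary,
        "new_primary": remaining[0] if remaining else None,
    }
```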

The whole process is very specific and carefully orchestrated. The
whole thing is nowhere near the complexity of the failure detection
and failover process which Pacemaker has to deal with. I am not trying to
recreate something like Pacemaker. It is a hugely complex system which has
to deal with all sorts of random and generalized configurations of endless
types of software. Pacemaker is like an interstellar spacecraft while I
just need a rocket to put a satellite in orbit.

