[Freeswitch-users] High Availability Cluster Module for FreeSWITCH

Eliot Gable egable+freeswitch at gmail.com
Tue Feb 12 21:19:48 MSK 2013


On Tue, Feb 12, 2013 at 9:21 AM, Marcin Gozdalik <gozdal at gmail.com> wrote:

> 2013/2/11 Eliot Gable <egable+freeswitch at gmail.com>:
> > On Mon, Feb 11, 2013 at 7:36 AM, Marcin Gozdalik <gozdal at gmail.com>
> > wrote:
> >>
> >> +1
> >>
> >> I do not doubt mod_ha is necessary inside FS, and it may be
> >> better/simpler than writing a Pacemaker resource agent, but writing
> >> yet-another-cluster-communication-engine is IMHO the wrong way to go,
> >> and using Corosync for communication would bring a lot of value from
> >> a mature codebase.
> >>
> >
> > I understand what you are saying, but what I am trying to get across
> > is that I am not writing yet-another-cluster-communication-engine.
> > All I am really doing is combining a multicast messaging API written
> > by Tony and the event API in FS to broadcast existing state
> > information between multiple FS nodes, as well as adding a tiny
> > amount of logic on top of that to coordinate call failover and
> > recovery. That's probably a little over-simplified, but it gets the
> > point across. The network communication code is already in FS and
> > well tested. The event system is already in FS and well tested.
>
> I also think I understand what you are saying. It seems we both have
> trouble putting our thoughts into writing ;)
> From what I understand, what you are trying to achieve is that every
> node in an FS "cluster" knows which nodes exist and whether they are
> up or down.
> What I am saying is that this simple-sounding task is fundamentally
> hard. Sending and receiving multicast is easy, but keeping distributed
> state consistent between the nodes of a cluster is hard (as in really
> hard; harder than writing a VoIP softswitch all over again), especially
> in the case of Byzantine failures (i.e. nodes claiming they are down
> when they are up, or the other way round). I am no big expert in the
> area, but I have seen at least two cases (MMM -
> http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/ and Chubby
> at Google -
> http://www.read.seas.harvard.edu/~kohler/class/08w-dsi/chandra07paxos.pdf)
> where people tried to write (MMM) or use (Chubby) distributed
> coordination code and failed.
> That's why, whenever I see anything related to distributed state, I
> say that it is way beyond my understanding and that it is best to use
> something that already works.
>


You were fortunate to have that resource available, as well as (I assume)
a ready-made resource agent for managing FreeSWITCH. I had to learn
Pacemaker from this:

http://clusterlabs.org/doc/en-US/Pacemaker/1.0/pdf/Pacemaker_Explained/Pacemaker_Explained.pdf

I also had to craft a resource agent to manage FreeSWITCH (none existed at
the time). Then I found out Pacemaker was buggy (it has gotten much better
since I started using it) and would not honor colocation constraints or
grouping correctly in certain failure conditions, so I had to make the
resource agent manage all of the IP addresses for FreeSWITCH itself (each
instance had 12 Sofia profiles, each running on a different IP).

I spent months testing hundreds of different possible failure conditions
and fixing dozens if not hundreds of bugs in the configuration and in how
the resource agent managed everything and reported on the health of
FreeSWITCH: everything from someone accidentally removing a needed IP
address, to a failed hard drive, to a Sofia profile failing to load, to
firewall rules accidentally blocking needed ports. If you spent only one
day setting up such a system, I am certain you failed to account for
dozens if not hundreds of possible failure conditions. At the end of those
3 months of hell, I had a single pair of nodes which I could rely on to
"do the right thing" under practically any failure condition. Even then, I
still had several dozen ways to simulate FreeSWITCH failing which the
system simply could not detect efficiently. I attempted to test some of
them, but the load induced by running those tests frequently enough to
matter pushed the system outside the specifications I needed for the
project to be profitable and workable.

I have years of experience building and deploying FreeSWITCH clusters with
Pacemaker and Corosync, and hunting down and gracefully handling
practically every conceivable way such a system can fail. I understand you
think it's hard to do, and not without reason. I've lived it; I've done
it; I know what's involved in the process. I simply want to take that
experience and write it down in code, in the form of mod_ha_cluster, so
that other people don't have to waste their time relearning everything I
already know about making FreeSWITCH run in an HA setup.

In the absence of Pacemaker and Corosync, my goal is to give
mod_ha_cluster enough awareness that the vast majority of failure cases
are handled gracefully and FS can take care of itself, bringing a slave
online to take over for a failed node.
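
To make that concrete, here is roughly the shape of the state broadcast,
sketched against the public event API (a sketch only: the subclass and
header names are illustrative, not actual mod_ha_cluster code). With
something like mod_event_multicast configured to relay the subclass, every
node sees every other node's state:

    #include <switch.h>

    /* Illustrative subclass name, not actual mod_ha_cluster code. */
    #define HA_STATE_SUBCLASS "ha_cluster::node-state"

    /* Broadcast this node's view of its own state. A module such as
     * mod_event_multicast, configured to relay this subclass, carries
     * the event to every peer node. */
    static void broadcast_node_state(const char *node_id, const char *state)
    {
        switch_event_t *event = NULL;

        if (switch_event_create_subclass(&event, SWITCH_EVENT_CUSTOM,
                                         HA_STATE_SUBCLASS) == SWITCH_STATUS_SUCCESS) {
            switch_event_add_header_string(event, SWITCH_STACK_BOTTOM,
                                           "HA-Node-ID", node_id);
            switch_event_add_header_string(event, SWITCH_STACK_BOTTOM,
                                           "HA-Node-State", state);
            switch_event_fire(&event);
        }
    }

    /* Receive peers' state. Bound at module load with something like:
     * switch_event_bind("mod_ha_cluster", SWITCH_EVENT_CUSTOM,
     *                   HA_STATE_SUBCLASS, on_peer_state, NULL); */
    static void on_peer_state(switch_event_t *event)
    {
        const char *node = switch_event_get_header(event, "HA-Node-ID");
        const char *state = switch_event_get_header(event, "HA-Node-State");

        switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_INFO,
                          "peer %s reports state %s\n", node, state);
        /* ... update the local view of the cluster here ... */
    }

The failover coordination logic sits on top of handlers like that one; the
transport and the event plumbing are untouched, well-tested FS code.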
However, there is no reason I cannot also write it so that Pacemaker and
Corosync can direct it, telling it which slave to turn into a master. So,
if it makes you more comfortable, think of it as a glorified resource
agent which always happens to know the "deep state" of the nodes and can
test for things that traditional resource agents can never check
effectively. Then, when you do a "shallow" poll of the state, you get back
the "deep state" instead, but at the cost of only a "shallow" test. And,
on top of all that, it will handle synchronizing various data between the
nodes so you don't need to rely on an external HA database.
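
That "deep state at shallow cost" pattern is simple enough to sketch
outside of FS entirely; here it is in plain C with pthreads (the check
itself is a stand-in):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    typedef enum { NODE_HEALTHY, NODE_DEGRADED, NODE_FAILED } node_health_t;

    static pthread_mutex_t health_lock = PTHREAD_MUTEX_INITIALIZER;
    static node_health_t cached_health = NODE_HEALTHY;

    /* Stand-in for the real deep checks: Sofia profiles up, IPs bound,
     * ports reachable, disk writable, and so on. */
    static int deep_checks_pass(void)
    {
        return 1;
    }

    /* A background thread runs the expensive checks continuously and
     * caches the verdict on its own schedule. */
    static void *deep_check_loop(void *arg)
    {
        (void)arg;
        for (;;) {
            node_health_t verdict = deep_checks_pass() ? NODE_HEALTHY
                                                       : NODE_FAILED;
            pthread_mutex_lock(&health_lock);
            cached_health = verdict;
            pthread_mutex_unlock(&health_lock);
            sleep(1);
        }
        return NULL;
    }

    /* The "shallow" poll: O(1) no matter how expensive the deep checks
     * are, because it only reads the cached verdict. */
    static node_health_t monitor(void)
    {
        pthread_mutex_lock(&health_lock);
        node_health_t h = cached_health;
        pthread_mutex_unlock(&health_lock);
        return h;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, deep_check_loop, NULL);
        sleep(2);
        printf("monitor() says: %d\n", (int)monitor());
        return 0;
    }

That is the whole trick: the expensive work never sits on the polling
path. A real implementation would also timestamp the cached verdict so
that a stalled checker thread is itself detectable.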