[Freeswitch-users] High Availability Cluster Module for FreeSWITCH

Michael Collins msc at freeswitch.org
Tue Feb 12 21:41:10 MSK 2013


On Tue, Feb 12, 2013 at 10:19 AM, Eliot Gable
<egable+freeswitch at gmail.com>wrote:

> On Tue, Feb 12, 2013 at 9:21 AM, Marcin Gozdalik <gozdal at gmail.com> wrote:
>
>> 2013/2/11 Eliot Gable <egable+freeswitch at gmail.com>:
>> > On Mon, Feb 11, 2013 at 7:36 AM, Marcin Gozdalik <gozdal at gmail.com>
>> wrote:
>> >>
>> >> +1
>> >>
>> >> I do not doubt mod_ha is necessary inside of FS  and it may be
>> >> better/simpler than writing Pacemaker resource agent, but writing
>> >> yet-another-cluster-communication-engine is IMHO the wrong way to go
>> >> and using Corosync for communication will give a lot of value from
>> >> mature codebase.
>> >>
>> >
>> > I understand what you are saying, but what I am trying to get across
>> > is that I am not writing yet-another-cluster-communication-engine.
>> > All I am really doing is combining a multicast messaging API written
>> > by Tony and the event API in FS to broadcast existing state
>> > information between multiple FS nodes, as well as adding a tiny
>> > amount of logic on top of that to coordinate call failover and
>> > recovery. That's probably a little over-simplified, but it gets the
>> > point across. The network communication code is already in FS and
>> > well tested. The event system is already in FS and well tested.
>>
>> I also think I understand what you are saying. It seems we both have
>> trouble putting our thoughts into writing ;)
>> From what I understand, what you are trying to achieve is that every
>> node in the FS "cluster" knows which nodes exist and whether they are
>> up or down.
>> What I am saying is that this seemingly simple task is fundamentally
>> hard. Sending and receiving multicast is easy, but keeping distributed
>> state consistent between nodes in a cluster is hard (as in really hard,
>> harder than writing a VoIP softswitch all over again), especially in
>> the case of Byzantine failures (i.e. nodes claiming they are down when
>> they are up, or the other way round). I am no big expert in the area,
>> but I have seen at least 2 cases (MMM -
>> http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/ and Chubby
>> in Google -
>> http://www.read.seas.harvard.edu/~kohler/class/08w-dsi/chandra07paxos.pdf
>> )
>> where people were trying to write (MMM) or use (Chubby) some kind of
>> distributed code and failed.
>> That's why whenever I see anything related to distributed state I say
>> that it's way beyond my understanding and that it is best to use
>> something that already works.
>>
>
>
> You were fortunate to have that resource available, as well as (I assume)
> a ready-made resource agent for managing FreeSWITCH. I had to
> learn it from this:
>
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/pdf/Pacemaker_Explained/Pacemaker_Explained.pdf
>
> I also had to craft a resource agent to manage FreeSWITCH (none existed at
> the time). Then I found out Pacemaker was buggy (it has gotten much better
> since I started using it) and wouldn't properly honor colocation
> constraints or grouping in certain failure conditions, so I had
> to make the resource agent handle managing all the IP addresses for
> FreeSWITCH (each instance had 12 Sofia profiles with each one running on a
> different IP). I spent months testing hundreds of different possible
> failure conditions and fixing dozens if not hundreds of bugs in the
> configuration and in how the resource agent managed everything and reported
> on the health of FreeSWITCH. These covered everything from someone
> accidentally removing a needed IP from the system, to a failed hard drive,
> to a Sofia profile failing to load, to firewall rules accidentally
> blocking needed ports, etc.
> If you spent only one day setting up such a system, I am certain you failed
> to account for dozens if not hundreds of possible failure conditions. At
> the end of those 3 months of hell, I had a single pair of nodes which I
> could rely on to "do the right thing" under practically any failure
> condition. However, even then, I still had several dozen ways I could
> simulate FreeSWITCH failing which the system simply could not
> detect efficiently. I made attempts at testing some of them, but the load
> induced on the system to test them frequently enough to matter made the
> system fall outside the specifications I needed for the project to be
> profitable and workable.
>
> I have years of experience building and deploying FreeSWITCH clusters with
> Pacemaker and Corosync and hunting down and gracefully handling practically
> every conceivable way such a system could fail. I understand you think it's
> hard to do, and not without reason. I've lived it; I've done it. I
> know what's involved in the process. I simply want to take my experience
> with it and write it down in code in the form of mod_ha_cluster so that
> other people don't have to waste their time relearning all the things I
> already know with regard to making FreeSWITCH run in an HA setup. In the
> absence of Pacemaker and Corosync, my goal is to give mod_ha_cluster
> enough awareness that the vast majority of failure cases are handled
> gracefully and FS can take care of bringing a slave online to take over
> for a failed node on its own. However, there is no reason I cannot also
> write it to let Pacemaker and Corosync give it direction as to which slave
> to turn into a master. So, if it makes you more comfortable, think of it as
> a glorified resource agent which always happens to know about the "deep
> state" of the nodes and can test for things that traditional resource
> agents can never do effectively. Then, when you do a "shallow" poll of the
> state, you get back the "deep state" instead, but at only the cost of a
> "shallow" test. And, on top of all that, it will handle synchronizing
> various data between the nodes so you don't need to rely on an external HA
> database.
>
> +100!
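
To make the idea concrete for anyone skimming the archives: what Eliot
describes above amounts to each node periodically multicasting a small
serialized snapshot of its own state, and every peer updating its view of
the cluster from what it hears. Below is a very rough, hypothetical sketch
of that pattern in plain C with raw sockets. The names, message format, and
group address are all made up here, and the real module would ride on Tony's
multicast API and the FS event system rather than anything like this.

/*
 * Hypothetical sketch only (not mod_ha_cluster code): one node multicasts
 * a serialized snapshot of its state so peers can mark it alive and track
 * what it is doing.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define HA_GROUP "239.255.42.42"   /* example multicast group (made up) */
#define HA_PORT  4242              /* example port (made up)            */

struct node_state {
    char node_id[32];      /* which node is speaking                 */
    int  active_calls;     /* a bit of "existing state" to share     */
    int  profiles_up;      /* e.g. Sofia profiles currently running  */
};

static int broadcast_state(const struct node_state *st)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in grp;
    char buf[128];
    int len;

    if (fd < 0) return -1;

    memset(&grp, 0, sizeof(grp));
    grp.sin_family = AF_INET;
    grp.sin_port = htons(HA_PORT);
    inet_pton(AF_INET, HA_GROUP, &grp.sin_addr);

    /* Serialize the snapshot; peers parse this and update their view. */
    len = snprintf(buf, sizeof(buf), "node=%s calls=%d profiles=%d",
                   st->node_id, st->active_calls, st->profiles_up);

    if (sendto(fd, buf, (size_t)len, 0,
               (struct sockaddr *)&grp, sizeof(grp)) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

The hard part, as Marcin points out earlier in the thread, is not the
sending; it is agreeing on what the cluster should do when some nodes stop
hearing each other.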

In my other post to this thread I postulated that if it were that easy then
someone else would have already done it, or that it should at least be
"easily" doable. I was not aware of your excruciatingly intimate familiarity
with the solutions that others have been suggesting as alternatives to what
you are working on.

I think the "glorified resource agent" analogy is particularly descriptive.
:)
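
That framing also suggests a simple way to picture the "shallow poll, deep
answer" trick Eliot mentions: the expensive checks run on their own schedule
inside FS, and the cheap status query just returns the cached verdict, so an
external monitor pays the cost of a shallow test but sees deep-test results.
A toy sketch of that shape follows; every name here is hypothetical and
nothing in it comes from mod_ha_cluster.

/*
 * Toy illustration of "shallow poll, deep answer". The deep checks run in
 * the background; the poll just reads the cached result.
 */
#include <stdbool.h>
#include <time.h>

typedef struct {
    bool   profiles_loaded;    /* did every Sofia profile come up?     */
    bool   media_ports_open;   /* can we actually bind our RTP ports?  */
    bool   db_reachable;       /* is the backing datastore reachable?  */
    time_t checked_at;         /* when the deep check last completed   */
} deep_state_t;

static deep_state_t g_deep;    /* refreshed by a background task; starts
                                  zeroed, so the node reports unhealthy
                                  until the first deep check finishes   */

/* Runs periodically in the background; expensive, so never in the poll path. */
static void run_deep_checks(void)
{
    g_deep.profiles_loaded  = true;  /* stand-ins for the real probes */
    g_deep.media_ports_open = true;
    g_deep.db_reachable     = true;
    g_deep.checked_at       = time(NULL);
}

/* The "shallow" poll an external agent (or a peer node) would call. */
static bool node_is_healthy(void)
{
    /* Stale results count as failure: the deep checker itself may be stuck. */
    if (time(NULL) - g_deep.checked_at > 30) {
        return false;
    }
    return g_deep.profiles_loaded && g_deep.media_ports_open && g_deep.db_reachable;
}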

Hopefully others will chime in on the other thread about the message
bus/shared key storage
<http://lists.freeswitch.org/pipermail/freeswitch-users/2013-February/092185.html>
so we can take this feature to the next level!

-- 
Michael S Collins
Twitter: @mercutioviz
http://www.FreeSWITCH.org
http://www.ClueCon.com
http://www.OSTAG.org

