[Freeswitch-users] High Availability Cluster Module for FreeSWITCH

Eliot Gable egable+freeswitch at gmail.com
Sun Feb 10 21:14:23 MSK 2013


On Sun, Feb 10, 2013 at 12:08 PM, Steven Ayre <steveayre at gmail.com> wrote:
> That covers redundancy in case of a network card or cable failure, but isn't
> what partitioning is about. Multiple NICs cannot prevent partitioning.
>
> As an example, partitioning might happen when a network switch between two
> network segments fails so you have nodes A+B in segment 1 able to talk to
> each other but unable to talk to nodes C+D in segment 2, while C+D can talk
> to each other but not A+B.
>
> Pacemaker/corosync contain a lot of algorithms to fence off partitions
> without quorum and can resort to things like STONITH if required to force a
> node to shutdown rather than risk it causing disruption to the cluster (for
> example if it tries to take over traffic to a virtual IP you could end up in
> a case where you have two servers sending ARP responses for the same IP).
>

Steve,

As Avi pointed out, I mentioned having multiple physical networks as a
guard against a network split / partition. If one network is split
such that A and B can only talk to each other over it, and C and D can
only talk to each other over it, you would indeed have an issue if
that were your only network. However, with two or more networks, all
four nodes will still be able to talk to each other over the other
network(s).
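
To make that concrete, here is a minimal Python sketch of the
reachability argument. It is not code from mod_ha_cluster; the names
can_talk and fully_connected are hypothetical, and each network is
modelled simply as a list of segments (sets of nodes that can still
reach each other on that network).

```python
from itertools import combinations
from typing import Iterable, List, Set

# Assumption: a network is a list of segments; nodes in the same
# segment can still talk to each other over that network.
Network = List[Set[str]]


def can_talk(a: str, b: str, networks: Iterable[Network]) -> bool:
    """True if a and b share a segment on at least one network."""
    return any(
        a in segment and b in segment
        for network in networks
        for segment in network
    )


def fully_connected(nodes: Iterable[str], networks: List[Network]) -> bool:
    """True if every pair of nodes can talk over some network."""
    return all(can_talk(a, b, networks) for a, b in combinations(nodes, 2))


nodes = ["A", "B", "C", "D"]
split = [{"A", "B"}, {"C", "D"}]   # one network, partitioned
intact = [{"A", "B", "C", "D"}]    # a second, healthy network

print(fully_connected(nodes, [split]))          # single split network
print(fully_connected(nodes, [split, intact]))  # split plus intact network
```

With only the split network, A cannot reach C; adding the intact
second network restores connectivity between all four nodes.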

Now, granted, if you have a network split in all networks, then you
are still screwed. Pacemaker and other solutions deal with this, as
you mentioned, using something called "quorum" where you need a
majority of nodes to be able to see each other, and they fence the
remaining nodes. As I documented on my wiki page for the module, I do
have plans to eventually support such functionality. However, that is
a bit further down the road as it will take some time to develop
STONITH interfaces to various hardware or even to reuse the STONITH
modules from Pacemaker or another project. In any case, I feel it is
more important to get the base functionality developed and debugged
first, since utilizing multiple networks is already a good way to
keep network splits from being an issue.
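
For reference, the majority rule behind "quorum" can be sketched in a
few lines of Python. This is a generic illustration of a strict
majority check, not Pacemaker's actual algorithm and not something
mod_ha_cluster implements today; has_quorum and nodes_to_fence are
hypothetical names.

```python
from typing import Iterable, List, Set


def has_quorum(visible_nodes: int, cluster_size: int) -> bool:
    """A partition has quorum if it holds a strict majority of nodes.

    visible_nodes counts the local node as well as the peers it sees.
    """
    return 2 * visible_nodes > cluster_size


def nodes_to_fence(partitions: Iterable[Set[str]], cluster_size: int) -> List[str]:
    """List the nodes in every partition that lacks quorum."""
    fenced: List[str] = []
    for partition in partitions:
        if not has_quorum(len(partition), cluster_size):
            fenced.extend(sorted(partition))
    return fenced


# A 3/1 split of a four-node cluster: only the lone node lacks quorum.
print(nodes_to_fence([{"A", "B", "C"}, {"D"}], cluster_size=4))

# An even 2/2 split: neither side has a strict majority under this rule.
print(nodes_to_fence([{"A", "B"}, {"C", "D"}], cluster_size=4))
```

Note that an even split leaves no partition with quorum under a plain
majority rule, which is one reason real cluster managers add
tie-breaking on top of it.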

That being said, there are other issues to contend with when
discussing network splits. For example, if A and B can see the
Internet but C and D cannot, and C is a master while B is a slave, you
still have an issue to address. In this case, mod_ha_cluster must be
able to determine that C and D cannot see the Internet. They need to
perform very fast pings to some IP address, or have some external host
sending them data in some way that they can detect when traffic
to/from the Internet has stopped. I can place a media bug on the audio
streams to make this determination fairly accurately. I can also rely
on a ping mechanism to make the determination. Once the determination
is made, mod_ha_cluster then has to promote B to master so that it
takes over for C.
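
The decision logic for that scenario might look roughly like the
Python sketch below. It is only an illustration under stated
assumptions: internet_lost and plan_promotions are hypothetical names,
the three-missed-probes threshold is arbitrary, the roles given to A
and D are made up for the example, and the probe results would in
practice come from pings or from watching the media streams.

```python
from typing import Dict, List


def internet_lost(probe_results: List[bool], max_misses: int = 3) -> bool:
    """True once the last max_misses probes have all failed."""
    recent = probe_results[-max_misses:]
    return len(recent) == max_misses and not any(recent)


def plan_promotions(roles: Dict[str, str], lost: Dict[str, bool]) -> Dict[str, str]:
    """Map each master that lost the Internet to a healthy slave."""
    healthy_slaves = sorted(
        node for node, role in roles.items()
        if role == "slave" and not lost[node]
    )
    plan: Dict[str, str] = {}
    for node in sorted(roles):
        if roles[node] == "master" and lost[node] and healthy_slaves:
            plan[node] = healthy_slaves.pop(0)
    return plan


# C is a master and B is a slave, as in the example above; the roles
# of A and D are assumptions made for this illustration.
roles = {"A": "master", "B": "slave", "C": "master", "D": "slave"}
probes = {
    "A": [True, True, True],
    "B": [True, True, True],
    "C": [True, False, False, False],
    "D": [False, False, False],
}
lost = {node: internet_lost(results) for node, results in probes.items()}
print(plan_promotions(roles, lost))  # C is replaced by B
```

D is skipped as a replacement because it cannot see the Internet
either, so B is the only slave eligible to take over for C.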

So, there are certainly still other issues to address when a network
split occurs, but split-brain is easily avoided by simply adding
redundant networks.


