[Freeswitch-users] Freeswitch HA advices

Eliot Gable egable+freeswitch at gmail.com
Fri Jun 13 17:45:50 MSD 2014


On Tue, Jun 10, 2014 at 10:56 AM, Federico Castro <fcastelco at gmail.com>
wrote:

> Hi all, I'm working on a Freeswitch HA solution. Now I'm deciding what
> method and DB I'll use to track calls.
>
> I have installed PostgreSQL on both servers and I configured them to
> replicate DB asynchronously.
>
> I would like to know if someone has experience with this kind of solution
> and what things I have to take into account to deploy a solid solution.
>
>
Lots of people have experience with such a solution; it all depends on what
you are trying to achieve.

Personally, I recommend you set up Corosync and Pacemaker both on your
PostgreSQL boxes and on your FreeSWITCH systems. I also recommend you run
PostgreSQL on a separate set of boxes from FS. Both can use a lot of memory
if you are running a lot of calls and/or have a lot of clients. If you need
performance, I recommend using the fastest disks you can get in the
PostgreSQL systems. Also install as much RAM as you can afford for the
project in the PGSQL boxes. You will want redundant power supplies in each
system with each supply plugged into a different circuit. You will also
want redundant Ethernet connectivity to redundant switches which also have
redundant power supplies. You will also want redundant cross-over
connections between the boxes in each pair.
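As a rough illustration of why every component gets a twin, and why each power supply should sit on its own circuit, here is a back-of-the-envelope availability calculation. It assumes failures are independent (which is exactly what sharing one circuit or one switch breaks), and the per-component availability figures are hypothetical placeholders, not measurements.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one copy suffices.

    Assumes independent failures: the group is down only when every copy
    is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*parts: float) -> float:
    """Availability of a chain where every part must be up (e.g. power, then network path)."""
    result = 1.0
    for part in parts:
        result *= part
    return result


if __name__ == "__main__":
    # Hypothetical availabilities for one power supply and one NIC/switch path.
    psu, network_path = 0.99, 0.995
    single = series_availability(psu, network_path)
    redundant = series_availability(
        parallel_availability(psu, 2),
        parallel_availability(network_path, 2),
    )
    print(f"single path:    {single:.6f}")
    print(f"redundant path: {redundant:.6f}")
```

The independence assumption is the whole point of the advice: two supplies on the same circuit behave like one supply, so the parallel formula no longer applies.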

Once you have Corosync and Pacemaker configured to start PGSQL and FS on
their own boxes and you have tested manual fail-over, then you need to
start thinking about every possible way you can make either of those two
systems stop working. Think about hard drives failing, power loss, kernel
panics, firewall rules blocking communication, someone accidentally
removing the IP address from one of the systems (it happens), killing
processes, Sofia profiles failing to load because something else is using
the port, etc. Make sure you have things set up to detect and recover from
any such failure. One of the best ways to do this is to actually build an
external testing system which places real calls through the system and has
them route back to itself to verify they made it. If it places a call and
the call does not make it back to itself, then you know something failed
and you can run more tests to determine what failed and reset it.
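A minimal sketch of the external call-loop tester described above, assuming you supply two hooks yourself: `place_call(token)` to originate a test call through the cluster (for example via an ESL `originate`) and `wait_for_inbound(token, timeout_s)` to report whether that call arrived back at the tester. Both hook names, and the `LoopbackProbe` class itself, are hypothetical; only the probe-and-count logic is shown.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    token: str
    ok: bool
    latency_s: float
    error: Optional[str] = None


@dataclass
class LoopbackProbe:
    """Places a test call that should route back to this tester and checks that it arrived.

    `place_call` and `wait_for_inbound` are hypothetical hooks you wire to your
    own call origination and answering logic.
    """
    place_call: Callable[[str], None]
    wait_for_inbound: Callable[[str, float], bool]
    timeout_s: float = 10.0
    history: List[ProbeResult] = field(default_factory=list)

    def run_once(self) -> ProbeResult:
        token = uuid.uuid4().hex  # unique tag so we match *our* call coming back
        start = time.monotonic()
        try:
            self.place_call(token)
            ok = self.wait_for_inbound(token, self.timeout_s)
            error = None if ok else "call did not come back before timeout"
        except Exception as exc:  # origination itself failed (refused, no route, ...)
            ok, error = False, str(exc)
        result = ProbeResult(token, ok, time.monotonic() - start, error)
        self.history.append(result)
        return result

    def consecutive_failures(self) -> int:
        """Number of failed probes since the last successful one."""
        count = 0
        for result in reversed(self.history):
            if result.ok:
                break
            count += 1
        return count

    def should_escalate(self, threshold: int = 3) -> bool:
        """True once `threshold` probes in a row have failed."""
        return self.consecutive_failures() >= threshold
```

Requiring several consecutive failures before escalating (the threshold of 3 is an arbitrary example) keeps one flaky test call from triggering diagnostics or a fail-over; once it trips, that is the point to run the more specific tests that tell you what actually broke.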

Like I said, it all depends on what you are trying to accomplish. If you
want really good automatic HA, you have to go to some pretty great lengths
to get it. If you are OK with occasional manual intervention, then you can
make some assumptions (like nobody accidentally removing your IP from the
interface or telling it to stop responding to ARP or throwing up a firewall
rule which blocks something). That makes the setup considerably easier, but
it also means manual intervention when something like that happens. In
other words, if something like that happens, you experience an outage which
the HA system doesn't detect and recover from. When you get calls reporting
that service has stopped working, you then have someone log in, take a look, and
manually fix the issue. This could take anywhere from 5 minutes to an hour
or more to do, depending on how good your support is and how good your team
is.
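To put the "5 minutes to an hour or more" range in perspective, here is a small sketch converting undetected incidents into yearly downtime and availability. The incident count of four per year is a hypothetical input, and the model assumes each incident lasts exactly until someone fixes it by hand.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def yearly_availability(incidents_per_year: int, minutes_to_fix: float) -> float:
    """Fraction of the year the service is up, if each undetected incident
    lasts until someone logs in and fixes it manually."""
    downtime = incidents_per_year * minutes_to_fix
    return max(0.0, 1.0 - downtime / MINUTES_PER_YEAR)


if __name__ == "__main__":
    incidents = 4  # hypothetical number of manual-intervention incidents per year
    for minutes in (5, 60):
        availability = yearly_availability(incidents, minutes)
        print(f"{minutes:>3} min per incident -> "
              f"{incidents * minutes} min/year down, {availability:.5%} available")
```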

So, probably the first thing you should do is list all the failures you want
it to recover from automatically and all the failures you are willing to
accept as causing an outage, and then base your implementation on that
plan.
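One way to keep that list honest is to write it down as data and flag any failure you want recovered automatically but have no detection method for yet. This is a minimal sketch; the entries come from the failure examples earlier in this message, but the policy assigned to each one and the detection notes are hypothetical placeholders for your own decisions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    AUTO_RECOVER = "auto"     # the HA stack must detect and fix this unattended
    ACCEPT_OUTAGE = "manual"  # acceptable to wait for a human to fix


@dataclass(frozen=True)
class FailureMode:
    name: str
    policy: Policy
    detection: str = ""  # how you expect to notice it; empty means "not decided yet"


def summarize(modes: List[FailureMode]) -> Dict[str, List[str]]:
    """Group failure modes by policy and flag auto-recover items with no detection plan."""
    summary: Dict[str, List[str]] = {"auto": [], "manual": [], "missing_detection": []}
    for mode in modes:
        summary[mode.policy.value].append(mode.name)
        if mode.policy is Policy.AUTO_RECOVER and not mode.detection:
            summary["missing_detection"].append(mode.name)
    return summary


# Hypothetical example plan: policies and detection notes are placeholders.
FAILURE_MODES = [
    FailureMode("hard drive failure", Policy.AUTO_RECOVER, "disk health monitoring"),
    FailureMode("power loss", Policy.AUTO_RECOVER, "node drops out of the Corosync membership"),
    FailureMode("kernel panic", Policy.AUTO_RECOVER, "node drops out of the Corosync membership"),
    FailureMode("FreeSWITCH process killed", Policy.AUTO_RECOVER, "Pacemaker resource monitor fails"),
    FailureMode("Sofia profile port already in use", Policy.AUTO_RECOVER),
    FailureMode("IP address removed from interface", Policy.ACCEPT_OUTAGE),
    FailureMode("firewall rule blocks traffic", Policy.ACCEPT_OUTAGE),
]

if __name__ == "__main__":
    for key, names in summarize(FAILURE_MODES).items():
        print(f"{key}: {', '.join(names) or '-'}")
```

Anything that shows up under `missing_detection` is a gap between what you expect the HA system to handle and what it can actually notice.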