[Freeswitch-users] FreeSWITCH HA + Loadbalancing
David Knell
dave at 3c.co.uk
Sat Aug 29 11:05:57 PDT 2009
Hi Raimund,
The difficult bit in all of this is having calls seamlessly transferred
from one box to another when the first box dies. There's a lot of state
associated with a call, not all of which is easy to replicate across.
99.99% uptime implies an average of no more than 15 seconds downtime
during a 40 hour business week (or about one unscheduled reboot every
couple of months), which is easily achievable using FreeSWITCH (and, in
my experience, Asterisk) on standard hardware.
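To make the arithmetic above concrete, here is a minimal sketch of the downtime budget. The function name and the eight-week window are my own choices for illustration, not anything from FreeSWITCH.

```python
def downtime_budget_seconds(availability: float, period_hours: float) -> float:
    """Maximum downtime, in seconds, that still meets the availability
    target over a period of the given length (hypothetical helper)."""
    return (1.0 - availability) * period_hours * 3600.0


# One 40 hour business week at 99.99%
weekly = downtime_budget_seconds(0.9999, 40)
print(round(weekly, 1))  # 14.4

# Eight such weeks, i.e. roughly "a couple of months" of business hours
two_months = downtime_budget_seconds(0.9999, 8 * 40)
print(round(two_months, 1))  # 115.2
```

About 115 seconds over two months of business hours is roughly the budget for one unscheduled reboot, provided the box comes back within a couple of minutes.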
People agonize about their four or five nines too much, in my opinion.
Folk are used to their phones crashing, needing rebooting and dropping
calls - we've cellphones to thank for that - and a half-decent VoIP
solution will knock the spots off your favourite mobile carrier for
reliability. Plus there's all the external factors - 99.999% uptime on
your PRI means that someone can only drive a digger through the cable
once every 273 years, assuming that it takes a day to fix, and no
telco's going to give you that in an SLA. And your power can't go off.
And so on.
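The digger figure can be checked the same way. This is a rough sketch, assuming each outage lasts exactly as long as the repair and that outages are spread evenly; the function name is made up for the example.

```python
def years_between_outages(availability: float, outage_days: float) -> float:
    """Average length, in years, of one outage-plus-uptime cycle that still
    meets the availability target (hypothetical helper)."""
    # If a fraction (1 - availability) of all time may be downtime, one
    # outage of outage_days "uses up" outage_days / (1 - availability) days.
    cycle_days = outage_days / (1.0 - availability)
    return cycle_days / 365.25


# Five nines with a one-day repair
print(int(years_between_outages(0.99999, 1.0)))  # 273

# Four nines with the same one-day repair
print(int(years_between_outages(0.9999, 1.0)))  # 27
```

Even at four nines, a single day-long cable cut uses up about 27 years of budget.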
Lastly, I'm afraid that virtualization and VoIP don't play well
together, at least not if you want to achieve a sensible density. The
large number of small packets being moved around the network interfaces
- both physical and virtual - will quickly chew up your CPU.
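As a back-of-the-envelope illustration of the packet load, here is a small sketch. It assumes 20 ms packetization (a common setting, not something measured on any particular box) and one RTP stream in each direction per call; the function name is hypothetical.

```python
def packets_per_second(concurrent_calls: int,
                       ptime_ms: float = 20.0,
                       directions: int = 2) -> float:
    """Approximate RTP packets per second crossing one interface.

    ptime_ms is the assumed packetization interval; 20 ms means
    50 packets per second per direction per call.
    """
    return concurrent_calls * directions * (1000.0 / ptime_ms)


print(packets_per_second(100))  # 10000.0
print(packets_per_second(500))  # 50000.0
```

Every one of those packets is tiny, and in a virtualized setup each one crosses both a physical and a virtual interface, so the per-packet overhead is paid more than once.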
Cheers --
Dave
> Thinking about it, maybe we can create a solution, if some of us work
> together:
>
>
> My strengths are in virtualization, Linux, development, databases,
> integration, etc.
> What I do not know much about is how SIP (and everything else in the
> voice world, for that matter) works under the hood, and how it's
> implemented in FS.
>
>
> I know that the state information for a call has to be stored
> somewhere and retrieved somehow, but I do not know that part. What I
> do know is that it has to be doable to store all the stream
> information (IPs, ports, current states, etc.) in a very fast
> database (my idea would be memcached, for example) so that another FS
> could just take this information and take over the call. Maybe you
> lose a second of voice, maybe you lose the recorded call file or a
> part of it, but that should be it. (SipFoundry has a boxed open source
> PBX which, of course, is not as flexible as FreeSWITCH or Asterisk,
> but has Call Live Migration and Call Live Failover integrated!)
>
>
> What I want is for my company to be able to sell a 99.99% uptime PBX
> (we do mostly call-center related stuff) which can scale well and
> can grow with the company without lots of hassle. My dream would be:
>
>
> To begin with:
>
>
> One hardware node with the essential hardware (Digium cards, for
> example).
> On this node are OpenVZ virtualized containers:
> [VirtCnt1: FS which only talks to the hardware and forwards
> everything] = Could be replaced with a hardware media gateway, etc.
> [VirtCnt2: FS which handles the PBX] \___ Load-balanced, with ODBC or
> [VirtCnt3: FS which handles the PBX] /    XML, failover, live takeover
> [VirtCnt4: Database for state information] (maybe something as
> resource-friendly as memcached? A resource-heavy database?)
>
>
> With this we can achieve all this:
>
>
> Problem with VirtCnt2 (e.g. crash, lock, ...)
> * VirtCnt3 can take over.
> -> You are free to investigate the problem without stress; you can
> debug and analyze while the machine is still running
> -> you can also create a machine-state dump of the virtual container,
> dump the container as well, copy the data to your lab and restore the
> machine to the state it was in when the problem occurred, so you can
> investigate it live in the lab (given some prerequisites, but easily
> doable)
> -> just think about the possibility of better bug reports because
> someone can take the time to read out all the data with GDB to
> investigate the proper cause of a machine lock!
>
>
> You want to upgrade to a new FreeSWITCH version?
> * Take VirtCnt2 out of the LoadBalancing Scheme,
> * Stop it, Clone it,
> * Upgrade FreeSWITCH in the cloned Container
> * Start the cloned container
> * if there's something wrong, stop it and restart the original
> VirtCnt2
> -> No problem at all, you can Test on the Live Hardware, with part of
> the Live users (maybe a low-volume queue) to be sure everything works
> out fine before you activate the full loadbalance
>
>
> The server on its own can't handle the load
> * Buy new machine
> * Setup Hardware Node
> * Livemigrate VirtCnt3 (no downtime)
>
>
> Now the first server, with VirtCnt1 and VirtCnt2, also has too much
> load
> * Buy new machine
> * Setup Hardware Node
> * Livemigrate VirtCnt2 (no downtime)
> -> Now you have a 3 server solution (1 mediaprox, 2 loadbalanced /
> failover PBXes) out of the first box you bought, without headaches,
> because the system was built for it from the beginning!
>
>
> The database drains too much?
> * Buy new machine
> * Setup Hardware Node
> * Livemigrate database VirtCnt4 (no downtime)
>
>
> You want to upgrade Hardware/Kernel in Hardware node 1?
> * Livemigrate VirtCnt2 to a hotstandby machine, or to the other PBX
> machine, upgrade the hardware, Re-Livemigrate the containers. (no
> downtime)
> * OR just break the load balancing, wait until all current calls are
> torn down correctly, upgrade the machine, re-enable the load balancer
>
>
> You want an exact copy of the first server for Hardware HA?
> * Buy new machine
> * Setup Hardware node
> * Buy hardware PRI switchover box
> * Clone VirtCnt1 - VirtCnt4 to the new machine
> * Make basic failover configuration
>
>
>
>
> -> the sky's the limit, as the saying goes ...
>
>
>
>
> So, I can do all the OpenVZ stuff and the integration with database /
> memcached / heartbeat / whatever is needed here. Is anyone willing to
> work with me on this on the FreeSWITCH side? Or at least to provide
> me with the necessary information about what's needed / how to talk
> to FreeSWITCH / what states to get from it?
>
>
> I know this seems very ambitious, but if this could be made into a
> relatively easy-to-set-up package with good documentation, it would
> be a boost for FreeSWITCH, I am sure, because after all this is what
> everyone has grown accustomed to from the good old phone companies
> and the good old PBXs: carrier-grade uptimes ...
>
>
> Thanks for everyone reading up until here,
> all the best,
>
>
> Ray
>
>
>
>
>
>
> --
> Raimund Sacherer
> -
> RunSolutions
> Open Source It Consulting
> -
>
> Parc Bit - Centro Empresarial Son Espanyol
> Edificio Estel - Local 3D
> 07121 - Palma de Mallorca
> Baleares
>
> On Aug 29, 2009, at 3:17 PM, Raimund Sacherer wrote:
>
> > Oh yeah, that would be so helpful for my situation, as my client
> > now *demands* a solution where he can press a big red button and
> > everything fails over to another box. He is totally scared because
> > of the lockups in Asterisk which, under specific situations
> > involving AMI, automated call setup, and Murphy, led to a lockup of
> > the entire machine: no console was working anymore, and only a cold
> > reset could fix it.
> >
> >
> > So, IF there is the possibility of live takeover / failover etc., I
> > would love to hear how it has been done.
> >
> >
> > I am very experienced with OpenVZ and for about two years now have
> > used only OpenVZ virtualization servers for everything, because of
> > live migration etc. But as I am new in this company, we could not
> > adopt this until now.
> >
> >
> > So please, Ken, if you can, describe what needs to be done to get
> > failover / takeover working (an outline would be enough)
> >
> >
> > Thanks in Advance
> >
> >
> > On Aug 29, 2009, at 11:58 AM, Steve Kurzeja wrote:
> >
> > > On Sat, Aug 29, 2009 at 2:34 PM, Diego Viola
> > > <diego.viola at gmail.com> wrote:
> > > Yes, FreeSWITCH is a system that you can trust 100%. I
> > > have switched my Asterisk servers to FreeSWITCH and have
> > > peace now.
> > >
> > > If I were you I would get rid of Asterisk and use
> > > FreeSWITCH, FS will handle all what you want very well.
> > >
> > > And I agree with David, fail-over is kinda irrelevant
> > > since the FS doesn't crash like Asterisk does.
> > >
> > >
> > >
> > > You still have hardware failures and fail-over is also useful for
> > > hit-less maintenance on boxes.
> > >
> > > I'd be interested to know how Brian West was approaching his live
> > > migration work.
> > >
> > > Steve
> > > _______________________________________________
> > > FreeSWITCH-users mailing list
> > > FreeSWITCH-users at lists.freeswitch.org
> > > http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> > > UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> > > http://www.freeswitch.org
--
David Knell, Director, 3C Limited
T: +44 20 3298 2000
E: dave at 3c.co.uk
W: http://www.3c.co.uk