On Mon, Feb 11, 2013 at 7:36 AM, Marcin Gozdalik <span dir="ltr">&lt;<a href="mailto:gozdal@gmail.com" target="_blank">gozdal@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+1<br>
<br>
I do not doubt mod_ha is necessary inside of FS and it may be<br>
better/simpler than writing Pacemaker resource agent, but writing<br>
yet-another-cluster-communication-engine is IMHO the wrong way to go<br>
and using Corosync for communication will give a lot of value from<br>
mature codebase.<br><br></blockquote><div><br></div><div>I understand what you are saying, but what I am trying to get across is that I am not writing yet-another-cluster-communication-engine. All I am really doing is combining a multicast messaging API written by Tony with the event API in FS to broadcast existing state information between multiple FS nodes, and adding a tiny amount of logic on top of that to coordinate call fail over and recovery. That's probably a little over-simplified, but it gets the point across.</div>
<div><br></div><div>The network communication code is already in FS and well tested. The event system is already in FS and well tested. I have already written the code to the point that it parses the configuration files and starts sending heartbeats out of all the configured interfaces. I have also already written a lot of the code that deals with the state transitions. All I am talking about doing is implementing a tiny finite state machine. It's a pretty trivial programming task. In fact, I think it was covered in my first year at Carnegie Mellon University. Of course, I had already figured out how to write such things in high school; I just did not know what they were called at that point.</div>
<div><br></div><div>My point is that this is not yet-another-cluster-communication-engine. It is a very specific and small finite state machine designed solely to give FS just enough information to coordinate call fail over internally. If I recall correctly, a lot of people also said writing yet-another-VoIP-server was a waste of time, but now we have FreeSWITCH, and it was obviously worth the effort. And I am not even trying to do something as complex as that. If you think this is yet-another-cluster-communication-engine, you are missing the point. It is not. It never will be.</div>
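<div><br></div><div>To give a rough idea of how small such a state machine can be, here is a minimal Python sketch of the standby side of a two-node pair. It is an illustration only, not the mod_ha code: the state names, the 3-second timeout, the explicit timestamps, and the choice not to fail back automatically are all assumptions made for the example.</div>
<pre>
```python
import enum


class State(enum.Enum):
    STANDBY = "standby"  # peer is alive and handling calls
    ACTIVE = "active"    # this node has taken over call handling


class FailoverFsm:
    """Hypothetical standby-side state machine driven by peer heartbeats."""

    def __init__(self, timeout=3.0, now=0.0):
        self.timeout = timeout      # seconds of silence before takeover (assumed value)
        self.state = State.STANDBY
        self.last_heartbeat = now   # time the peer was last heard from

    def on_heartbeat(self, now):
        # Record that the peer was heard from. A heartbeat never demotes an
        # ACTIVE node in this sketch (no automatic failback).
        self.last_heartbeat = now

    def on_tick(self, now):
        # Called periodically: promote to ACTIVE once the peer has been
        # silent for longer than the timeout.
        if self.state is State.STANDBY and now - self.last_heartbeat > self.timeout:
            self.state = State.ACTIVE
        return self.state
```
</pre>
<div>Timestamps are passed in explicitly so the transitions can be exercised without a real clock or network. In a real module the heartbeat and tick calls would be driven by the multicast receiver and a timer.</div>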
<div><br></div><div>Look at Sonus, Genband, Broadsoft, Veraz, etc. All the big-name carrier-grade telecom vendors have a built-in solution for automatic call fail over. The only way FreeSWITCH will ever compete with such solutions is if it also has that feature. Pacemaker and Corosync are overkill just to get FS to handle single-node failures and provide call recovery. It took me a full 3 months of working with them every day to really understand how to deploy them properly in conjunction with FreeSWITCH and Postgres to provide a carrier-grade hot-standby solution that was robust enough to handle 99% of the failures I could throw at it. Granted, this was back when the configuration still needed to be written by hand in XML, and prior to the existence of any resource agent for FreeSWITCH. But even with those changes, deploying Pacemaker and Corosync is not a simple task. If that is the requirement for FS to have HA, it will never truly stand a chance against commercial offerings.</div>
<div><br></div><div><br></div></div>