[Freeswitch-users] Installation report, a python crash and a bottleneck

Krešimir Tonković kresho at multimodus.hr
Thu Jun 12 03:29:26 PDT 2008


Hi!

I'm new to FreeSwitch and I'd like to report my success with it and a few
failures. I'll be a little bit vague on some details because I must protect
some business details. Sorry for that.

I have no experience with asterisk, so many concepts were new to me.

We run a hosted IVR system with a few hundred lines. We have a few servers
running a SIP/VoiceXML application server and connect to the network with
SIP/ISDN gateways.

Recently we started an IVR with very short calls and very high CPS. Our
existing software doesn't handle this scenario very well, so I started
looking into alternatives.

FreeSwitch caught my eye because of its support for multiple scripting
languages. I love python and this feature put FS into the evaluation list.
So I started on Friday. I installed FS from the Debian repositories on my
Ubuntu 8.04 laptop and tried some examples from the wiki ("Some thing to try
out!"). I was very impressed that everything worked right out of the box.

I was a little disappointed that mod_python wasn't included in the
distribution so I checked out the source and compiled everything. An hour
later I had another installation of FS.

It took me a few hours to get the dialplan right. Because our service only
runs IVRs and uses no switching, I removed everything from the default
dialplan (mainly because it conflicted with the ANI numbers we get from the
gateways).

Another hour later, I had a simple IVR in python done: use a web service for
a database lookup and play an appropriate prompt. I didn't use the database
directly because I wanted the best possible comparison to what our current
system does, and (our) VoiceXML can't use databases directly, but can use
web services.

In less than 1 working day I had everything running. Quite good.

Time for load testing :-) Our old software handles around 20 CPS on my
laptop. I increased max_sessions to 5000 and sessions-per-second to 100 and
started sipp. The result was quite bad - I could not get over 8 CPS! The
processor barely noticed that FS was running, so I had no idea what the
bottleneck was. I still don't. After fiddling with this for a while, I gave
up and decided to try it on one of our production machines. Weekends are not
very busy, so I took one offline. It's an HP ProLiant server with one
quad-core Xeon at 2 GHz, 2 GB of ECC RAM and 10k rpm disks. The server is
also running Ubuntu 8.04 (server edition), so I just copied the binaries.

I ran sipp from another machine, using the uac scenario and limiting the
call duration to 4 seconds:
sipp <FS server ip> -sn uac -d 4000 -s <ivr_number>
Theoretically, as each call lasts 4 seconds, the number of concurrent calls
should never exceed 4x the current CPS.
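
As a quick check of that arithmetic, here is a minimal sketch (not part of
the test setup itself; the function name is made up for illustration). It
assumes every call is answered immediately and lasts exactly the configured
duration, so the steady-state number of calls in progress is just rate
times duration:

```python
def expected_concurrent_calls(cps: float, duration_s: float) -> float:
    """Steady-state calls in progress: arrival rate times call duration.

    Assumes each call is answered at once and lasts exactly duration_s.
    """
    return cps * duration_s


if __name__ == "__main__":
    # The rates discussed in this report, with the 4 second calls used in sipp.
    for cps in (27, 30, 40):
        print(cps, "CPS ->", expected_concurrent_calls(cps, 4.0), "calls")
```

If the observed call count climbs well above this figure, calls are staying
up longer than 4 seconds, which usually means they are waiting to be
answered.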

These are the results:

With up to 27 CPS everything was stable. The call count was almost exactly
4 times the CPS, indicating that new calls were answered immediately. I
also verified this by calling in.

Up to 30 CPS everything was stable for a while, but then the total calls
number exploded to the limit set by sipp. The processor load was very
reasonable, so again I ran into the bottleneck mentioned above. After sipp
hits the total call limit, it will not create new calls until some are
released. So CPS oscillated between 0 and 30 as shown by sipp. CDRs show
that there was an average of 27 CPS. At this point, when I called in, I got
ringback tone for as long as the operator allows (60s) and then I was
dropped. With a softphone I could reach the IVR after about 80 sec.

When I set sipp to more than 30 CPS, the number of total calls exploded
immediately.

Experimenting some more, I found I could contain the explosion (and the
instability in CPS) by limiting the number of total calls to 4x current cps
when cps was up to 30. Thus, by starting sipp like this:
sipp <FS server ip> -sn uac -d 4000 -s <ivr_number> -l 120
I could go up to 30CPS and get reasonably stable real 30CPS. When calling in
with a real phone, I would reach the IVR after 2-3 seconds of ringback,
which is acceptable. This simulation ran for several hours without any other
problems.
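
The cap can be derived from the target rate rather than typed by hand. The
following is only a sketch: the helper name and the example address and
number are hypothetical, and it uses nothing beyond the sipp options shown
above (-sn, -d, -s, -l). It builds the argument list with the call limit
set to rate times duration:

```python
import math


def sipp_uac_command(server, number, cps, duration_ms=4000):
    """Build a sipp argument list with -l capped at cps * call duration.

    server and number stand in for <FS server ip> and <ivr_number>.
    """
    # Calls in progress at steady state, rounded up to a whole call.
    limit = math.ceil(cps * duration_ms / 1000)
    return [
        "sipp", server,
        "-sn", "uac",             # built-in UAC scenario
        "-d", str(duration_ms),   # call duration in milliseconds
        "-s", number,             # number to dial
        "-l", str(limit),         # maximum simultaneous calls
    ]


if __name__ == "__main__":
    # 192.0.2.10 and 1000 are placeholder values, not the real setup.
    print(" ".join(sipp_uac_command("192.0.2.10", "1000", 30)))
```

For 30 CPS this yields -l 120 and for 40 CPS -l 160, matching the limits
used in these runs.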

With
sipp <FS server ip> -sn uac -d 4000 -s <ivr_number> -l 160
and setting cps to 40, the total calls count obviously never passed 160, but
the cps shown by sipp became unstable, oscillating between 0 and 40.

These results are only slightly better than our current SIP server.

Today I put FreeSwitch into production. The unexpected thing here was that
when FreeSwitch talked to our gateways instead of sipp, it crashed a few
times. I don't associate these crashes with load because it happened equally
on low and high load. Here's the log:

2008-06-09 09:34:33 [NOTICE] switch_core_session.c:753
switch_core_session_thread() Session 166 (sofia/internal/---- deleted stuff
-----) Ended
2008-06-09 09:34:33 [NOTICE] switch_core_session.c:755
switch_core_session_thread() Close Channel sofia/internal/---- deleted stuff
----- [CS_HANGUP]
2008-06-09 09:34:33 [CRIT] switch_core_state_machine.c:218 print_trace()
Obtained 10 stack frames.
/usr/local/freeswitch/lib/libfreeswitch.so.1 [0xb7e413b1]
[0xb7f69420]
/usr/local/freeswitch/mod/mod_python.so [0xb011b46a]
/usr/lib/libpython2.5.so.1.0(PyCFunction_Call+0xfa) [0xb002a50a]
/usr/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xafff38d7]
/usr/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x4067) [0xb0076907]
/usr/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x748) [0xb007a368]
/usr/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x601c) [0xb00788bc]
/usr/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x748) [0xb007a368]
/usr/lib/libpython2.5.so.1.0 [0xb001667f]
2008-06-09 09:34:33 [CRIT] switch_core_state_machine.c:319
switch_core_session_run() Thread has crashed for channel sofia/internal/----
deleted stuff -----

It seems like mod_python is not quite ready for production yet :-) I have a
few core dumps available on demand, about 2 MB each.

Turning crash protection on didn't help. FreeSwitch would reject new calls,
and wouldn't shut down completely. I had to kill it.

I would still like to replace our existing system with FreeSwitch because I
find it far more comfortable to work with. It has great potential and I'm
sure it is being successfully deployed in many places as I write this.

The python crash is probably a simple bug. The invisible bottleneck is what
troubles me more. Any help would be greatly appreciated.

Finally, this is all with FreeSwitch Version 1.0.pre4 (8760).
-- 
kresho