[Freeswitch-users] Testing Freeswitch performance led to strange behavior
Apostolos Pantsiopoulos
regs at kinetix.gr
Thu Jun 4 07:32:22 PDT 2009
While evaluating Freeswitch for use in a production environment, I
ran a number of performance tests on various servers and noticed some
strange behavior from FS. I could still reproduce it after stripping
my scenario down to a simple bridge.
The scenario, as I said, is quite simple:
|---------|                                |---------|
|         |------- Call from sipp -------> |         |
|  sipp   |                                |   FS    |
|         | <------ Call back to sipp -----|         |
|---------|                                |---------|
I did not use an RTP stream for my calls just to test
the signaling alone.
The sipp scenario is the standard uac.xml scenario that
comes with sipp, run with the following options:
Test FS 1:
sipp <FS_IP>:5060 -s 55555555 -i <SIPP_IP> -mi <SIPP_IP> -ci <SIPP_IP>
-r 10 -d 5000 -l 100 -m 1000 -sf uac.xml
Calls : 1000
Successful calls : 1000
Idle CPU during tests : ~35-60 % (35 % while new calls were being
generated, 60 % while sipp was held at the -l limit)
Note : 985 of them had a duration (billsec) of 10 and 15 of them had a
duration of 11.
I tried raising the call rate and limit...
Test FS 2:
sipp <FS_IP>:5060 -s 55555555 -i <SIPP_IP> -mi <SIPP_IP> -ci <SIPP_IP>
-r 20 -d 5000 -l 200 -m 1000 -sf uac.xml
Calls : 1000
Successful calls : 1000
Idle CPU during tests : ~0-30 % (0 % while new calls were being
generated, 30 % while sipp was held at the -l limit)
THIS IS WHAT MAKES ME WONDER :
The distribution of the durations (billsec - not complete durations) :
183 calls with 10 secs billed duration
110 calls with 11 secs billed duration
238 calls with 12 secs billed duration
447 calls with 13 secs billed duration
22 calls with 14 secs billed duration
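As a rough illustration of what this distribution means in billed seconds (a sketch that is not part of the original tests; the function and variable names are made up for the example), the histogram above can be summarized like this:

```python
# Billed-duration histogram from the second test: {billsec: number_of_calls}
histogram = {10: 183, 11: 110, 12: 238, 13: 447, 14: 22}
EXPECTED_BILLSEC = 10  # the sipp scenario hangs up after 10 seconds


def summarize(hist, expected):
    """Return totals showing how far billed time drifts from the expected time."""
    total_calls = sum(hist.values())
    billed = sum(sec * n for sec, n in hist.items())
    ideal = expected * total_calls
    late_calls = sum(n for sec, n in hist.items() if sec > expected)
    return {
        "total_calls": total_calls,
        "billed_seconds": billed,
        "excess_seconds": billed - ideal,
        "mean_billsec": billed / total_calls,
        "late_calls": late_calls,
    }


summary = summarize(histogram, EXPECTED_BILLSEC)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these figures the 1000 calls bill 12015 seconds instead of the expected 10000, i.e. about 20% more, and 817 of the 1000 calls are billed longer than 10 secs.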
The sipp scenario is simply "hang up the phone after 10 secs". So why am
I seeing these durations? Of course it has something to do with the
stress the machine was put through during the second test. But I can
see it happening under less stressful conditions too (e.g. 15 calls per
second), to a smaller extent.
I captured one of these calls and verified that when the sipp client
hangs up exactly 10 secs after the call start, FS receives the BYE
and replies with 200 OK. BUT it does not hang up the second leg in a
timely manner, i.e. it sends a BYE to the sipp server side 1-4 seconds
AFTER that. That explains the 11, 12, 13 and 14 sec durations seen in
the second test. What is more interesting is that I would expect the
CDRs to show different durations for the first and second leg (since
the A-leg BYE was received and acknowledged by FS at the correct time),
i.e. 10 and 14 secs respectively. But what I get is the same duration
for both legs (14 secs).
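To make the expected per-leg figures concrete, here is a minimal sketch of how the durations could be derived from capture timestamps. The timestamps are hypothetical, chosen only to match the 10 and 14 sec case described above; they are not taken from the actual capture, and the helper name is made up for the example.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def seconds_between(start, end, fmt=FMT):
    """Elapsed seconds between two same-day timestamps given as strings."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Hypothetical timestamps for one call (assumed values, for illustration only)
answer_time = "07:32:22.000"  # call answered on both legs
a_leg_bye = "07:32:32.000"    # BYE received from the sipp client (A leg)
b_leg_bye = "07:32:36.000"    # BYE sent by FS to the sipp server side (B leg)

a_leg_duration = seconds_between(answer_time, a_leg_bye)
b_leg_duration = seconds_between(answer_time, b_leg_bye)
hangup_lag = seconds_between(a_leg_bye, b_leg_bye)

print(f"A leg: {a_leg_duration} s, B leg: {b_leg_duration} s, lag: {hangup_lag} s")
```

In this example the A leg should be billed 10 secs and the B leg 14 secs, whereas the CDRs I get report 14 secs for both.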
This in my opinion is very dangerous in production environments: either
you get charged by your provider for more seconds than you charge your
clients, or you falsely charge your clients for longer durations although
they hung up correctly (and you acknowledged it).
NOTE No 1 : All the performance recommendations found in the wiki have
been applied. In fact, only the essential modules needed to make this
scenario work were loaded.
NOTE No 2 : I tried using asterisk (as a point of reference - don't get
me wrong - I am not trying to start a flame war here). On the same
hardware it managed 60 calls/sec with a channel limit of 400
simultaneous calls using at most 50% of the cpu. Under no
circumstances did I see the behavior above (this inability to hang up
call legs in a timely manner). Even when I pushed asterisk to its limits
(80 calls per second, 600 max call limit) and it started failing on some
calls, it never failed to hang up both legs at exactly 10 secs.
NOTE No 3 : As you can tell, I was using a very small machine for my
tests. When I moved the same tests to larger installations (Quad Core
Opterons and Xeons) I got results proportional to the above.
NOTE No 4 : The tests were performed in a LAN environment and since
there was no RTP involved I think there were no bandwidth issues there.
NOTE No 5 : The tests were performed using numerous SVN versions (latest
: 13610), the stable version and the 1.0.4pre8 version.
NOTE No 6 : Using the -hp switch made no noticeable change in behavior.
I am not trying to complain about FS's performance (far from it). I am
just somewhat disappointed to see it behave in such a strange manner
when under stress. I would prefer a design that drops calls beyond a
certain threshold to a design that handles them all incorrectly (I am
aware of the max sessions per second in switch.conf.xml - but I am
starting to see this behavior even with the cpu idling at about 80%). I
don't know if anyone else has had the same experience when testing
Freeswitch. I can happily supply all the test details (config
files, captures etc) to all interested parties.
--
-------------------------------------------
Apostolos Pantsiopoulos
Kinetix Tele.com R & D
email: regs at kinetix.gr
-------------------------------------------