[Freeswitch-users] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.

Anthony Minessale anthony.minessale at gmail.com
Fri Mar 8 22:19:33 MSK 2013


FS runs the threads that are critical to its operation at RT priority.

We do not typically entertain load-testing questions here because the
results are subjective to the environment, and interpreting them requires
a lot of knowledge and system-tuning skill. We have lost a lot of time
over the years addressing this kind of topic.

Typically, once you find the limits your machine can handle, it's best to
set the params designed to protect the process from getting overloaded,
such as min-idle-cpu and max-sessions.

The call flow you are using and 100 other variables factor into what you
can get as a max.  We do not endorse or quote performance numbers.

One hint: try a bigger ptime, such as 40ms, during testing to reduce the
load on the scheduler.  Another: get a nice 12-core box if you want insane
call volume.  The more CPUs you have, the more concurrent context switches
you can endure.
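
To illustrate the ptime hint with rough arithmetic: each RTP stream sends one
packet per ptime interval, so doubling ptime halves the number of packets (and
timer wakeups) per second. The sketch below is only a back-of-the-envelope
estimate; the call count and the number of streams per call are made-up
assumptions, and it only applies when FS is actually in the media path.

```python
# Rough packet-rate estimate for RTP at different ptime values.
# Assumptions (not from the thread): 1000 concurrent calls and
# 2 packet streams handled per call.

def packets_per_second(ptime_ms: float) -> float:
    """Packets per second for one RTP stream in one direction."""
    return 1000.0 / ptime_ms


def total_packet_rate(concurrent_calls: int, ptime_ms: float,
                      streams_per_call: int = 2) -> float:
    """Aggregate packets per second across all calls."""
    return concurrent_calls * streams_per_call * packets_per_second(ptime_ms)


if __name__ == "__main__":
    for ptime in (20, 40):
        rate = total_packet_rate(1000, ptime)
        print(f"ptime={ptime}ms -> {rate:.0f} packets/s")
```

Going from 20ms to 40ms cuts the per-stream rate from 50 to 25 packets per
second, which is the reduction in scheduler load the hint is after.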

What you are experiencing is just the tip of the iceberg in the realm of
user-space, low-latency media performance.  There is a wealth of
information collected on this topic, and you will see a lot of the
challenges as you move forward.




On Fri, Mar 8, 2013 at 12:15 PM, bratner bratner <ratner2 at gmail.com> wrote:

> Here is the sipp output and additional numbers for a test I ran with the
> -nosql param.
>
> The test ran at 180 CPS for ~3500 seconds and the rest at 210 CPS.
>
> Trouble (as in higher system CPU%) started to appear around 8591 seconds
> into the test.
> As you can see below, the problem started just before 9124 sec into the
> test. 210 CPS with 5-sec calls should not give you a lot more than 1050
> concurrent calls.
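
The 1050 figure above is just arrival rate times call duration (Little's law).
A minimal sketch of that arithmetic, using the nominal 5-second pause and the
5.158-second average call length that sipp reports in the statistics screen;
the function name is made up for illustration.

```python
# Expected steady-state concurrency = arrival rate * average hold time.

def expected_concurrent_calls(calls_per_second: float,
                              hold_time_s: float) -> float:
    """Little's law: average number of calls in progress."""
    return calls_per_second * hold_time_s


if __name__ == "__main__":
    # Nominal scenario: 210 CPS, 5 s pause between ACK and BYE.
    print(expected_concurrent_calls(210, 5.0))
    # Using the average call length measured by sipp (00:00:05:158).
    print(expected_concurrent_calls(210, 5.158))
```

Either way the expected concurrency stays near 1050 to 1085 calls, so a peak of
2000 calls (the sipp limit) suggests calls were taking far longer than 5
seconds to complete by that point in the test.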
>
> ------------------------------ Scenario Screen -------- [1-9]: Change Screen --
>   Call-rate(length)   Port   Total-time  Total-calls  Remote-host
> 210.0(5000 ms)/1.000s   5061    9157.32 s      1834024  192.96.201.164:5060(UDP)
>
>   0 new calls during 0.000 s period      0 ms scheduler resolution
>   0 calls (limit 2000)                   Peak was 2000 calls, after 9124 s
>   0 Running, 4640 Paused, 0 Woken up
>   20 dead call msg (discarded)           0 out-of-call msg (discarded)
>   1 open sockets
>
>                                  Messages  Retrans   Timeout   Unexpected-Msg
>       INVITE ---------->         1834024   74        0
>          100 <----------         1834024   0         0         0
>          180 <----------         1834024   0         0         0
>          183 <----------         0         0         0         0
>          500 <----------         0         0         0         0
>          502 <----------         0         0         0         0
>          503 <----------         0         0         0         0
>          408 <----------         0         0         0         0
>          480 <----------         0         0         0         0
>          200 <----------  E-RTD1 1834024   81        0         0
>
>          ACK ---------->         1834024   81
>        Pause [   5000ms]         1834024                       0
>          BYE ---------->         1834024   7646      0
>          503 <----------         0         0         0         0
>          200 <----------         1834024   0         0         0
>
> ------------------------------ Test Terminated --------------------------------
>
>
> ----------------------------- Statistics Screen ------- [1-9]: Change Screen --
>   Start Time             | 2013-03-08    15:22:18:204 1362756138.204833
>   Last Reset Time        | 2013-03-08    17:54:55:535 1362765295.535214
>   Current Time           | 2013-03-08    17:54:55:535 1362765295.535437
> -------------------------+---------------------------+--------------------------
>   Counter Name           | Periodic value            | Cumulative value
> -------------------------+---------------------------+--------------------------
>   Elapsed Time           | 00:00:00:000              | 02:32:37:330
>   Call Rate              |    0.000 cps              |  200.279 cps
> -------------------------+---------------------------+--------------------------
>   Incoming call created  |        0                  |        0
>   OutGoing call created  |        0                  |  1834024
>   Total Call created     |                           |  1834024
>   Current Call           |        0                  |
> -------------------------+---------------------------+--------------------------
>   Successful call        |        0                  |  1834024
>   Failed call            |        0                  |        0
> -------------------------+---------------------------+--------------------------
>   Response Time 1        | 00:00:00:000              | 00:00:00:149
>   Call Length            | 00:00:00:000              | 00:00:05:158
> ------------------------------ Test Terminated --------------------------------
>
>
> After stopping the load, FS still hogs 22.1% of memory.
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 15995 freeswit  -2 -10 4677m 873m 5028 S    0 22.1 755:28.65 freeswitch
>
>
>
> The symptoms of the crash are the same; it just happens at a higher CPS
> and takes more time (more calls) before crashing.
>
> I would appreciate any suggestions.
>
> Regards,
> Boris Ratner.
>
>
>
> On Fri, Mar 8, 2013 at 6:22 PM, bratner bratner <ratner2 at gmail.com> wrote:
>
>> The original test was done on git master as of the date mentioned. The
>> sqlite core.db file was on /run/shm, which is a tmpfs on Ubuntu 12.04.
>> I will recompile from git master and test with -nosql.
>>
>> Testing my existing setup with -nosql seems more stable; it has now been
>> running at 210 CPS for some time (500k calls already passed) with ~35%
>> idle CPU. But free memory is slowly going down. I will let it run until
>> the kernel kills it to see how many calls it can handle.
>>
>> During my tests I did not run FS with RT priority, but according to htop
>> some of the threads are scheduled as RT.
>> My setup uses bypass-media, so FS handles only call establishment
>> and teardown on both legs.
>>
>> cat /proc/<FS pid>/status
>>
>> Name:   freeswitch
>> State:  S (sleeping)
>> Tgid:   15995
>> Pid:    15995
>> PPid:   1
>> TracerPid:      0
>> Uid:    999     999     999     999
>> Gid:    999     999     999     999
>> FDSize: 64
>> Groups:
>> VmPeak:  5002808 kB
>> VmSize:  5002088 kB
>> VmLck:         0 kB
>> VmPin:         0 kB
>> VmHWM:    625900 kB
>> VmRSS:    624156 kB  <-- this is going up
>> VmData:  4855788 kB
>> VmStk:       136 kB
>> VmExe:        20 kB
>> VmLib:     18288 kB
>> VmPTE:      2352 kB
>> VmSwap:        0 kB
>> Threads:        1866
>> SigQ:   0/18446744073709551615
>> SigPnd: 0000000000000000
>> ShdPnd: 0000000000000000
>> SigBlk: 0000000000000000
>> SigIgn: 0000000010003006
>> SigCgt: 0000000180014209
>> CapInh: 0000000000000000
>> CapPrm: 0000000000000000
>> CapEff: 0000000000000000
>> CapBnd: ffffffffffffffff
>> Cpus_allowed:   ffffff
>> Cpus_allowed_list:      0-23
>> Mems_allowed:   00000000,00000003
>> Mems_allowed_list:      0-1
>> voluntary_ctxt_switches:        1803
>> nonvoluntary_ctxt_switches:     23
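
One way to track the growth seen in the status dump above is to sample VmRSS
and the thread count at a fixed interval and watch the trend under constant
load. The sketch below is a hypothetical helper, not a FreeSWITCH tool: the
function names are made up, and the only thing it relies on is the
`/proc/<pid>/status` field layout shown in the dump.

```python
import time


def parse_status(text: str) -> dict:
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kb(fields: dict) -> int:
    """Resident set size in kB (first token of the VmRSS value)."""
    return int(fields["VmRSS"].split()[0])


def thread_count(fields: dict) -> int:
    """Number of threads in the process."""
    return int(fields["Threads"])


def watch(pid: int, interval_s: float = 60.0, samples: int = 10):
    """Yield (unix_time, rss_kb, threads) tuples; Linux only."""
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            fields = parse_status(f.read())
        yield time.time(), rss_kb(fields), thread_count(fields)
        time.sleep(interval_s)


if __name__ == "__main__":
    # Demo on an excerpt of the dump from this thread, so it runs anywhere.
    sample = (
        "Name:   freeswitch\n"
        "VmRSS:    624156 kB  <-- this is going up\n"
        "Threads:        1866\n"
    )
    parsed = parse_status(sample)
    print(rss_kb(parsed), thread_count(parsed))
```

On the test box, iterating over `watch(15995)` would print one sample per
minute. RSS or thread count that keeps climbing while the call rate is steady
points at sessions that are not being torn down, rather than at raw CPU limits.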
>>
>>
>> output of 'top -H' at 180CPS
>>
>>
>> top - 15:27:00 up 2 days,  5:32,  5 users,  load average: 8.19, 91.07, 65.03
>> Tasks: 2066 total,   3 running, 2063 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 50.1%us,  3.9%sy,  0.0%ni, 45.9%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
>> Mem:   4038512k total,  2282260k used,  1756252k free,   114112k buffers
>> Swap:        0k total,        0k used,        0k free,  1165868k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 16000 freeswit  RT -10 4885m 594m 4964 R   69 15.1   3:10.26 freeswitch
>> 16009 freeswit  RT -10 4885m 594m 4964 S   33 15.1   1:26.20 freeswitch
>> 16008 freeswit  RT -10 4885m 594m 4964 S   28 15.1   1:17.30 freeswitch
>> 16007 freeswit  RT -10 4885m 594m 4964 S    4 15.1   0:10.80 freeswitch
>> 16004 freeswit  RT -10 4885m 594m 4964 S    2 15.1   0:06.63 freeswitch
>> 19171 root      20   0 18988 2948  944 R    2  0.1   0:00.64 top
>> 18735 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.29 freeswitch
>> 16003 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:01.61 freeswitch
>> 16690 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>> 16730 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>> 16750 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.45 freeswitch
>> 16764 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.44 freeswitch
>>
>> <more of the above>
>> ....
>> ....
>>
>>
>> Thanks to all of you ,
>> Boris Ratner.
>>
>> On Fri, Mar 8, 2013 at 4:22 AM, Dmitry Lysenko <dvl36.ripe.nick at gmail.com> wrote:
>>
>>> I can't reproduce such a CPS load on my ARMv5TE system. )
>>> bratner, please give us the 'top -H' output. I guess freeswitch is
>>> running at realtime priority.
>>>
>>>
>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>
>>>>  Sqlite is probably getting hammered... Trust me... Mount the fs db
>>>> dir as tmpfs or use the -nosql flag when starting freeswitch
>>>>
>>>> I routinely run dialer traffic at much higher CPS than that
>>>>
>>>>
>>>>
>>>> On 3/7/13 7:58 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>
>>>> The bi, bo and wa fields are low, so it does not seem to be the disk subsystem.
>>>>
>>>>
>>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>>
>>>> You are probably hammering the disk subsystem... Keep in mind that FS
>>>> uses multiple sqlite databases by default... Mount the fs db dir as tmpfs
>>>> and try again
>>>>
>>>>
>>>>
>>>> On 3/7/13 7:35 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>
>>>> Hm... But what about the huge number of interrupts and context switches?
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> _________________________________________________________________________
>>>> Professional FreeSWITCH Consulting Services:
>>>> consulting at freeswitch.org
>>>> http://www.freeswitchsolutions.com
>>>>
>>>> 
>>>> 
>>>>
>>>> Official FreeSWITCH Sites
>>>> http://www.freeswitch.org
>>>> http://wiki.freeswitch.org
>>>> http://www.cluecon.com
>>>>
>>>> FreeSWITCH-users mailing list
>>>> FreeSWITCH-users at lists.freeswitch.org
>>>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>>>> UNSUBSCRIBE:
>>>> http://lists.freeswitch.org/mailman/options/freeswitch-users
>>>> http://www.freeswitch.org
>>>>
>>>>
>>>> --
>>>> Ken
>>>> *http://www.FreeSWITCH.org
>>>> http://www.ClueCon.com
>>>> http://www.OSTAG.org
>>>> *irc.freenode.net #freeswitch
>>>>


-- 
Anthony Minessale II

FreeSWITCH http://www.freeswitch.org/
ClueCon http://www.cluecon.com/
Twitter: http://twitter.com/FreeSWITCH_wire

AIM: anthm
MSN:anthony_minessale at hotmail.com
GTALK/JABBER/PAYPAL:anthony.minessale at gmail.com
IRC: irc.freenode.net #freeswitch

FreeSWITCH Developer Conference
sip:888 at conference.freeswitch.org
googletalk:conf+888 at conference.freeswitch.org
pstn:+19193869900