[Freeswitch-users] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.

Dmitry Lysenko dvl36.ripe.nick at gmail.com
Fri Mar 8 22:32:42 MSK 2013


Please try to reproduce with the -np switch.
Thanks.


2013/3/8 bratner bratner <ratner2 at gmail.com>

> Here is the sipp output and additional numbers for a test I ran with the
> -nosql param.
>
> The test ran at 180 CPS for ~3500 seconds and at 210 CPS for the rest.
>
> Trouble (as in higher system cpu%) started to appear around 8591 seconds
> into the test.
> As you can see below, the problem started just before 9124 seconds into
> the test, when sipp reached its 2000-call limit. 210 CPS with 5-second
> calls should not give you a lot more than 1050 concurrent calls.
>
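The 1050 figure above follows from Little's law: the average number of calls in progress equals the arrival rate multiplied by the hold time. A minimal sketch of that arithmetic, where the function name is made up for illustration and the rates and the 5-second hold time come from the test description:

```python
def expected_concurrent_calls(cps: float, hold_seconds: float) -> float:
    """Little's law: average calls in progress = arrival rate * call duration."""
    return cps * hold_seconds


# Both load levels used in the test, with 5-second calls.
for cps in (180, 210):
    print(cps, "cps ->", expected_concurrent_calls(cps, 5.0), "concurrent calls")
```

A peak of 2000 calls is roughly double the 210 CPS estimate, which would be consistent with calls taking much longer than 5 seconds to complete at that point in the run.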
> ------------------------------ Scenario Screen -------- [1-9]: Change Screen --
>   Call-rate(length)   Port   Total-time  Total-calls  Remote-host
> 210.0(5000 ms)/1.000s   5061    9157.32 s      1834024  192.96.201.164:5060(UDP)
>
>   0 new calls during 0.000 s period      0 ms scheduler resolution
>   0 calls (limit 2000)                   Peak was 2000 calls, after 9124 s
>   0 Running, 4640 Paused, 0 Woken up
>   20 dead call msg (discarded)           0 out-of-call msg (discarded)
>   1 open sockets
>
>                                  Messages  Retrans   Timeout   Unexpected-Msg
>       INVITE ---------->         1834024   74        0
>          100 <----------         1834024   0         0         0
>          180 <----------         1834024   0         0         0
>          183 <----------         0         0         0         0
>          500 <----------         0         0         0         0
>          502 <----------         0         0         0         0
>          503 <----------         0         0         0         0
>          408 <----------         0         0         0         0
>          480 <----------         0         0         0         0
>          200 <----------  E-RTD1 1834024   81        0         0
>
>          ACK ---------->         1834024   81
>        Pause [   5000ms]         1834024                       0
>          BYE ---------->         1834024   7646      0
>          503 <----------         0         0         0         0
>          200 <----------         1834024   0         0         0
>
> ------------------------------ Test Terminated --------------------------------
>
>
> ----------------------------- Statistics Screen ------- [1-9]: Change Screen --
>   Start Time             | 2013-03-08    15:22:18:204    1362756138.204833
>   Last Reset Time        | 2013-03-08    17:54:55:535    1362765295.535214
>   Current Time           | 2013-03-08    17:54:55:535    1362765295.535437
> -------------------------+---------------------------+--------------------------
>   Counter Name           | Periodic value            | Cumulative value
> -------------------------+---------------------------+--------------------------
>   Elapsed Time           | 00:00:00:000              | 02:32:37:330
>   Call Rate              |    0.000 cps              |  200.279 cps
> -------------------------+---------------------------+--------------------------
>   Incoming call created  |        0                  |        0
>   OutGoing call created  |        0                  |  1834024
>   Total Call created     |                           |  1834024
>   Current Call           |        0                  |
> -------------------------+---------------------------+--------------------------
>   Successful call        |        0                  |  1834024
>   Failed call            |        0                  |        0
> -------------------------+---------------------------+--------------------------
>   Response Time 1        | 00:00:00:000              | 00:00:00:149
>   Call Length            | 00:00:00:000              | 00:00:05:158
> ------------------------------ Test Terminated --------------------------------
>
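The cumulative numbers in the two sipp screens can be cross-checked against each other. The sketch below is only illustrative: the helper name is invented, and it assumes the last field of sipp's elapsed-time format is milliseconds, which agrees with the Total-time of 9157.32 s shown in the scenario screen.

```python
def hms_to_seconds(stamp: str) -> float:
    """Convert an HH:MM:SS:mmm elapsed-time string to seconds."""
    hours, minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


# Figures copied from the sipp output above.
total_calls = 1834024
bye_retransmissions = 7646

elapsed = hms_to_seconds("02:32:37:330")
avg_cps = total_calls / elapsed
bye_retrans_pct = 100.0 * bye_retransmissions / total_calls

print(f"elapsed={elapsed:.2f} s  avg={avg_cps:.3f} cps  BYE retrans={bye_retrans_pct:.2f}%")
```

The computed average matches the 200.279 cps that sipp reports, which is what a mix of 180 CPS and 210 CPS phases would be expected to produce.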
>
> After stopping the load FS still hogs 22.1% of memory.
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 15995 freeswit  -2 -10 4677m 873m 5028 S    0 22.1 755:28.65 freeswitch
>
>
>
> The symptoms of the crash are the same; it just happens at a higher CPS
> and takes more time (more calls) before crashing.
>
> I would appreciate any suggestions.
>
> Regards,
> Boris Ratner.
>
>
>
> On Fri, Mar 8, 2013 at 6:22 PM, bratner bratner <ratner2 at gmail.com> wrote:
>
>> The original test was done on git master as of the date mentioned. The
>> sqlite core.db file was on /run/shm, which is a tmpfs on Ubuntu 12.04.
>> I will recompile from git master and test with -nosql.
>>
>> Testing my existing setup with -nosql seems more stable: it has now been
>> running at 210 CPS for some time (500k calls already passed) with ~35%
>> idle CPU. But the free memory is slowly going down. I will let it run
>> until the kernel kills it, to see how many calls it can handle.
>>
>> During my tests I did not run FS with RT priority, but according to htop
>> some of the threads are scheduled as RT.
>> My setup is doing bypass-media, so FS handles only call establishment
>> and teardown on both legs.
>>
>> cat /proc/<FS pid>/status
>>
>> Name:   freeswitch
>> State:  S (sleeping)
>> Tgid:   15995
>> Pid:    15995
>> PPid:   1
>> TracerPid:      0
>> Uid:    999     999     999     999
>> Gid:    999     999     999     999
>> FDSize: 64
>> Groups:
>> VmPeak:  5002808 kB
>> VmSize:  5002088 kB
>> VmLck:         0 kB
>> VmPin:         0 kB
>> VmHWM:    625900 kB
>> VmRSS:    624156 kB  <-- this is going up
>> VmData:  4855788 kB
>> VmStk:       136 kB
>> VmExe:        20 kB
>> VmLib:     18288 kB
>> VmPTE:      2352 kB
>> VmSwap:        0 kB
>> Threads:        1866
>> SigQ:   0/18446744073709551615
>> SigPnd: 0000000000000000
>> ShdPnd: 0000000000000000
>> SigBlk: 0000000000000000
>> SigIgn: 0000000010003006
>> SigCgt: 0000000180014209
>> CapInh: 0000000000000000
>> CapPrm: 0000000000000000
>> CapEff: 0000000000000000
>> CapBnd: ffffffffffffffff
>> Cpus_allowed:   ffffff
>> Cpus_allowed_list:      0-23
>> Mems_allowed:   00000000,00000003
>> Mems_allowed_list:      0-1
>> voluntary_ctxt_switches:        1803
>> nonvoluntary_ctxt_switches:     23
>>
>>
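To follow the VmRSS growth noted above over a long run, the status file can be parsed rather than read by eye. This is a minimal sketch, not a FreeSWITCH tool: the function names are hypothetical, and the sample text is a shortened copy of the output above.

```python
def parse_proc_status(text: str) -> dict:
    """Map each 'Key:   value' line of /proc/<pid>/status to a stripped string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kb(fields: dict) -> int:
    """VmRSS is reported as '<number> kB'; anything after the number is ignored."""
    return int(fields["VmRSS"].split()[0])


sample = """Name:   freeswitch
VmHWM:    625900 kB
VmRSS:    624156 kB  <-- this is going up
Threads:        1866
"""
fields = parse_proc_status(sample)
print(rss_kb(fields), "kB resident,", fields["Threads"], "threads")
```

On a live system the same parser could be fed the contents of /proc/<FS pid>/status at a fixed interval, logging the resident size and thread count next to the call count to see whether memory grows with calls or with time.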
>> output of 'top -H' at 180CPS
>>
>>
>> top - 15:27:00 up 2 days,  5:32,  5 users,  load average: 8.19, 91.07, 65.03
>> Tasks: 2066 total,   3 running, 2063 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 50.1%us,  3.9%sy,  0.0%ni, 45.9%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
>> Mem:   4038512k total,  2282260k used,  1756252k free,   114112k buffers
>> Swap:        0k total,        0k used,        0k free,  1165868k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 16000 freeswit  RT -10 4885m 594m 4964 R   69 15.1   3:10.26 freeswitch
>> 16009 freeswit  RT -10 4885m 594m 4964 S   33 15.1   1:26.20 freeswitch
>> 16008 freeswit  RT -10 4885m 594m 4964 S   28 15.1   1:17.30 freeswitch
>> 16007 freeswit  RT -10 4885m 594m 4964 S    4 15.1   0:10.80 freeswitch
>> 16004 freeswit  RT -10 4885m 594m 4964 S    2 15.1   0:06.63 freeswitch
>> 19171 root      20   0 18988 2948  944 R    2  0.1   0:00.64 top
>> 18735 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.29 freeswitch
>> 16003 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:01.61 freeswitch
>> 16690 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>> 16730 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>> 16750 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.45 freeswitch
>> 16764 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.44 freeswitch
>>
>> <more of the above>
>> ....
>> ....
>>
>>
>> Thanks to all of you ,
>> Boris Ratner.
>>
>> On Fri, Mar 8, 2013 at 4:22 AM, Dmitry Lysenko <dvl36.ripe.nick at gmail.com> wrote:
>>
>>> I can't reproduce such a CPS load on my ARMv5TE system. )
>>> bratner, please give us the 'top -H' output. I guess freeswitch is
>>> running at realtime priority.
>>>
>>>
>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>
>>>> Sqlite is probably getting hammered... Trust me... Mount the fs db
>>>> dir as tmpfs or use the -nosql flag when starting freeswitch
>>>>
>>>> I routinely run dialer traffic at much higher CPS than that
>>>>
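Whether the db directory really sits on tmpfs can be confirmed from the kernel's mount table. The following is a rough sketch under stated assumptions: the function name and the sample mount table are invented for illustration, and mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```python
def fstype_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point, fstype = parts[1], parts[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point override an earlier one.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


# Hypothetical mount table in /proc/mounts format.
sample_mounts = """/dev/sda1 / ext4 rw 0 0
none /run/shm tmpfs rw,nosuid,nodev 0 0
"""
print(fstype_for("/run/shm/core.db", sample_mounts))
```

On a real host the second argument would be the contents of /proc/mounts, and the path would be wherever core.db actually lives.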
>>>>
>>>>
>>>> On 3/7/13 7:58 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>
>>>> The bi, bo and wa fields are low, so it seems it is not the disk subsystem.
>>>>
>>>>
>>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>>
>>>> You are probably hammering the disk subsystem... Keep in mind that FS
>>>> uses multiple sqlite databases by default... Mount the fs db dir as tmpfs
>>>> and try again
>>>>
>>>>
>>>>
>>>> On 3/7/13 7:35 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>
>>>> Hm... But what about the huge number of interrupts and context switches?
>>>>
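Interrupt and context-switch rates can be measured directly from the cumulative `intr` and `ctxt` counters in /proc/stat, sampled twice. A small sketch of the calculation follows; the function names are hypothetical and the two snapshots are invented values, not measurements from this system.

```python
def read_counters(stat_text: str) -> dict:
    """Extract the cumulative 'intr' and 'ctxt' totals from /proc/stat content."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # The first number on the 'intr' line is the total across all IRQs.
        if parts and parts[0] in ("ctxt", "intr"):
            counters[parts[0]] = int(parts[1])
    return counters


def per_second(before: dict, after: dict, interval: float) -> dict:
    """Turn two cumulative snapshots into per-second rates."""
    return {name: (after[name] - before[name]) / interval for name in before}


# Two made-up snapshots taken 5 seconds apart.
before = read_counters("intr 1000000 5 0 3\nctxt 2000000\n")
after = read_counters("intr 1150000 9 0 3\nctxt 2400000\n")
rates = per_second(before, after, 5.0)
print(rates)
```

Comparing these rates at 180 CPS and 210 CPS, and again once the trouble starts, would show whether the system CPU increase tracks a jump in context switching.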
>>>>
>>>> ------------------------------
>>>>
>>>> _________________________________________________________________________
>>>> Professional FreeSWITCH Consulting Services:
>>>> consulting at freeswitch.org
>>>> http://www.freeswitchsolutions.com
>>>>
>>>> 
>>>> 
>>>>
>>>> Official FreeSWITCH Sites
>>>> http://www.freeswitch.org
>>>> http://wiki.freeswitch.org
>>>> http://www.cluecon.com
>>>>
>>>> FreeSWITCH-users mailing list
>>>> FreeSWITCH-users at lists.freeswitch.org
>>>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>>>> UNSUBSCRIBE:
>>>> http://lists.freeswitch.org/mailman/options/freeswitch-users
>>>> http://www.freeswitch.org
>>>>
>>>>
>>>> --
>>>> Ken
>>>> *http://www.FreeSWITCH.org
>>>> http://www.ClueCon.com
>>>> http://www.OSTAG.org
>>>> *irc.freenode.net #freeswitch
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>