[Freeswitch-users] [SOLVED] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.

Dmitry Lysenko dvl36.ripe.nick at gmail.com
Sun Mar 10 17:03:45 MSK 2013


It seems that we found a bug (a race condition?) in the sofia lib code. It is
related to setting the highest possible (99) thread realtime priority in
libs/sofia-sip/libsofia-sip-ua/su/su_pthread_port.c. This issue can happen
only when freeswitch is running with realtime priority (the default in
multi-cpu configurations in the latest git).
If anyone is interested, let me know.
Thanks.
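
For anyone who wants to check whether a running process has realtime threads, here is a minimal Python sketch. It assumes a Linux /proc filesystem and reads the rt_priority and policy fields (fields 40 and 41 in proc(5)) from each /proc/<pid>/task/<tid>/stat. The function names are made up for illustration and are not part of FreeSWITCH or sofia-sip.

```python
import os

# Linux scheduling policy numbers as defined in <sched.h>.
POLICIES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR",
            3: "SCHED_BATCH", 5: "SCHED_IDLE", 6: "SCHED_DEADLINE"}


def parse_stat(line):
    """Return (tid, comm, policy_name, rt_priority) from one stat line."""
    # comm is wrapped in parentheses and may itself contain spaces,
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    tid, _, comm = head.partition(" (")
    fields = tail.split()
    # tail starts at field 3 (state), so field N of proc(5) is fields[N - 3].
    rt_priority = int(fields[40 - 3])
    policy = int(fields[41 - 3])
    return int(tid), comm, POLICIES.get(policy, str(policy)), rt_priority


def realtime_threads(pid):
    """List the threads of pid that run under SCHED_FIFO or SCHED_RR."""
    found = []
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                info = parse_stat(fh.read())
        except OSError:
            continue  # thread exited between listdir() and open()
        if info[2] in ("SCHED_FIFO", "SCHED_RR"):
            found.append(info)
    return found
```

Calling realtime_threads() with the freeswitch pid returns one tuple per realtime thread. An empty list would match what is reported below for a run with -np.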


2013/3/10 bratner bratner <ratner2 at gmail.com>

> List, Dmitry
>
> Everything looks fine @180CPS and I can conclude that running FS with -np
> together with switching the kernel to ubuntu's low latency image
> solved the problem. The mem leak remains but this I can handle by
> selective restarts.
>
> Thanks!
> Boris Ratner
>
>
>
> On Sun, Mar 10, 2013 at 12:35 PM, bratner bratner <ratner2 at gmail.com> wrote:
>
>> Dmitry, Hi!
>>
>> Running with -np at 180CPS for 3000sec now (over 500k calls). I have already
>> passed by far the number of calls I was able to do at this CPS previously.
>> I can see that all FS threads have the same priority and there are no RT
>> threads.
>> Context switches per sec are rising slowly. If I can make a million calls
>> it is good enough for me.
>> There is a small mem leak but that is not what has me worried because I
>> can monitor it and restart FS when necessary.
>> In my previous tests, when CS got close to 60k the downward spiral began.
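
One way to watch the context-switch rate mentioned above is to sample the cumulative "ctxt" counter in /proc/stat twice and divide by the interval. This is only a sketch assuming Linux; the function names are hypothetical, and the 60k figure is just the number reported in this thread, not a general limit.

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two cumulative samples."""
    return (after - before) / seconds


def sample_rate(interval=1.0, path="/proc/stat"):
    """Sample the system-wide context-switch rate over one interval."""
    with open(path) as fh:
        first = read_ctxt(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = read_ctxt(fh.read())
    return ctxt_rate(first, second, interval)
```

Logging sample_rate() once a minute during a load test would show whether the rate is creeping up well before the box becomes unresponsive.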
>>
>> Do you think that the FS RT threads slowly starve another important task?
>>
>> Keeping my fingers crossed.
>>
>> Thanks!
>> Boris Ratner.
>>
>>
>>
>> On Sun, Mar 10, 2013 at 2:02 AM, Dmitry Lysenko <dvl36.ripe.nick at gmail.com> wrote:
>>
>>> Boris, did you try to test the load while forcing freeswitch to run with
>>> normal priority? (-np)
>>> It seems that I have a workaround, but I am not sure that your cpu
>>> load issue has the same root cause as mine. My system setup is uncommon
>>> (arm, 128mb of RAM, RT kernel, mod_gsmopen), so I can't test it myself.
>>>
>>>
>>> 2013/3/10 bratner bratner <ratner2 at gmail.com>
>>>
>>>> List, Steve
>>>>
>>>> I will clarify what I'm asking here before I take Anthony's suggestion
>>>> and join a "computer tuning" club as a way to "move forward".
>>>> http://media.bestofmicro.com/gerbilpc-tuning-pc,S-L-252453-13.jpg
>>>>
>>>> What is there to read on this subject? Links, textbook names -
>>>> everything is appreciated.
>>>> What are the tools that show useful data, and what can I do with FS to
>>>> make the work easier? Compile with some flags to get more info on running
>>>> threads?
>>>>
>>>> Thanks,
>>>> Boris Ratner.
>>>>
>>>>
>>>> On Sat, Mar 9, 2013 at 12:51 AM, Steven Ayre <steveayre at gmail.com> wrote:
>>>>
>>>>>> After stopping the load FS still hogs 22.1% of memory.
>>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>> 15995 freeswit  -2 -10 4677m 873m 5028 S    0 22.1 755:28.65 freeswitch
>>>>>
>>>>>
>>>>> Until you test with the version you're building from master I would
>>>>> ignore the memory usage since you're running a version with known memory
>>>>> leaks.
>>>>>
>>>>> -Steve
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 8 March 2013 18:15, bratner bratner <ratner2 at gmail.com> wrote:
>>>>> > Here is the sipp output and additional numbers for a test I ran with the
>>>>> > -nosql param.
>>>>> >
>>>>> > The test ran at 180CPS for ~3500 seconds and the rest at 210cps.
>>>>> >
>>>>> > Trouble (as in higher system cpu%) started to appear around 8591 seconds
>>>>> > into the test.
>>>>> > As you can see below, the problem started just before 9124sec into the test.
>>>>> > 210cps with 5sec calls should not give you a lot more than 1050 concurrent
>>>>> > calls.
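
The 1050 figure above is just call rate times call length (Little's law). A tiny Python sketch of that arithmetic, with a made-up function name, assuming steady state and that every call lasts exactly the configured pause:

```python
def expected_concurrent_calls(calls_per_second, hold_time_seconds):
    """Little's law: average calls in progress = arrival rate * call duration."""
    return calls_per_second * hold_time_seconds


print(expected_concurrent_calls(210, 5))  # 1050
```

The sipp screen below reports a peak of 2000 calls (the configured limit), roughly double this steady-state estimate, which is consistent with calls no longer completing on time at that point.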
>>>>> >
>>>>> > ------------------------------ Scenario Screen -------- [1-9]: Change Screen --
>>>>> >   Call-rate(length)   Port   Total-time  Total-calls  Remote-host
>>>>> > 210.0(5000 ms)/1.000s   5061    9157.32 s      1834024  192.96.201.164:5060(UDP)
>>>>> >
>>>>> >   0 new calls during 0.000 s period      0 ms scheduler resolution
>>>>> >   0 calls (limit 2000)                   Peak was 2000 calls, after 9124 s
>>>>> >   0 Running, 4640 Paused, 0 Woken up
>>>>> >   20 dead call msg (discarded)           0 out-of-call msg (discarded)
>>>>> >   1 open sockets
>>>>> >
>>>>> >                                  Messages  Retrans   Timeout   Unexpected-Msg
>>>>> >       INVITE ---------->         1834024   74        0
>>>>> >          100 <----------         1834024   0         0         0
>>>>> >          180 <----------         1834024   0         0         0
>>>>> >          183 <----------         0         0         0         0
>>>>> >          500 <----------         0         0         0         0
>>>>> >          502 <----------         0         0         0         0
>>>>> >          503 <----------         0         0         0         0
>>>>> >          408 <----------         0         0         0         0
>>>>> >          480 <----------         0         0         0         0
>>>>> >          200 <----------  E-RTD1 1834024   81        0         0
>>>>> >
>>>>> >          ACK ---------->         1834024   81
>>>>> >        Pause [   5000ms]         1834024                       0
>>>>> >          BYE ---------->         1834024   7646      0
>>>>> >          503 <----------         0         0         0         0
>>>>> >          200 <----------         1834024   0         0         0
>>>>> >
>>>>> > ------------------------------ Test Terminated --------------------------------
>>>>> >
>>>>> >
>>>>> > ----------------------------- Statistics Screen ------- [1-9]: Change Screen --
>>>>> >   Start Time             | 2013-03-08    15:22:18:204  1362756138.204833
>>>>> >   Last Reset Time        | 2013-03-08    17:54:55:535  1362765295.535214
>>>>> >   Current Time           | 2013-03-08    17:54:55:535  1362765295.535437
>>>>> > -------------------------+---------------------------+--------------------------
>>>>> >   Counter Name           | Periodic value            | Cumulative value
>>>>> > -------------------------+---------------------------+--------------------------
>>>>> >   Elapsed Time           | 00:00:00:000              | 02:32:37:330
>>>>> >   Call Rate              |    0.000 cps              |  200.279 cps
>>>>> > -------------------------+---------------------------+--------------------------
>>>>> >   Incoming call created  |        0                  |        0
>>>>> >   OutGoing call created  |        0                  |  1834024
>>>>> >   Total Call created     |                           |  1834024
>>>>> >   Current Call           |        0                  |
>>>>> > -------------------------+---------------------------+--------------------------
>>>>> >   Successful call        |        0                  |  1834024
>>>>> >   Failed call            |        0                  |        0
>>>>> > -------------------------+---------------------------+--------------------------
>>>>> >   Response Time 1        | 00:00:00:000              | 00:00:00:149
>>>>> >   Call Length            | 00:00:00:000              | 00:00:05:158
>>>>> > ------------------------------ Test Terminated --------------------------------
>>>>> >
>>>>> >
>>>>> > After stopping the load FS still hogs 22.1% of memory.
>>>>> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> > 15995 freeswit  -2 -10 4677m 873m 5028 S    0 22.1 755:28.65 freeswitch
>>>>>
>>>>> >
>>>>> >
>>>>> > The symptoms of the crash are the same, just now with higher CPS, and it
>>>>> > takes more time (more calls) before crashing.
>>>>> >
>>>>> > I would appreciate any suggestions.
>>>>> >
>>>>> > Regards,
>>>>> > Boris Ratner.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Mar 8, 2013 at 6:22 PM, bratner bratner <ratner2 at gmail.com> wrote:
>>>>> >>
>>>>> >> The original test was done on git master at the date mentioned. The sqlite
>>>>> >> core.db file was on /run/shm, which is a tmpfs on Ubuntu 12.04.
>>>>> >> I will be recompiling from git master and testing with -nosql.
>>>>> >>
>>>>> >> Testing my existing setup with -nosql seems more stable; it has now been
>>>>> >> running at 210CPS for some time (500k calls already passed) with ~35% idle cpu.
>>>>> >> But the free mem is slowly going down. I will let it run until the kernel
>>>>> >> kills it to see how many calls it can handle.
>>>>> >>
>>>>> >> During my tests I did not run FS with RT priority, but according to htop
>>>>> >> some of the threads are scheduled as RT.
>>>>> >> My setup is doing bypass-media, thus FS is handling only call establishment
>>>>> >> and teardown on both legs.
>>>>> >>
>>>>> >> cat /proc/<FS pid>/status
>>>>> >>
>>>>> >> Name:   freeswitch
>>>>> >> State:  S (sleeping)
>>>>> >> Tgid:   15995
>>>>> >> Pid:    15995
>>>>> >> PPid:   1
>>>>> >> TracerPid:      0
>>>>> >> Uid:    999     999     999     999
>>>>> >> Gid:    999     999     999     999
>>>>> >> FDSize: 64
>>>>> >> Groups:
>>>>> >> VmPeak:  5002808 kB
>>>>> >> VmSize:  5002088 kB
>>>>> >> VmLck:         0 kB
>>>>> >> VmPin:         0 kB
>>>>> >> VmHWM:    625900 kB
>>>>> >> VmRSS:    624156 kB  <-- this is going up
>>>>> >> VmData:  4855788 kB
>>>>> >> VmStk:       136 kB
>>>>> >> VmExe:        20 kB
>>>>> >> VmLib:     18288 kB
>>>>> >> VmPTE:      2352 kB
>>>>> >> VmSwap:        0 kB
>>>>> >> Threads:        1866
>>>>> >> SigQ:   0/18446744073709551615
>>>>> >> SigPnd: 0000000000000000
>>>>> >> ShdPnd: 0000000000000000
>>>>> >> SigBlk: 0000000000000000
>>>>> >> SigIgn: 0000000010003006
>>>>> >> SigCgt: 0000000180014209
>>>>> >> CapInh: 0000000000000000
>>>>> >> CapPrm: 0000000000000000
>>>>> >> CapEff: 0000000000000000
>>>>> >> CapBnd: ffffffffffffffff
>>>>> >> Cpus_allowed:   ffffff
>>>>> >> Cpus_allowed_list:      0-23
>>>>> >> Mems_allowed:   00000000,00000003
>>>>> >> Mems_allowed_list:      0-1
>>>>> >> voluntary_ctxt_switches:        1803
>>>>> >> nonvoluntary_ctxt_switches:     23
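
Since VmRSS is the number being watched here, a small Python sketch for pulling it out of /proc/<pid>/status may help with the "monitor and restart" approach. It assumes the Linux status format shown above; the function names and any restart limit are hypothetical and would have to be chosen per machine.

```python
def parse_status(text):
    """Map field names in /proc/<pid>/status content to their raw values."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def rss_kb(fields):
    """Resident set size in kB, taken from the VmRSS field."""
    # The value looks like "624156 kB"; only the leading number is used.
    return int(fields["VmRSS"].split()[0])


def should_restart(fields, limit_kb):
    """True when resident memory exceeds a caller-chosen limit in kB."""
    return rss_kb(fields) > limit_kb


def read_status(pid):
    """Read and parse /proc/<pid>/status for a live process."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_status(fh.read())
```

For the dump above, rss_kb() would give 624156, so a limit of, say, 3000000 kB on this 4 GB machine would not trigger yet.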
>>>>> >>
>>>>> >>
>>>>> >> output of 'top -H' at 180CPS
>>>>> >>
>>>>> >>
>>>>> >> top - 15:27:00 up 2 days,  5:32,  5 users,  load average: 8.19, 91.07, 65.03
>>>>> >> Tasks: 2066 total,   3 running, 2063 sleeping,   0 stopped,   0 zombie
>>>>> >> Cpu(s): 50.1%us,  3.9%sy,  0.0%ni, 45.9%id,  0.0%wa,  0.0%hi,  0.2%si, 0.0%st
>>>>> >> Mem:   4038512k total,  2282260k used,  1756252k free,   114112k buffers
>>>>> >> Swap:        0k total,        0k used,        0k free,  1165868k cached
>>>>> >>
>>>>> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> >> 16000 freeswit  RT -10 4885m 594m 4964 R   69 15.1   3:10.26 freeswitch
>>>>> >> 16009 freeswit  RT -10 4885m 594m 4964 S   33 15.1   1:26.20 freeswitch
>>>>> >> 16008 freeswit  RT -10 4885m 594m 4964 S   28 15.1   1:17.30 freeswitch
>>>>> >> 16007 freeswit  RT -10 4885m 594m 4964 S    4 15.1   0:10.80 freeswitch
>>>>> >> 16004 freeswit  RT -10 4885m 594m 4964 S    2 15.1   0:06.63 freeswitch
>>>>> >> 19171 root      20   0 18988 2948  944 R    2  0.1   0:00.64 top
>>>>> >> 18735 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.29 freeswitch
>>>>> >> 16003 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:01.61 freeswitch
>>>>> >> 16690 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>>>>> >> 16730 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.42 freeswitch
>>>>> >> 16750 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.45 freeswitch
>>>>> >> 16764 freeswit  -2 -10 4885m 594m 4964 S    1 15.1   0:00.44 freeswitch
>>>>> >> <more of the above>
>>>>> >> ....
>>>>> >> ....
>>>>> >>
>>>>> >>
>>>>> >> Thanks to all of you ,
>>>>> >> Boris Ratner.
>>>>> >>
>>>>> >> On Fri, Mar 8, 2013 at 4:22 AM, Dmitry Lysenko <dvl36.ripe.nick at gmail.com> wrote:
>>>>> >>>
>>>>> >>> I can't reproduce such a cps load on my ARMv5TE system. )
>>>>> >>> bratner, please give us 'top -H'. I guess freeswitch is running at
>>>>> >>> realtime priority.
>>>>> >>>
>>>>> >>>
>>>>> >>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>>> >>>>
>>>>> >>>> Sqlite is probably getting hammered... Trust me... Mount the fs db dir
>>>>> >>>> as tmpfs or use the -nosql flag when starting freeswitch
>>>>> >>>>
>>>>> >>>> I routinely run dialer traffic at much higher CPS than that
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 3/7/13 7:58 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> The bi, bo and wa fields are low, so it seems it is not the disk subsystem.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>>> >>>>
>>>>> >>>> You are probably hammering the disk subsystem... Keep in mind that FS
>>>>> >>>> uses multiple sqlite databases by default... Mount the fs db dir as tmpfs
>>>>> >>>> and try again
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 3/7/13 7:35 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Hm... But what about the huge number of interrupts and context switches?
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> ________________________________
>>>>> >>>>
>>>>> >>>>
>>>>> _________________________________________________________________________
>>>>> >>>> Professional FreeSWITCH Consulting Services:
>>>>> >>>> consulting at freeswitch.org
>>>>> >>>> http://www.freeswitchsolutions.com
>>>>> >>>>
>>>>> >>>> 
>>>>> >>>> 
>>>>> >>>>
>>>>> >>>> Official FreeSWITCH Sites
>>>>> >>>> http://www.freeswitch.org
>>>>> >>>> http://wiki.freeswitch.org
>>>>> >>>> http://www.cluecon.com
>>>>> >>>>
>>>>> >>>> FreeSWITCH-users mailing list
>>>>> >>>> FreeSWITCH-users at lists.freeswitch.org
>>>>> >>>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>>>>> >>>> UNSUBSCRIBE:
>>>>> http://lists.freeswitch.org/mailman/options/freeswitch-users
>>>>> >>>> http://www.freeswitch.org
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Ken
>>>>> >>>> http://www.FreeSWITCH.org
>>>>> >>>> http://www.ClueCon.com
>>>>> >>>> http://www.OSTAG.org
>>>>> >>>> irc.freenode.net #freeswitch
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>