[Freeswitch-users] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.
bratner bratner
ratner2 at gmail.com
Fri Mar 8 21:15:19 MSK 2013
Here is the sipp output and some additional numbers for a test I ran with the
-nosql parameter.
The test ran at 180 CPS for ~3500 seconds and at 210 CPS for the rest.
Trouble (as in higher system cpu%) started to appear around 8591 seconds
into the test.
As you can see below, sipp reached its limit of 2000 concurrent calls at
9124 seconds, so the problem started just before that point. 210 CPS with
5-second calls should not give you a lot more than 1050 concurrent calls.
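
To spell out where the 1050 figure comes from: it is just call rate times call length (Little's law). The sketch below is only an illustration of that arithmetic, not something I ran as part of the test; the function name is made up, and the inputs are the 210 CPS rate, the 5000 ms pause, the 5.158 s cumulative call length and the 2000-call limit taken from the sipp screens below.

```python
def expected_concurrent_calls(cps: float, hold_seconds: float) -> float:
    """Little's law: average calls in progress = arrival rate * average hold time."""
    return cps * hold_seconds


SIPP_LIMIT = 2000  # "limit 2000" shown on the sipp scenario screen

# Nominal figure: 210 CPS with the 5000 ms pause from the scenario.
nominal = expected_concurrent_calls(210, 5.0)

# Same rate with the cumulative "Call Length" (00:00:05:158) from the statistics screen.
measured = expected_concurrent_calls(210, 5.158)

# Average call duration that would be needed to fill the sipp limit at 210 CPS.
hold_at_limit = SIPP_LIMIT / 210

print(f"nominal concurrency:  {nominal:.0f} calls")
print(f"with measured length: {measured:.0f} calls")
print(f"hold time at limit:   {hold_at_limit:.2f} s")
```

So for sipp to reach 2000 calls in flight at 210 CPS, calls have to stay open for roughly twice the intended 5 seconds on average, which fits with call setup or teardown stalling near the end of the run.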
------------------------------ Scenario Screen -------- [1-9]: Change Screen --
  Call-rate(length)      Port   Total-time  Total-calls  Remote-host
  210.0(5000 ms)/1.000s  5061   9157.32 s   1834024      192.96.201.164:5060(UDP)

  0 new calls during 0.000 s period      0 ms scheduler resolution
  0 calls (limit 2000)                   Peak was 2000 calls, after 9124 s
  0 Running, 4640 Paused, 0 Woken up
  20 dead call msg (discarded)           0 out-of-call msg (discarded)
  1 open sockets

                                 Messages  Retrans  Timeout  Unexpected-Msg
      INVITE ---------->         1834024   74       0
         100 <----------         1834024   0        0        0
         180 <----------         1834024   0        0        0
         183 <----------         0         0        0        0
         500 <----------         0         0        0        0
         502 <----------         0         0        0        0
         503 <----------         0         0        0        0
         408 <----------         0         0        0        0
         480 <----------         0         0        0        0
         200 <----------  E-RTD1 1834024   81       0        0
         ACK ---------->         1834024   81
       Pause [ 5000ms]           1834024                     0
         BYE ---------->         1834024   7646     0
         503 <----------         0         0        0        0
         200 <----------         1834024   0        0        0
------------------------------ Test Terminated --------------------------------
----------------------------- Statistics Screen ------- [1-9]: Change Screen --
  Start Time             | 2013-03-08 15:22:18:204  1362756138.204833
  Last Reset Time        | 2013-03-08 17:54:55:535  1362765295.535214
  Current Time           | 2013-03-08 17:54:55:535  1362765295.535437
-------------------------+---------------------------+--------------------------
  Counter Name           | Periodic value            | Cumulative value
-------------------------+---------------------------+--------------------------
  Elapsed Time           | 00:00:00:000              | 02:32:37:330
  Call Rate              | 0.000 cps                 | 200.279 cps
-------------------------+---------------------------+--------------------------
  Incoming call created  | 0                         | 0
  OutGoing call created  | 0                         | 1834024
  Total Call created     |                           | 1834024
  Current Call           | 0                         |
-------------------------+---------------------------+--------------------------
  Successful call        | 0                         | 1834024
  Failed call            | 0                         | 0
-------------------------+---------------------------+--------------------------
  Response Time 1        | 00:00:00:000              | 00:00:00:149
  Call Length            | 00:00:00:000              | 00:00:05:158
------------------------------ Test Terminated --------------------------------
After stopping the load FS still hogs 22.1% of memory.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
15995 freeswit  -2 -10 4677m 873m 5028 S    0 22.1 755:28.65  freeswitch
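
To track the memory growth over a run instead of eyeballing top, the VmRSS and Threads fields of /proc/<pid>/status (the same file quoted further down) can be sampled periodically. This is only a rough sketch of how I would do it, assuming Linux and Python 3.9+; the helper names are hypothetical and nothing here is FreeSWITCH-specific.

```python
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Split the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kb_value(fields: dict[str, str], key: str) -> int:
    """Return the numeric part of a field such as 'VmRSS: 624156 kB'."""
    return int(fields[key].split()[0])


def read_status(pid: int) -> dict[str, str]:
    """Read and parse the status file of a live process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


# Two fields copied from the status dump quoted later in this message.
sample = "Name:\tfreeswitch\nVmRSS:\t  624156 kB\nThreads:\t1866\n"
fields = parse_status(sample)
print(kb_value(fields, "VmRSS"), "kB resident,", int(fields["Threads"]), "threads")
```

Calling read_status() on the FS pid every few seconds and logging VmRSS next to the sipp call counter should show whether the growth follows the number of calls or only starts once the system cpu% climbs.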
The symptoms of the crash are the same as before; it just happens at a higher
CPS now and takes more time (more calls) before crashing.
I would appreciate any suggestions.
Regards,
Boris Ratner.
On Fri, Mar 8, 2013 at 6:22 PM, bratner bratner <ratner2 at gmail.com> wrote:
> The original test was done on git master at the date mentioned. The sqlite
> core.db file was on /run/shm, which is a tmpfs on Ubuntu 12.04.
> I will recompile from git master and test with -nosql.
>
> Testing my existing setup with -nosql seems more stable; it has now been
> running at 210 CPS for some time (500k calls already passed) with ~35% idle cpu.
> But the free memory is slowly going down. I will let it run until the kernel
> kills it to see how many calls it can handle.
>
> During my tests I did not run FS with RT priority, but according to htop
> some of the threads are scheduled as RT.
> My setup is doing bypass-media, thus FS is handling only call establishment
> and teardown on both legs.
>
> cat /proc/<FS pid>/status
>
> Name: freeswitch
> State: S (sleeping)
> Tgid: 15995
> Pid: 15995
> PPid: 1
> TracerPid: 0
> Uid: 999 999 999 999
> Gid: 999 999 999 999
> FDSize: 64
> Groups:
> VmPeak: 5002808 kB
> VmSize: 5002088 kB
> VmLck: 0 kB
> VmPin: 0 kB
> VmHWM: 625900 kB
> VmRSS: 624156 kB <-- this is going up
> VmData: 4855788 kB
> VmStk: 136 kB
> VmExe: 20 kB
> VmLib: 18288 kB
> VmPTE: 2352 kB
> VmSwap: 0 kB
> Threads: 1866
> SigQ: 0/18446744073709551615
> SigPnd: 0000000000000000
> ShdPnd: 0000000000000000
> SigBlk: 0000000000000000
> SigIgn: 0000000010003006
> SigCgt: 0000000180014209
> CapInh: 0000000000000000
> CapPrm: 0000000000000000
> CapEff: 0000000000000000
> CapBnd: ffffffffffffffff
> Cpus_allowed: ffffff
> Cpus_allowed_list: 0-23
> Mems_allowed: 00000000,00000003
> Mems_allowed_list: 0-1
> voluntary_ctxt_switches: 1803
> nonvoluntary_ctxt_switches: 23
>
>
> output of 'top -H' at 180CPS
>
>
> top - 15:27:00 up 2 days, 5:32, 5 users, load average: 8.19, 91.07, 65.03
> Tasks: 2066 total, 3 running, 2063 sleeping, 0 stopped, 0 zombie
> Cpu(s): 50.1%us, 3.9%sy, 0.0%ni, 45.9%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
> Mem: 4038512k total, 2282260k used, 1756252k free, 114112k buffers
> Swap: 0k total, 0k used, 0k free, 1165868k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 16000 freeswit  RT -10 4885m 594m 4964 R   69 15.1  3:10.26  freeswitch
> 16009 freeswit  RT -10 4885m 594m 4964 S   33 15.1  1:26.20  freeswitch
> 16008 freeswit  RT -10 4885m 594m 4964 S   28 15.1  1:17.30  freeswitch
> 16007 freeswit  RT -10 4885m 594m 4964 S    4 15.1  0:10.80  freeswitch
> 16004 freeswit  RT -10 4885m 594m 4964 S    2 15.1  0:06.63  freeswitch
> 19171 root      20   0 18988 2948  944 R    2  0.1  0:00.64  top
> 18735 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:00.29  freeswitch
> 16003 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:01.61  freeswitch
> 16690 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:00.42  freeswitch
> 16730 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:00.42  freeswitch
> 16750 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:00.45  freeswitch
> 16764 freeswit  -2 -10 4885m 594m 4964 S    1 15.1  0:00.44  freeswitch
>
> <more of the above>
> ....
> ....
>
>
> Thanks to all of you,
> Boris Ratner.
>
> On Fri, Mar 8, 2013 at 4:22 AM, Dmitry Lysenko <dvl36.ripe.nick at gmail.com> wrote:
>
>> I can't reproduce such a cps load on my ARMv5TE system. )
>> bratner, please give us 'top -H'. I guess freeswitch is running at realtime
>> priority.
>>
>>
>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>
>>> Sqlite is probably getting hammered... Trust me... Mount the fs db dir
>>> as tmpfs or use the -nosql flag when starting freeswitch
>>>
>>> I routinely run dialer traffic at much higher CPS than that
>>>
>>>
>>>
>>> On 3/7/13 7:58 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>
>>> The bi, bo and wa fields are low, so it seems it is not the disk subsystem.
>>>
>>>
>>> 2013/3/8 Ken Rice <krice at freeswitch.org>
>>>
>>> You are probably hammering the disk subsystem... Keep in mind that FS
>>> uses multiple sqlite databases by default... Mount the fs db dir as tmpfs
>>> and try again
>>>
>>>
>>>
>>> On 3/7/13 7:35 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
>>>
>>> Hm... But what about the huge number of interrupts and context switches?
>>>
>>>
>>> ------------------------------
>>> _________________________________________________________________________
>>> Professional FreeSWITCH Consulting Services:
>>> consulting at freeswitch.org
>>> http://www.freeswitchsolutions.com
>>>
>>>
>>>
>>>
>>> Official FreeSWITCH Sites
>>> http://www.freeswitch.org
>>> http://wiki.freeswitch.org
>>> http://www.cluecon.com
>>>
>>> FreeSWITCH-users mailing list
>>> FreeSWITCH-users at lists.freeswitch.org
>>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>>> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
>>> http://www.freeswitch.org
>>>
>>>
>>> --
>>> Ken
>>> *http://www.FreeSWITCH.org
>>> http://www.ClueCon.com
>>> http://www.OSTAG.org
>>> *irc.freenode.net #freeswitch
>>>
>>
>