[Freeswitch-users] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.
Ken Rice
krice at freeswitch.org
Fri Mar 8 04:45:38 MSK 2013
You are probably hammering the disk subsystem... Keep in mind that FS uses
multiple sqlite databases by default... Mount the FS db dir as tmpfs and try
again.
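A minimal sketch of that approach (the db path below is the common default prefix and may differ on your install; the tmpfs size is an assumption, and `iostat` comes from the sysstat package):

```shell
# First confirm the disk really is the bottleneck: watch device utilization
# and await times while the load test runs.
iostat -x 1

# Then stop FS and mount tmpfs over the sqlite db directory so the core
# databases live in RAM instead of on disk.
/etc/init.d/freeswitch stop
mount -t tmpfs -o size=512m tmpfs /usr/local/freeswitch/db
/etc/init.d/freeswitch start

# To persist across reboots, add a line like this to /etc/fstab:
#   tmpfs  /usr/local/freeswitch/db  tmpfs  defaults,size=512m  0  0
```

Note that anything in that directory is lost on reboot or unmount, which is fine for the core sqlite state since FS recreates it on startup.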
On 3/7/13 7:35 PM, "Dmitry Lysenko" <dvl36.ripe.nick at gmail.com> wrote:
> Hm... But what about the huge interrupt and context switch numbers?
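One way to see where those context switches and interrupts are coming from is per-thread accounting (a sketch; `pidstat` is part of the sysstat package, and the command assumes a single running freeswitch process):

```shell
# Per-thread voluntary (cswch/s) and involuntary (nvcswch/s) context
# switches, sampled every second -- shows which FS threads are thrashing.
pidstat -w -t -p "$(pidof freeswitch)" 1

# Which IRQ sources are firing hardest (sorted by the CPU0 count column).
watch -n 1 'grep -v "CPU0" /proc/interrupts | sort -nrk2 | head'
```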
>
>
> 2013/3/7 Steven Ayre <steveayre at gmail.com>
>> Upgrade to the latest version.
>>
>> There are some memory leaks that have been found and fixed in 1.2.7.
>>
>> -Steve
>>
>>
>>
>>
>> On 7 March 2013 11:09, bratner bratner <ratner2 at gmail.com> wrote:
>>> Hi!
>>>
>>> Please assist me in finding a way to get some insight into this problem and
>>> file a proper bug report. It looks like 2 different problems: one is load
>>> related (like a race of some sort), the other is a memory leak.
>>>
>>>
>>> I'm running FS debs compiled on ubuntu 12.04
>>> (-n20130227T114536Z-1~presice+1git).
>>> The machine is an ubuntu 12.04 server, CPU: Intel(R) Xeon(R) CPU E5620 @
>>> 2.40GHz, 4 GB RAM and no swap; 3.1 GB of RAM is free after FS is started.
>>> I disabled HT in the BIOS to get proper CPU load readings from top and vmstat
>>> (no HT also allows for higher cps, for some reason...).
>>>
>>> The FS setup has 2 SIP profiles, internal/external (functioning as
>>> incoming/outgoing), and an XML dialplan that runs a lua script.
>>> Calls come in to profile 'internal', are authed by IP ACL, go to the
>>> dialplan, and are sent through profile 'external' to a URI string generated
>>> by the lua script.
>>>
>>> I bombard the machine with a high rate of short calls, 120-180 cps, 5 sec each.
>>> Everything stays fine for some time; the higher the load, the faster the
>>> problems appear.
>>> When I pressure the machine with 150 cps and up, it works fine at first with
>>> 20%-30% idle CPU.
>>> After some time passes (10-30 minutes), system CPU usage jumps from 3%-7% to
>>> around 50%, the CPU is 95% busy, and vmstat reports hundreds and sometimes
>>> thousands of runnable tasks.
>>> If I stop sending calls at this moment, wait for the CPU load to wind down,
>>> and start calling again, then the CPU load saturates right away, even if I
>>> lower the cps rate.
>>> Restarting freeswitch at this point brings the system back to the original
>>> state, and it will take some time to get saturated again.
>>>
>>> Another symptom is that during the test the available free memory is slowly
>>> allocated to freeswitch until it reaches a 13.1 GB virtual image, and then
>>> the kernel kills it.
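To quantify that growth for a bug report, one simple approach is to log the process's memory footprint over the course of the test (a sketch; assumes a single running freeswitch process):

```shell
# Append freeswitch RSS and VSZ (in kB) with a timestamp once a minute;
# the resulting fs_mem.log shows the leak's slope and can be attached to
# the bug report.
while sleep 60; do
  echo "$(date +%s) $(ps -o rss=,vsz= -p "$(pidof freeswitch)")" >> fs_mem.log
done
```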
>>>
>>> Here are some numbers:
>>>
>>> Running 120 calls per second, 5-second calls:
>>>
>>> It starts like this (this is very good from my POV):
>>>
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> 5 0 0 3255876 35444 96580 0 0 9 2 66 90 1 1 98 0
>>> 4 0 0 3253080 35444 96652 0 0 0 0 16048 20727 36 3 61 0
>>> 4 0 0 3247780 35444 96684 0 0 0 0 16521 21048 36 4 60 0
>>> 4 0 0 3242860 35444 96784 0 0 0 0 15695 19825 36 3 62 0
>>> 1 0 0 3237980 35452 96768 0 0 0 24 16072 19851 36 3 60 0
>>> 4 0 0 3233692 35452 96812 0 0 0 0 16299 20980 35 4 61 0
>>> 8 0 0 3230280 35452 96928 0 0 0 0 15034 18965 36 3 61 0
>>> 2 0 0 3227764 35452 96896 0 0 0 0 13956 20621 37 3 60 0
>>>
>>> After some (long) time it started having trouble and eating lots of free RAM
>>> (note that system cpu% jumped up and idle cpu% dropped, but there is still
>>> over 1 GB of free memory available):
>>>
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> 22 0 0 1108588 100304 2034628 0 0 0 0 145350 183629 37 53 10 0
>>> 5 0 0 1108224 100304 2034668 0 0 0 0 151937 167249 38 57 5 0
>>> 11 0 0 1108268 100304 2034904 0 0 0 0 149931 176143 37 58 5 0
>>> 7 0 0 1108244 100312 2034752 0 0 0 32 138612 183583 40 47 14 0
>>> 3 0 0 1108056 100312 2034792 0 0 0 0 111446 162733 42 29 29 0
>>> 4 0 0 1108064 100312 2034836 0 0 0 0 112568 164187 41 30 29 0
>>> 14 0 0 1108040 100312 2034880 0 0 0 0 133372 174676 39 45 17 0
>>> 10 0 0 1107468 100312 2034920 0 0 0 0 157371 184888 37 58 5 0
>>>
>>> A short time after that, it got into really serious trouble (3000 runnable
>>> tasks!? heading toward memory exhaustion):
>>>
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> 3336 0 0 146580 9248 1457016 0 0 0 0 306642 291819 28 67 5 0
>>> 1180 0 0 145060 9220 1430212 0 0 248 848 294654 285273 27 68 5 0
>>> 1028 0 0 148616 8996 1366844 0 0 0 0 421614 385342 32 63 4 0
>>> 1026 0 0 150964 8880 1298832 0 0 0 28 462825 416480 31 65 4 0
>>> 1338 0 0 146260 8688 1242440 0 0 0 0 442247 391139 27 69 5 0
>>> 1945 0 0 143148 8680 1197180 0 0 0 0 364361 329958 24 71 5 0
>>> 1381 0 0 154668 8596 1137912 0 0 0 0 359982 323543 23 72 5 0
>>>
>>>
>>> After 5198242 UAC completed transactions / 5512619 UAS completed
>>> transactions, FS crashed, and it looked like this:
>>>
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> 3 2 0 92972 568 20492 0 0 156 8 117314 130220 7 40 38 15 <-- back in the day I would kill for 92 MB free mem
>>> 10 5 0 3473328 208 9848 0 0 24344 48 104557 89011 5 62 15 17 <-- now the kernel kills FS for taking the rest of it
>>> 8 0 0 3482436 340 14408 0 0 3208 0 2582 2046 0 95 4 1
>>> 9 0 0 3504792 348 14344 0 0 0 48 2312 1836 0 93 7 0
>>> 12 0 0 3532184 348 14348 0 0 0 0 2387 1984 0 94 6 0
>>> 6 0 0 3561792 348 14876 0 0 528 0 2523 2178 0 93 6 0
>>> 11 0 0 3592816 348 14876 0 0 0 0 2684 2248 0 95 5 0
>>> 3 0 0 3615424 348 14876 0 0 0 0 2820 2494 0 95 5 0
>>> 7 0 0 3633488 348 14876 0 0 0 0 2990 2805 0 94 6 0
>>> 7 0 0 3654916 348 14876 0 0 0 0 3350 3222 0 96 4 0
>>> 11 0 0 3680708 348 14876 0 0 0 0 3869 4018 0 94 6 0
>>> 0 0 0 3773532 476 16876 0 0 2344 0 6598 9501 0 74 24 2
>>> 0 0 0 3773836 476 16912 0 0 0 0 191 129 0 0 100 0 <-- FS is not running any more: 100% idle CPU, lots of free RAM
>>> 0 0 0 3773856 476 17108 0 0 0 0 91 63 0 0 100 0
>>>
>>>
>>>
>>> dmesg of the crash is this:
>>>
>>> [48674.576126] Out of memory: Kill process 31552 (freeswitch) score 674 or
>>> sacrifice child
>>> [48674.578188] Killed process 31552 (freeswitch) total-vm:10177480kB,
>>> anon-rss:2710092kB, file-rss:0kB
>>>
>>> The fs_log switches to the dark side (there was no disk space problem, of
>>> course; I have no idea at which point in the downward spiral this popped up
>>> on the console):
>>>
>>> 2013-03-06 23:27:22.923640 [ERR] switch_core_sqldb.c:579 NATIVE SQL ERR
>>> [database or disk is full]
>>> BEGIN EXCLUSIVE
>>> 2013-03-06 23:27:22.923640 [CRIT] switch_core_sqldb.c:1679 ERROR [database
>>> or disk is full]
>>> 2013-03-06 23:27:22.923640 [ERR] switch_core_sqldb.c:579 NATIVE SQL ERR
>>> [cannot commit - no transaction is active]
>>> COMMIT
>>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1674 Thread Failure!
>>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1634 LUKE: I'm hit,
>>> but not bad.
>>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1635 LUKE'S VOICE:
>>> Artoo, see what you can do with it. Hang on back there....
>>> Green laserfire moves past the beeping little robot as his head turns.
>>> After a few beeps and a twist of his mechanical arm,
>>> Artoo reduces the max sessions to 13897 thus, saving the switch from certain
>>> doom.
>>>
>>> And that's all; FS is dead.
>>>
>>> Running the test at 180 cps significantly hastens the death.
>>> It starts out normally with ~40% idle CPU;
>>> then, after circa 160k calls, it starts to spiral:
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r b swpd free buff cache si so bi bo in cs us sy id wa
>>> 6 0 0 2901484 63204 161132 0 0 0 0 20586 33073 54 5 41 0
>>> 5 0 0 2901468 63212 160988 0 0 0 12 25359 41915 53 6 42 0
>>> 4 0 0 2901220 63212 161064 0 0 0 0 19650 31211 55 5 40 0
>>> 7 0 0 2901080 63212 161124 0 0 0 0 21148 33462 54 5 41 0
>>> 2 0 0 2900676 63212 161188 0 0 0 0 21767 35568 54 5 42 0
>>> 5 0 0 2900468 63212 161256 0 0 0 0 19398 30073 54 5 41 0
>>> 2 0 0 2900088 63220 161304 0 0 0 12 21315 34563 54 5 42 0
>>> 3 0 0 2900180 63220 161484 0 0 0 0 19858 31561 42 4 54 0 <-- this should not happen; the load is constant and idle cpu% should stay at ~40%
>>> 3 0 0 2899768 63220 161520 0 0 0 0 29907 46883 39 4 57 0
>>> 11 0 0 2900056 63220 161504 0 0 0 0 41730 63440 41 5 54 0
>>> 9 0 0 2899536 63220 161768 0 0 0 0 48876 68159 40 5 54 0
>>> 6 0 0 2899512 63228 161620 0 0 0 12 59988 80421 41 6 53 0
>>> 49 0 0 2876072 63228 162004 0 0 0 0 193384 241453 57 23 21 0 <-- here we go, starting the memory-hogging, cpu-eating spiral
>>> 40 0 0 2830720 63228 162080 0 0 0 0 308948 356178 51 49 0 0
>>> 156 0 0 2776576 63228 161984 0 0 0 0 301514 325415 48 48 5 0
>>> 178 0 0 2727044 63228 162260 0 0 0 0 290168 306158 44 51 5 0
>>> 70 0 0 2683252 63236 161924 0 0 0 48 291792 302632 37 58 5 0
>>> 5 0 0 2618432 63236 162204 0 0 0 0 316170 302886 35 59 7 0
>>> 2 0 0 2548092 63236 162364 0 0 0 0 340707 319105 38 56 5 0
>>> 2 0 0 2460528 63236 162464 0 0 0 1596 386953 348210 40 54 6 0
>>>
>>> Nothing in the FS warn/err log at this point, until memory is completely full.
>>>
>>>
>>> Please give me some ideas on how to get some useful info for a bug report.
>>>
>>> Kind Regards,
>>> Boris Ratner.
>>>
>>> _________________________________________________________________________
>>> Professional FreeSWITCH Consulting Services:
>>> consulting at freeswitch.org
>>> http://www.freeswitchsolutions.com
>>>
>>>
>>>
>>>
>>> Official FreeSWITCH Sites
>>> http://www.freeswitch.org
>>> http://wiki.freeswitch.org
>>> http://www.cluecon.com
>>>
>>> FreeSWITCH-users mailing list
>>> FreeSWITCH-users at lists.freeswitch.org
>>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>>> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
>>> http://www.freeswitch.org
>>>
>>
>>
>>
>
>
>
--
Ken
http://www.FreeSWITCH.org
http://www.ClueCon.com
http://www.OSTAG.org
irc.freenode.net #freeswitch