[Freeswitch-users] High cps load causes weird cpu and memory starvation. Need suggestions on how to debug.

Dmitry Lysenko dvl36.ripe.nick at gmail.com
Fri Mar 8 04:35:22 MSK 2013


Hm... But what about huge interrupt and context switching  number?


2013/3/7 Steven Ayre <steveayre at gmail.com>

> Upgrade to the latest version.
>
> There are some memory leaks that have been found and fixed in 1.2.7.
>
> -Steve
>
>
>
>
> On 7 March 2013 11:09, bratner bratner <ratner2 at gmail.com> wrote:
>
>> Hi!
>>
>> Please assist me to find a way to get some insight into this problem and
>> file a proper bug report. Looks like 2 different problems, one is load
>> related (like a race of some sort) the other is a mem leak.
>>
>>
>> I'm running FS debs compiled on ubuntu 12.04
>> (-n20130227T114536Z-1~presice+
>> 1git).
>> The machine is ubuntu 12.04 server , CPU: Intel(R) Xeon(R) CPU E5620  @
>> 2.40GHz, 4GB RAM and no swap. 3.1G RAM free after FS is started.
>> I disabled HT in BIOS to get proper CPU load readings from top and vmstat
>> (no HT also allows for higher cps for some reason...).
>>
>> The FS setup has 2 SIP profiles , internal/external (functioning as
>> incoming/outgoing ), XML Dialplan that runs a lua script.
>> Calls come in to profile 'internal',  authed by ip acl , go to the
>> dialplan and sent through profile 'external' to a uri string generated by
>> the lua script.
>>
>> I bombard the machine with high call rate of short calls ,120-180cps ,
>> 5sec.
>> Everything stays fine for some time , the higher the load the faster the
>> problems appear.
>> When i pressure the machine with 150cps and up , it works fine at first
>> with 20%-30% idle CPU.
>> After some time passes (10-30minutes) system cpu usage jumps from 3%-7%
>> to around 50%, CPU is 95% busy and vmstat reports hundereds and sometimes
>> thousands of runnable tasks.
>> If i stop sending calls at this moment, wait for the CPU load to wind
>> down and start calling again then the CPU load saturates right away. Even
>> if i lower the CPS rate.
>> restarting freeswitch at this point brings the system to the original
>> state and it will take some time to get saturated again.
>>
>> Another symptom is that during the test the available free mem is slowly
>> allocated to freeswitch until it reaches 13.1Gb virtual image and then the
>> kernel kills it.
>>
>> Here are some numbers:
>>
>> Running 120 calls per second , 5second calls:
>>
>> Starts like this: This is very good from my POV.
>>
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>>  5  0      0 3255876  35444  96580    0    0     9     2   66   90  1  1
>> 98  0
>>  4  0      0 3253080  35444  96652    0    0     0     0 16048 20727 36
>> 3 61  0
>>  4  0      0 3247780  35444  96684    0    0     0     0 16521 21048 36
>> 4 60  0
>>  4  0      0 3242860  35444  96784    0    0     0     0 15695 19825 36
>> 3 62  0
>>  1  0      0 3237980  35452  96768    0    0     0    24 16072 19851 36
>> 3 60  0
>>  4  0      0 3233692  35452  96812    0    0     0     0 16299 20980 35
>> 4 61  0
>>  8  0      0 3230280  35452  96928    0    0     0     0 15034 18965 36
>> 3 61  0
>>  2  0      0 3227764  35452  96896    0    0     0     0 13956 20621 37
>> 3 60  0
>>
>> After some (long) time it started having trouble and eat lots of free ram
>> (note that the idle cpu% and system cpu%  jumped up but there are still
>> over 1G free mem available) :
>>
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>> 22  0      0 1108588 100304 2034628    0    0     0     0 145350 183629
>> 37 53 10  0
>>  5  0      0 1108224 100304 2034668    0    0     0     0 151937 167249
>> 38 57  5  0
>> 11  0      0 1108268 100304 2034904    0    0     0     0 149931 176143
>> 37 58  5  0
>>  7  0      0 1108244 100312 2034752    0    0     0    32 138612 183583
>> 40 47 14  0
>>  3  0      0 1108056 100312 2034792    0    0     0     0 111446 162733
>> 42 29 29  0
>>  4  0      0 1108064 100312 2034836    0    0     0     0 112568 164187
>> 41 30 29  0
>> 14  0      0 1108040 100312 2034880    0    0     0     0 133372 174676
>> 39 45 17  0
>> 10  0      0 1107468 100312 2034920    0    0     0     0 157371 184888
>> 37 58  5  0
>>
>> Short time after that is got into really serious trouble (3000 runnable
>> tasks!? heading toward memory exhaustion):
>>
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>>
>> 3336  0      0 146580   9248 1457016    0    0     0     0 306642 291819
>> 28 67  5  0
>> 1180  0      0 145060   9220 1430212    0    0   248   848 294654 285273
>> 27 68  5  0
>> 1028  0      0 148616   8996 1366844    0    0     0     0 421614 385342
>> 32 63  4  0
>> 1026  0      0 150964   8880 1298832    0    0     0    28 462825 416480
>> 31 65  4  0
>> 1338  0      0 146260   8688 1242440    0    0     0     0 442247 391139
>> 27 69  5  0
>> 1945  0      0 143148   8680 1197180    0    0     0     0 364361 329958
>> 24 71  5  0
>> 1381  0      0 154668   8596 1137912    0    0     0     0 359982 323543
>> 23 72  5  0
>>
>>
>> After UAC completed transactions 5198242/ UAS completed transactions
>> 5512619 calls FS crashed and it looked like this:
>>
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>>
>> 3  2      0  92972    568  20492    0    0   156     8 117314 130220  7
>> 40 38 15   <-- back in the day i would kill for 92Mb free mem.
>> 10  5      0 3473328    208   9848    0    0 24344    48 104557 89011  5
>> 62 15 17   <- Now the kernel kills FS for taking the rest of it.
>>  8  0      0 3482436    340  14408    0    0  3208     0 2582 2046  0 95
>> 4  1
>>  9  0      0 3504792    348  14344    0    0     0    48 2312 1836  0 93
>> 7  0
>> 12  0      0 3532184    348  14348    0    0     0     0 2387 1984  0 94
>> 6  0
>>  6  0      0 3561792    348  14876    0    0   528     0 2523 2178  0 93
>> 6  0
>> 11  0      0 3592816    348  14876    0    0     0     0 2684 2248  0 95
>> 5  0
>>  3  0      0 3615424    348  14876    0    0     0     0 2820 2494  0 95
>> 5  0
>>  7  0      0 3633488    348  14876    0    0     0     0 2990 2805  0 94
>> 6  0
>>  7  0      0 3654916    348  14876    0    0     0     0 3350 3222  0 96
>> 4  0
>> 11  0      0 3680708    348  14876    0    0     0     0 3869 4018  0 94
>> 6  0
>>  0  0      0 3773532    476  16876    0    0  2344     0 6598 9501  0 74
>> 24  2
>>  0  0      0 3773836    476  16912    0    0     0     0  191  129  0  0
>> 100  0   <- FS is not running any more, 100% idle CPU , lots of free ram
>>  0  0      0 3773856    476  17108    0    0     0     0   91   63  0  0
>> 100  0
>>
>>
>>
>> dmesg of the crash is this:
>>
>> [48674.576126] Out of memory: Kill process 31552 (freeswitch) score 674
>> or sacrifice child
>> [48674.578188] Killed process 31552 (freeswitch) total-vm:10177480kB,
>> anon-rss:2710092kB, file-rss:0kB
>>
>> fs_log switches to the dark side : (there was no disk space problem of
>> course, i have no idea at which time point in the spiral down this popped
>> on the console )
>>
>> 2013-03-06 23:27:22.923640 [ERR] switch_core_sqldb.c:579 NATIVE SQL ERR
>> [database or disk is full]
>> BEGIN EXCLUSIVE
>> 2013-03-06 23:27:22.923640 [CRIT] switch_core_sqldb.c:1679 ERROR
>> [database or disk is full]
>> 2013-03-06 23:27:22.923640 [ERR] switch_core_sqldb.c:579 NATIVE SQL ERR
>> [cannot commit - no transaction is active]
>> COMMIT
>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1674 Thread
>> Failure!
>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1634 LUKE: I'm
>> hit, but not bad.
>> 2013-03-06 23:27:22.963609 [CRIT] switch_core_session.c:1635 LUKE'S
>> VOICE: Artoo, see what you can do with it. Hang on back there....
>> Green laserfire moves past the beeping little robot as his head turns.
>> After a few beeps and a twist of his mechanical arm,
>> Artoo reduces the max sessions to 13897 thus, saving the switch from
>> certain doom.
>>
>> and thats all , FS is dead.
>>
>> Runing the test at 180 cps will significantly hasten the death.
>> It starts normal with ~40% idle cpu
>> then after circa 160k calls it starts to spiral:
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id wa
>>  6  0      0 2901484  63204 161132    0    0     0     0 20586 33073 54
>> 5 41  0
>>  5  0      0 2901468  63212 160988    0    0     0    12 25359 41915 53
>> 6 42  0
>>  4  0      0 2901220  63212 161064    0    0     0     0 19650 31211 55
>> 5 40  0
>>  7  0      0 2901080  63212 161124    0    0     0     0 21148 33462 54
>> 5 41  0
>>  2  0      0 2900676  63212 161188    0    0     0     0 21767 35568 54
>> 5 42  0
>>  5  0      0 2900468  63212 161256    0    0     0     0 19398 30073 54
>> 5 41  0
>>  2  0      0 2900088  63220 161304    0    0     0    12 21315 34563 54
>> 5 42  0
>>  3  0      0 2900180  63220 161484    0    0     0     0 19858 31561 42
>> 4 54  0       <-- this should not happen the load is constant and idle cpu%
>> should stay at ~40%
>>  3  0      0 2899768  63220 161520    0    0     0     0 29907 46883 39
>> 4 57  0
>> 11  0      0 2900056  63220 161504    0    0     0     0 41730 63440 41
>> 5 54  0
>>  9  0      0 2899536  63220 161768    0    0     0     0 48876 68159 40
>> 5 54  0
>>  6  0      0 2899512  63228 161620    0    0     0    12 59988 80421 41
>> 6 53  0
>> 49  0      0 2876072  63228 162004    0    0     0     0 193384 241453 57
>> 23 21  0      <-- here we go,  starting the memory hogging cpu eating spiral
>> 40  0      0 2830720  63228 162080    0    0     0     0 308948 356178 51
>> 49  0  0
>> 156  0      0 2776576  63228 161984    0    0     0     0 301514 325415
>> 48 48  5  0
>> 178  0      0 2727044  63228 162260    0    0     0     0 290168 306158
>> 44 51  5  0
>> 70  0      0 2683252  63236 161924    0    0     0    48 291792 302632 37
>> 58  5  0
>>  5  0      0 2618432  63236 162204    0    0     0     0 316170 302886 35
>> 59  7  0
>>  2  0      0 2548092  63236 162364    0    0     0     0 340707 319105 38
>> 56  5  0
>>  2  0      0 2460528  63236 162464    0    0     0  1596 386953 348210 40
>> 54  6  0
>>
>> Nothing on fs warn/err log at this point until memory is all full.
>>
>>
>> Please give me some ideas on how to get some useful info for a bug report.
>>
>> Kind Regards,
>> Boris Ratner.
>>
>> _________________________________________________________________________
>> Professional FreeSWITCH Consulting Services:
>> consulting at freeswitch.org
>> http://www.freeswitchsolutions.com
>>
>> 
>> 
>>
>> Official FreeSWITCH Sites
>> http://www.freeswitch.org
>> http://wiki.freeswitch.org
>> http://www.cluecon.com
>>
>> FreeSWITCH-users mailing list
>> FreeSWITCH-users at lists.freeswitch.org
>> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
>> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
>> http://www.freeswitch.org
>>
>>
>
> _________________________________________________________________________
> Professional FreeSWITCH Consulting Services:
> consulting at freeswitch.org
> http://www.freeswitchsolutions.com
>
> 
> 
>
> Official FreeSWITCH Sites
> http://www.freeswitch.org
> http://wiki.freeswitch.org
> http://www.cluecon.com
>
> FreeSWITCH-users mailing list
> FreeSWITCH-users at lists.freeswitch.org
> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> http://www.freeswitch.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.freeswitch.org/pipermail/freeswitch-users/attachments/20130308/06deb9b2/attachment-0001.html 


Join us at ClueCon 2011 Aug 9-11, 2011
More information about the FreeSWITCH-users mailing list