[Freeswitch-users] Capacity testing, seg fault

tuhl at ix.netcom.com tuhl at ix.netcom.com
Thu Nov 29 15:25:17 PST 2007


I removed the recordFile and replaced it with a 
10s sleep, and still got the segfaults. I did see 
some improvement by going to a 2s sleep instead 
of a 10s sleep, but then I'm not really testing 
what I want to be testing: if the calls only last 
2s, far fewer channels are in use simultaneously.
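
To put rough numbers on that, here is a minimal sketch. It assumes calls arrive at a steady rate and each one holds a channel for a fixed time, so channels in use is just rate times hold time. The function name is made up for illustration and is not part of FreeSWITCH.

```python
def concurrent_channels(calls_per_second: float, hold_seconds: float) -> float:
    """Steady-state channels in use: arrival rate times holding time (Little's law)."""
    return calls_per_second * hold_seconds


# Compare the 2s and 10s sleeps at the 10 originations/second rate from the test.
for hold in (2, 10):
    channels = concurrent_channels(10, hold)
    print(f"{hold}s calls at 10 cps: about {channels:.0f} channels in use")
```

The roughly 120 channels reported for the 10s test would correspond to about 12s of hold time per call; the extra 2s is presumably call setup and teardown, which is an assumption and was not measured.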

I will try swapping the roles of the 2 servers. 
It's not hard. After I do Brian's suggestion and 
convert away from JS, I'll clean up my scripts and upload them too.

Tom

At 01:32 PM 11/29/2007, Michael Collins wrote:
>
>Tom,
>
>This sounds very interesting.  I’d like to know a few things:
>First, if you disable the call recording on the 
>receiving end, will you still get the segfaults 
>consistently?  Just curious to see what the 
>receiver does if it only answers the calls and 
>then hangs up without the extra burden of recording the audio streams.
>Second, how hard would it be to have the 
>originator and the receiver trade places?  If 
>you could, I’d like to see the two machines 
>switch roles, so that the machine currently 
>acting as the receiver makes the calls and the 
>machine acting as the originator will now 
>receive calls.  I’m wondering what will happen – 
>will the segfaults stay at the same machine or 
>will they go over to the new receiver, or will they go away altogether?
>
>I know those are kinda brute force suggestions 
>but they might yield some interesting information:
>If the segfaults occur only on one machine, 
>regardless of whether it’s making or receiving 
>calls then obviously there’s something up with that machine.
>If the segfaults always occur at the machine 
>receiving the calls then, of course, we’ve got a more interesting issue.
>
>Would you mind putting your setup info, scripts, 
>etc. into the pastebin?  Maybe others could try 
>to replicate your symptoms and see what shakes out.
>
>Thanks for taking the initiative to do this kind 
>of testing.  It will definitely help FS be a better, more stable product.
>
>-MC
>
>P.S. – I just saw Brian’s emails on this thread, 
>so be sure to check his suggestions as well!
>
>
>
>----------
>From: 
>freeswitch-users-bounces at lists.freeswitch.org 
>[mailto:freeswitch-users-bounces at lists.freeswitch.org] 
>On Behalf Of tuhl at ix.netcom.com
>Sent: Thursday, November 29, 2007 12:54 PM
>To: freeswitch-users at lists.freeswitch.org
>Subject: [Freeswitch-users] Capacity testing, seg fault
>
>Hi,
>
>I'm running some capacity tests on Freeswitch 
>and can cause seg-faults fairly quickly (<1 
>minute) at a 'light' load of 10 call 
>originations per second. Core dump backtrace is 
>at the bottom, and my debugging shows what looks 
>like corrupted js_session. I'll open an issue on 
>JIRA. I wanted to get opinions on whether this 
>is a valid architecture for testing capacity, 
>and whether I'm making a simple mistake.
>
>Environment:
>I have the trunk version installed on 2 servers 
>and have one server (the originator) calling the 
>other (the receiver) using SIP and G.711, with a 
>Gigabit Ethernet switch between them. The 
>originating server basically does a 
>session.originate, waitForAnswer, streamFile 
>(10-second 8 kHz .wav), and hangup. The receiver 
>does a session.answer and a recordFile to a .wav 
>file so I can go back and check voice quality.
>
>My capacity testing engine is a Perl script 
>which uses the XML-RPC interface to 
>originate the calls on Freeswitch (I submit 
>requests to do a 'jsrun play.js' to a certain 
>phone number, where play.js is a simple script 
>which originates, waits for answer, streams the 
>file, and hangs up). I can configure it to make a 
>certain number of originations per second and a 
>certain number of total calls. I have no Perl 
>script running on the receiver - I just set up the 
>dialplan to call a .js which answers the call 
>and records it. This testing setup is at an 
>early stage, so right now, to check pass/fail, I 
>just verify that if I ran a 1000-call test on 
>the originator, there are 1000 .wav files 
>that are all about the same size on the 
>receiver at the end of the test, and no crashes.
>
>I've compiled with debug flags on, and I've set 
>all *_DEBUG flags to 9 (I have also run the 
>tests after a recompile with debug flags off/0, 
>and that didn't make any difference). I've done 
>all the ulimit commands that were in the last 
>few emails on this list. I'm running FC6 on a 
>Dell 2850 with dual 3.6 GHz Xeons (/proc/cpuinfo 
>shows 4 processors) and 4 GB RAM. I've set 
>max-sessions to 3000 and Session Rate to 100. 
>'top' is showing freeswitch at 60-80% on the receiver during this test.
>
>ISSUE:
>My problem right now is on the receiver. I 
>wouldn't normally care about that, because I'm 
>most interested in the origination capacity of 
>freeswitch, but with my receiver crashing so 
>quickly, I can't push the originator very hard. 
>I set up my originating engine to make 1000 total 
>calls at 10 call originations per second, each 
>call lasting 10 seconds (which results in about 
>120 simultaneous channels in use), and I get a 
>segfault on the receiver within about 500 calls 
>or 50 seconds. If I run a test with 1000 total 
>calls at 6 call originations per second, it will 
>work, but if I run an overnight test with 20,000 
>total calls at 6 call originations per second, 
>the receiver will sometimes segfault at around 
>15,000 calls, and sometimes it will not. 
>Interestingly though, if I do a very short but 
>very high rate test of 100 total calls at 50 
>call originations per second, that will usually 
>work. But 500 total calls at 50 calls per second will always segfault.
>
>Just so you know... I'm shooting for the holy 
>grail of stable operation at 100 call 
>originations per second. I know people have 
>reported much better results than I'm getting. Is something in my setup bad?
>
>Here's the core dump backtrace. I added some 
>debug printf's in session_destroy right before 
>the call to destroy_speech_engine, and it looks 
>like the jss has been trampled - for example, 
>jss->flags is always 0 for all my successful 
>calls, but right before it seg-faults, 
>jss->flags is some large random number. This happens every single time.
>
>Program terminated with signal 11, Segmentation fault.
>#0  0x00000000 in ?? ()
>(gdb) bt
>#0  0x00000000 in ?? ()
>#1  0x40040437 in switch_core_codec_destroy (codec=0x54ece168) at src/switch_core_codec.c:245
>#2  0x40ee778b in destroy_speech_engine (jss=0x51206538) at mod_spidermonkey.c:1652
>#3  0x40eeaa70 in session_destroy (cx=0x549c9920, obj=0x4eeac7f0) at mod_spidermonkey.c:2723
>#4  0x417d1aa7 in js_FinalizeObject (cx=0x549c9920, obj=0x4eeac7f0) at src/jsobj.c:2168
>#5  0x417b04d9 in js_GC (cx=0x549c9920, gcflags=0) at src/jsgc.c:1856
>#6  0x417af6ad in js_ForceGC (cx=0x549c9920, gcflags=0) at src/jsgc.c:1508
>#7  0x417830fd in js_DestroyContext (cx=0x549c9920, gcmode=JS_FORCE_GC) at src/jscntxt.c:285
>#8  0x417727ac in JS_DestroyContext (cx=0x549c9920) at src/jsapi.c:956
>#9  0x40eec3c9 in js_parse_and_execute (session=0x464d9678, input_code=0x9e31458 "capacity.js", ro=0x0) at mod_spidermonkey.c:3296
>#10 0x40eec3f2 in js_dp_function (session=0x464d9678, data=0x9e31458 "capacity.js") at mod_spidermonkey.c:3302
>#11 0x40044341 in switch_core_session_exec (session=0x464d9678, application_interface=0x40f26f80, arg=0x9e31458 "capacity.js") at src/switch_core_session.c:936
>#12 0x400455be in switch_core_standard_on_execute (session=0x464d9678) at src/switch_core_state_machine.c:169
>#13 0x40046605 in switch_core_session_run (session=0x464d9678) at src/switch_core_state_machine.c:406
>#14 0x4004381c in switch_core_session_thread (thread=0x9e31288, obj=0x464d9678) at src/switch_core_session.c:681
>#15 0x4009047c in dummy_worker (opaque=0x9e31288) at threadproc/unix/thread.c:138
>#16 0x007cb3db in start_thread () from /lib/libpthread.so.0
>#17 0x0072506e in clone () from /lib/libc.so.6
>
>
>Tom
>
>
>===============
>tuhl at ix.netcom.com

===============
tuhl at ix.netcom.com 