[Freeswitch-users] Capacity testing, seg fault
tuhl at ix.netcom.com
Thu Nov 29 15:25:17 PST 2007
I removed the recordFile call and replaced it with a
10s sleep, and still got the segfaults. Going to a 2s sleep
instead of a 10s sleep did improve things, but then I'm not
really testing what I want to test: if the calls
last only 2s, far fewer channels are in use simultaneously.
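
The link between call length and channel count can be sketched with Little's law (channels in use is roughly origination rate times the time each call holds a channel). This is a rough estimate only: the 2-second setup/teardown allowance below is an assumption chosen to match the roughly 120 channels reported for 10 calls per second with 10s calls, not a measured value.

```python
def estimated_channels(calls_per_second, call_seconds, overhead_seconds=0.0):
    """Little's law: average channels in use = arrival rate * time per call."""
    return calls_per_second * (call_seconds + overhead_seconds)


# overhead_seconds=2.0 is an assumed setup/teardown allowance, not a measurement.
for hold in (10, 2):
    print(f"{hold}s calls at 10 cps: ~{estimated_channels(10, hold, 2.0):.0f} channels")
```

Under that assumption, 10s calls hold about 120 channels while 2s calls hold only about 40, which is why the shorter sleep stresses the receiver much less.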
I will try swapping the roles of the 2 servers;
that's not hard. After I follow Brian's suggestion and
convert away from JS, I'll clean up my scripts and upload them too.
Tom
At 01:32 PM 11/29/2007, Michael Collins wrote:
>
>Tom,
>
>This sounds very interesting. I'd like to know a few things:
>First, if you disable the call recording on the
>receiving end, will you still get the segfaults
>consistently? Just curious to see what the
>receiver does if it only answers the calls and
>then hangs up without the extra burden of recording the audio streams.
>Second, how hard would it be to have the
>originator and the receiver trade places? If
>you could, I'd like to see the two machines
>switch roles, so that the machine currently
>acting as the receiver makes the calls and the
>machine acting as the originator
>receives them. I'm wondering what will happen:
>will the segfaults stay on the same machine,
>will they move to the new receiver, or will they go away altogether?
>
>I know those are kinda brute force suggestions
>but they might yield some interesting information:
>If the segfaults occur only on one machine,
>regardless of whether it's making or receiving
>calls, then obviously there's something up with that machine.
>If the segfaults always occur at the machine
>receiving the calls then, of course, we've got a more interesting issue.
>
>Would you mind putting your setup info, scripts,
>etc. into the pastebin? Maybe others could try
>to replicate your symptoms and see what shakes out.
>
>Thanks for taking the initiative to do this kind
>of testing. It will definitely help FS be a better, more stable product.
>
>-MC
>
>P.S. I just saw Brian's emails on this thread,
>so be sure to check his suggestions as well!
>
>
>
>----------
>From:
>freeswitch-users-bounces at lists.freeswitch.org
>[mailto:freeswitch-users-bounces at lists.freeswitch.org]
>On Behalf Of tuhl at ix.netcom.com
>Sent: Thursday, November 29, 2007 12:54 PM
>To: freeswitch-users at lists.freeswitch.org
>Subject: [Freeswitch-users] Capacity testing, seg fault
>
>Hi,
>
>I'm running some capacity tests on Freeswitch
>and can cause seg-faults fairly quickly (<1
>minute) at a 'light' load of 10 call
>originations per second. Core dump backtrace is
>at the bottom, and my debugging shows what looks
>like corrupted js_session. I'll open an issue on
>JIRA. I wanted to get opinions on whether this
>is a valid architecture for testing capacity,
>and whether I'm making a simple mistake.
>
>Environment:
>I have the trunk version installed on 2 servers
>and have one server (the originator) calling the
>other (the receiver) using SIP with G.711, with a
>Gig-E ethernet switch between them. The
>originating server basically does a
>session.originate, waitForAnswer, streamFile
>(10-second 8 kHz .wav), and hangup. The receiver
>does a session.answer and a recordFile to a .wav
>file so I can go back and check voice quality.
>
>My capacity testing engine is a Perl script
>that uses the XML-RPC interface to
>originate the calls on Freeswitch (I submit
>requests to do a 'jsrun play.js' to a certain
>phone number, where play.js is a simple script
>that does originate, waitForAnswer, streamFile,
>hangup). I can configure it to make a certain
>number of originations per second and a certain
>number of total calls. I have no Perl script
>running on the receiver - I just set up the
>dialplan to call a .js which answers the call
>and records it. This testing setup is at an
>early stage, so right now, to check pass/fail, I
>just verify that if I ran a 1000-call test on
>the originator, there should be 1000 .wav files
>that are all about the same size, on the
>receiver at the end of the test, and no crashes.
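
The pass/fail check described above (the expected number of .wav files, all about the same size) could be automated along these lines. This is a minimal sketch, not the script used in the test: the function name `check_recordings`, the directory argument, and the 10% size tolerance are all assumptions for illustration.

```python
from pathlib import Path
from statistics import median


def check_recordings(directory, expected_count, tolerance=0.10):
    """Return (ok, problems) for the .wav files found in `directory`.

    A file is flagged when its size differs from the median size by more
    than `tolerance` (a fraction; 0.10 is an assumed threshold).
    """
    sizes = {p.name: p.stat().st_size for p in Path(directory).glob("*.wav")}
    problems = []
    if len(sizes) != expected_count:
        problems.append(f"expected {expected_count} files, found {len(sizes)}")
    if sizes:
        typical = median(sizes.values())
        for name, size in sorted(sizes.items()):
            if abs(size - typical) > tolerance * typical:
                problems.append(f"{name}: {size} bytes (median {typical:.0f})")
    return (not problems, problems)


# Example (hypothetical path): ok, problems = check_recordings("/tmp/recordings", 1000)
```

Comparing against the median rather than the mean keeps a handful of truncated recordings from shifting the reference size.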
>
>I've compiled with debug flags on, and I've set
>all *_DEBUG flags to 9 (I have also run the
>tests after a recompile with debug flags off/0,
>and that didn't make any difference). I've done
>all the ulimit commands that were in the last
>few emails on this list. I'm running on FC6 on a
>Dell 2850 with dual 3.6 GHz Xeons (/proc/cpuinfo
>shows 4 processors), and 4 GB RAM. I've set
>max-sessions to 3000 and Session Rate to 100.
>'top' is showing freeswitch at 60-80% on the receiver during this test.
>
>ISSUE:
>My problem right now is on the receiver, which I
>wouldn't care about because I'm most interested
>in the origination capacity of freeswitch, but
>with my receiver crashing so quickly, I can't
>push the originator very hard. I set up my
>originating engine to make 1000 total calls at
>10 call originations per second, each call
>lasting 10 seconds (which results in about 120
>simultaneous channels in use), and I get a seg
>fault on the receiver within about 500 calls or
>50 seconds. If I run a test with 1000 total
>calls at 6 call originations per second, it will
>work, but if I run an overnight test with 20,000
>total calls at 6 call originations per second,
>the receiver will sometimes seg-fault at around
>15,000 calls, and sometimes it will not.
>Interestingly though, if I do a very short but
>very high rate test of 100 total calls at 50
>call originations per second, that will usually
>work. But 500 total calls at 50 calls per second will always seg-fault.
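
The load profiles above (a total call count at a fixed origination rate) could be driven by a pacing loop like the following. It is a sketch only, not the Perl script from this test: `run_load` and `submit` are hypothetical names, and `submit` stands in for whatever sends the origination request (for example the XML-RPC call), assumed here to return quickly.

```python
import time


def run_load(submit, total_calls, calls_per_second,
             clock=time.monotonic, sleep=time.sleep):
    """Call submit(i) total_calls times, paced at calls_per_second.

    Each call is scheduled relative to the start time, so small delays in
    submit() do not accumulate as drift. Returns the elapsed seconds.
    """
    interval = 1.0 / calls_per_second
    start = clock()
    for i in range(total_calls):
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        submit(i)  # hypothetical hook: send one origination request
    return clock() - start


# Tiny demonstration: 5 placeholder "calls" at 50 per second.
elapsed = run_load(lambda i: print("originate", i), total_calls=5, calls_per_second=50)
print(f"elapsed: {elapsed:.2f}s")
```

If `submit` blocks for longer than the interval, the achieved rate falls below the target, so the requests would need to be sent without waiting for each call to finish.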
>
>Just so you know... I'm shooting for the holy
>grail of stable operation at 100 call
>originations per second. I know people have
>reported much better results than I'm getting. Is something in my setup bad?
>
>Here's the core dump backtrace. I added some
>debug printf's in session_destroy right before
>the call to destroy_speech_engine, and it looks
>like the jss has been trampled - for example,
>jss->flags is always 0 for all my successful
>calls, but right before it seg-faults,
>jss->flags is some large random number. This happens every single time.
>
>Program terminated with signal 11, Segmentation fault.
>#0 0x00000000 in ?? ()
>(gdb) bt
>#0 0x00000000 in ?? ()
>#1 0x40040437 in switch_core_codec_destroy (codec=0x54ece168) at src/switch_core_codec.c:245
>#2 0x40ee778b in destroy_speech_engine (jss=0x51206538) at mod_spidermonkey.c:1652
>#3 0x40eeaa70 in session_destroy (cx=0x549c9920, obj=0x4eeac7f0) at mod_spidermonkey.c:2723
>#4 0x417d1aa7 in js_FinalizeObject (cx=0x549c9920, obj=0x4eeac7f0) at src/jsobj.c:2168
>#5 0x417b04d9 in js_GC (cx=0x549c9920, gcflags=0) at src/jsgc.c:1856
>#6 0x417af6ad in js_ForceGC (cx=0x549c9920, gcflags=0) at src/jsgc.c:1508
>#7 0x417830fd in js_DestroyContext (cx=0x549c9920, gcmode=JS_FORCE_GC) at src/jscntxt.c:285
>#8 0x417727ac in JS_DestroyContext (cx=0x549c9920) at src/jsapi.c:956
>#9 0x40eec3c9 in js_parse_and_execute (session=0x464d9678, input_code=0x9e31458 "capacity.js", ro=0x0) at mod_spidermonkey.c:3296
>#10 0x40eec3f2 in js_dp_function (session=0x464d9678, data=0x9e31458 "capacity.js") at mod_spidermonkey.c:3302
>#11 0x40044341 in switch_core_session_exec (session=0x464d9678, application_interface=0x40f26f80, arg=0x9e31458 "capacity.js") at src/switch_core_session.c:936
>#12 0x400455be in switch_core_standard_on_execute (session=0x464d9678) at src/switch_core_state_machine.c:169
>#13 0x40046605 in switch_core_session_run (session=0x464d9678) at src/switch_core_state_machine.c:406
>#14 0x4004381c in switch_core_session_thread (thread=0x9e31288, obj=0x464d9678) at src/switch_core_session.c:681
>#15 0x4009047c in dummy_worker (opaque=0x9e31288) at threadproc/unix/thread.c:138
>#16 0x007cb3db in start_thread () from /lib/libpthread.so.0
>#17 0x0072506e in clone () from /lib/libc.so.6
>
>
>Tom
>
>
>===============
>tuhl at ix.netcom.com
===============
tuhl at ix.netcom.com