[Freeswitch-users] Capacity testing, seg fault
Anthony Minessale
anthmct at yahoo.com
Fri Nov 30 06:50:11 PST 2007
Can you let me into your box while it's so easy to reproduce,
so I can examine it and work on it from there?
find me on irc...
Anthony Minessale II
FreeSWITCH http://www.freeswitch.org/
ClueCon http://www.cluecon.com/
AIM: anthm
MSN: anthony_minessale at hotmail.com
GTALK/JABBER/PAYPAL: anthony.minessale at gmail.com
IRC: irc.freenode.net #freeswitch
FreeSWITCH Developer Conference
sip:888 at conference.freeswitch.org
iax:guest at conference.freeswitch.org/888
googletalk:conf+888 at conference.freeswitch.org
pstn:213-799-1400
----- Original Message ----
From: "tuhl at ix.netcom.com" <tuhl at ix.netcom.com>
To: freeswitch-users at lists.freeswitch.org
Sent: Thursday, November 29, 2007 5:25:17 PM
Subject: Re: [Freeswitch-users] Capacity testing, seg fault
I removed the recordFile and replaced it with a 10s sleep, and still got
the segfaults. I did see some improvement by going to a 2s sleep instead
of a 10s sleep, but then I'm not really testing what I want to be testing
- if the calls only last 2s, there are far fewer simultaneous channels
in use.
I will try swapping the roles of the 2 servers. It's not hard. After I
follow Brian's suggestion and convert away from JS, I'll clean up my
scripts and upload them too.
Tom
At 01:32 PM 11/29/2007, Michael Collins wrote:
Tom,
This sounds very interesting. I’d like to know a few things:
First, if you disable the call recording on the receiving end, will you
still get the segfaults consistently? Just curious to see what the
receiver does if it only answers the calls and then hangs up without the
extra burden of recording the audio streams.
Second, how hard would it be to have the originator and the receiver
trade places? If you could, I'd like to see the two machines switch
roles, so that the machine currently acting as the receiver makes the
calls and the machine currently acting as the originator receives them.
I'm wondering what will happen - will the segfaults stay with the same
machine, move to the new receiver, or go away altogether?
I know those are kinda brute force suggestions but they might yield some
interesting information:
If the segfaults occur only on one machine, regardless of whether it’s
making or receiving calls then obviously there’s something up with that
machine.
If the segfaults always occur at the machine receiving the calls then, of
course, we’ve got a more interesting issue.
Would you mind putting your setup info, scripts, etc. into the
pastebin? Maybe others could try to replicate your symptoms and see
what shakes out.
Thanks for taking the initiative to do this kind of testing. It
will definitely help FS be a better, more stable product.
-MC
P.S. – I just saw Brian’s emails on this thread, so be sure to check his
suggestions as well!
From: freeswitch-users-bounces at lists.freeswitch.org
[mailto:freeswitch-users-bounces at lists.freeswitch.org] On Behalf Of
tuhl at ix.netcom.com
Sent: Thursday, November 29, 2007 12:54 PM
To: freeswitch-users at lists.freeswitch.org
Subject: [Freeswitch-users] Capacity testing, seg fault
Hi,
I'm running some capacity tests on FreeSWITCH and can cause segfaults
fairly quickly (<1 minute) at a 'light' load of 10 call originations
per second. The core dump backtrace is at the bottom, and my debugging
shows what looks like a corrupted js_session. I'll open an issue on JIRA.
I wanted to get opinions on whether this is a valid architecture for
testing capacity, and whether I'm making a simple mistake.
Environment:
I have the trunk version installed on 2 servers and have one server
(the originator) calling the other (the receiver) using SIP and G.711,
with a Gig-E Ethernet switch between them. The originating server
basically does a session.originate, waitForAnswer, streamFile (a
10-second 8 kHz .wav), and hangup. The receiver does a session.answer
and a recordFile to a .wav file so I can go back and check voice quality.
My capacity testing engine is a Perl script which uses the XML-RPC
interface to originate the calls on FreeSWITCH (I submit requests to do a
'jsrun play.js' to a certain phone number, where play.js is a simple
script which originates, waits for answer, streams the file, and hangs
up). I can configure it to make a certain number of originations per
second and a certain number of total calls. I have no Perl script running
on the receiver - I just set up the dialplan to call a .js which answers
the call and records it. This testing setup is at an early stage, so
right now, to check pass/fail, I just verify that after a 1000-call test
on the originator there are 1000 .wav files of about the same size on the
receiver at the end of the test, and no crashes.
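For what it's worth, that pass/fail check is easy to automate. A minimal
sketch in Python; the directory path, function name, and the 10% size
tolerance are my own assumptions, not part of the actual harness:

```python
import os

def check_recordings(directory, expected_count, size_tolerance=0.10):
    """Verify the receiver produced the expected number of .wav files,
    all within size_tolerance (as a fraction) of the median file size."""
    sizes = sorted(
        os.path.getsize(os.path.join(directory, f))
        for f in os.listdir(directory)
        if f.endswith(".wav")
    )
    if len(sizes) != expected_count:
        return False, "expected %d files, found %d" % (expected_count, len(sizes))
    median = sizes[len(sizes) // 2]
    outliers = [s for s in sizes if abs(s - median) > size_tolerance * median]
    if outliers:
        return False, "%d files deviate >10%% from median size" % len(outliers)
    return True, "ok"
```

Run it after the test completes; a missing or truncated recording shows up
as a count mismatch or a size outlier.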
I've compiled with debug flags on, and I've set all *_DEBUG flags to 9 (I
have also run the tests after a recompile with debug flags off/0, and
that didn't make any difference). I've done all the ulimit commands that
were in the last few emails on this list. I'm running on FC6 on a Dell
2850 with dual 3.6 GHz Xeons (/proc/cpuinfo shows 4 processors) and 4 GB
of RAM. I've set max-sessions to 3000 and the session rate to 100. 'top'
is showing freeswitch at 60-80% on the receiver during this test.
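For reference, those two limits are core settings in switch.conf.xml,
roughly like the fragment below (the exact parameter names in the 2007
trunk may differ - treat this as an assumption, not verbatim config):

```xml
<configuration name="switch.conf" description="Core Configuration">
  <settings>
    <!-- hard cap on concurrently active sessions -->
    <param name="max-sessions" value="3000"/>
    <!-- throttle on new sessions created per second (the "session rate") -->
    <param name="sessions-per-second" value="100"/>
  </settings>
</configuration>
```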
ISSUE:
My problem right now is on the receiver, which I wouldn't normally care
about because I'm most interested in the origination capacity of
FreeSWITCH, but with my receiver crashing so quickly, I can't push the
originator very hard. I set up my originating engine to make 1000 total
calls at 10 call originations per second, each call lasting 10 seconds
(which results in about 120 simultaneous channels in use), and I get a
segfault on the receiver within about 500 calls or 50 seconds. If I run
a test with 1000 total calls at 6 call originations per second, it will
work, but if I run an overnight test with 20,000 total calls at 6 call
originations per second, the receiver will sometimes segfault at around
15,000 calls, and sometimes it will not. Interestingly though, if I do a
very short but very high rate test of 100 total calls at 50 call
originations per second, that will usually work. But 500 total calls at
50 calls per second will always segfault.
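As a sanity check on the offered load: steady-state concurrency follows
Little's law (channels = arrival rate x per-call holding time). A quick
sketch in Python; the 2 s of per-call setup/teardown used to reconcile
the ~120 observed channels is my assumption, not a measurement:

```python
def concurrent_channels(calls_per_second, holding_time_s):
    """Little's law: steady-state concurrency = arrival rate * holding time."""
    return calls_per_second * holding_time_s

# 10 originations/s, 10 s of media per call:
assert concurrent_channels(10, 10) == 100
# The ~120 channels observed would match roughly 2 s of extra per-call
# setup/teardown time (assumed, not measured):
assert concurrent_channels(10, 10 + 2) == 120
```

The same arithmetic says the 6 cps overnight runs hold only ~60-72
channels, which fits the pattern of the receiver surviving much longer
at that rate.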
Just so you know... I'm shooting for the holy grail of stable operation
at 100 call originations per second. I know people have reported much
better results than I'm getting. Is something in my setup bad?
Here's the core dump backtrace. I added some debug printfs in
session_destroy right before the call to destroy_speech_engine, and it
looks like the jss has been trampled - for example, jss->flags is
always 0 for all my successful calls, but right before it segfaults,
jss->flags is some large random number. This happens every single
time.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x40040437 in switch_core_codec_destroy (codec=0x54ece168) at
src/switch_core_codec.c:245
#2 0x40ee778b in destroy_speech_engine (jss=0x51206538) at
mod_spidermonkey.c:1652
#3 0x40eeaa70 in session_destroy (cx=0x549c9920, obj=0x4eeac7f0) at
mod_spidermonkey.c:2723
#4 0x417d1aa7 in js_FinalizeObject (cx=0x549c9920, obj=0x4eeac7f0)
at src/jsobj.c:2168
#5 0x417b04d9 in js_GC (cx=0x549c9920, gcflags=0) at
src/jsgc.c:1856
#6 0x417af6ad in js_ForceGC (cx=0x549c9920, gcflags=0) at
src/jsgc.c:1508
#7 0x417830fd in js_DestroyContext (cx=0x549c9920,
gcmode=JS_FORCE_GC) at src/jscntxt.c:285
#8 0x417727ac in JS_DestroyContext (cx=0x549c9920) at
src/jsapi.c:956
#9 0x40eec3c9 in js_parse_and_execute (session=0x464d9678,
input_code=0x9e31458 "capacity.js", ro=0x0) at
mod_spidermonkey.c:3296
#10 0x40eec3f2 in js_dp_function (session=0x464d9678, data=0x9e31458
"capacity.js") at mod_spidermonkey.c:3302
#11 0x40044341 in switch_core_session_exec (session=0x464d9678,
application_interface=0x40f26f80, arg=0x9e31458
"capacity.js")
at src/switch_core_session.c:936
#12 0x400455be in switch_core_standard_on_execute (session=0x464d9678) at
src/switch_core_state_machine.c:169
#13 0x40046605 in switch_core_session_run (session=0x464d9678) at
src/switch_core_state_machine.c:406
#14 0x4004381c in switch_core_session_thread (thread=0x9e31288,
obj=0x464d9678) at src/switch_core_session.c:681
#15 0x4009047c in dummy_worker (opaque=0x9e31288) at
threadproc/unix/thread.c:138
#16 0x007cb3db in start_thread () from /lib/libpthread.so.0
#17 0x0072506e in clone () from /lib/libc.so.6
Tom
===============
tuhl at ix.netcom.com
_______________________________________________
Freeswitch-users mailing list
Freeswitch-users at lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org