<html>

<body>

I removed the recordFile and replaced it with a 10s sleep, and still got

the segfaults. I did see some improvement by going to a 2s sleep instead

of a 10s sleep, but then I'm not really testing what I want to be testing,
because calls lasting only 2s result in far fewer
simultaneous channels in use.<br><br>

I will try swapping the roles of the 2 servers. It's not hard. After I follow
Brian's suggestion and convert away from JS, I'll clean up my scripts and

upload them too.<br><br>

Tom<br><br>

At 01:32 PM 11/29/2007, Michael Collins wrote:<br>

<blockquote type=cite class=cite cite="">

<font size=2 color="#000080">Tom,<br>

&nbsp;<br>

This sounds very interesting.&nbsp; I’d like to know a few things:<br>

First, if you disable the call recording on the receiving end, will you

still get the segfaults consistently?&nbsp; Just curious to see what the

receiver does if it only answers the calls and then hangs up without the

extra burden of recording the audio streams.<br>

Second, how hard would it be to have the originator and the receiver

trade places?&nbsp; If you could, I’d like to see the two machines switch

roles, so that the machine currently acting as the receiver makes the

calls and the machine acting as the originator will now receive

calls.&nbsp; I’m wondering what will happen – will the segfaults stay at

the same machine or will they go over to the new receiver, or will they

go away altogether…?&nbsp; <br>

&nbsp;<br>

I know those are kinda brute force suggestions but they might yield some

interesting information:<br>

If the segfaults occur only on one machine, regardless of whether it’s
making or receiving calls, then obviously there’s something up with that
machine.<br>

If the segfaults always occur at the machine receiving the calls then, of

course, we’ve got a more interesting issue.<br>

&nbsp;<br>

Would you mind putting your setup info, scripts, etc. into the
pastebin?&nbsp; Maybe others could try to replicate your symptoms and see

what shakes out.<br>

&nbsp;<br>

Thanks for taking the initiative to do this kind of testing.&nbsp; It

will definitely help FS be a better, more stable product.<br>

&nbsp;<br>

-MC<br>

&nbsp;<br>

P.S. – I just saw Brian’s emails on this thread, so be sure to check his

suggestions as well!<br>

&nbsp;<br>

&nbsp;<br>

<hr>

<div align="center"></font></div>

<font face="Tahoma" size=2><b>From:</b>

freeswitch-users-bounces@lists.freeswitch.org

[<a href="mailto:freeswitch-users-bounces@lists.freeswitch.org" eudora="autourl">

mailto:freeswitch-users-bounces@lists.freeswitch.org</a>] <b>On Behalf Of

</b>tuhl@ix.netcom.com<br>

<b>Sent:</b> Thursday, November 29, 2007 12:54 PM<br>

<b>To:</b> freeswitch-users@lists.freeswitch.org<br>

<b>Subject:</b> [Freeswitch-users] Capacity testing, seg fault<br>

</font><font face="Times New Roman, Times">&nbsp;<br>

Hi, <br><br>

I'm running some capacity tests on FreeSWITCH and can cause seg-faults
fairly quickly (&lt;1 minute) at a 'light' load of 10 call originations
per second. The core dump backtrace is at the bottom, and my debugging shows
what looks like a corrupted js_session. I'll open an issue on JIRA. I

wanted to get opinions on whether this is a valid architecture for

testing capacity, and whether I'm making a simple mistake.<br><br>

<b>Environment:<br>

</b>I have the trunk version installed on 2 servers and have one server
(the originator) calling the other (the receiver) using SIP and G.711, with a
Gigabit Ethernet switch between them. The originating server basically does
a session.originate, waitForAnswer, streamFile (a 10-second 8 kHz .wav), and
hangup. The receiver does a session.answer and a recordFile to a .wav
file so I can go back and check voice quality.<br><br>

My capacity testing engine is a Perl script that uses the XML-RPC
interface to originate the calls on FreeSWITCH. I submit requests to do a
'jsrun play.js' to a certain phone number, where play.js is a simple
script that does the originate, waitForAnswer, streamFile, and hangup. I can
configure it to make a certain number of originations per second and a
certain number of total calls. I have no Perl script running on the
receiver; I just set up the dialplan to call a .js which answers the call
and records it. This testing setup is at an early stage, so right now, to
check pass/fail, I just verify that after a 1000-call test on the
originator there are 1000 .wav files of about the same
size on the receiver, and no crashes.<br><br>
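The pass/fail check described above could be automated on the receiver along these lines. This is only a sketch, not the script used in the test: the function name <code>check_recordings</code>, the 10% size tolerance, and the choice of the median as the reference size are all assumptions made for illustration.<br><br>

```python
import os


def check_recordings(directory, expected_count, tolerance=0.10):
    """Return (ok, reasons) for a finished test run.

    Hypothetical helper: the run passes when exactly expected_count
    .wav files exist and every file size is within `tolerance`
    (a fraction) of the median size.
    """
    sizes = sorted(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name.lower().endswith(".wav")
    )
    reasons = []
    if len(sizes) != expected_count:
        reasons.append("expected %d files, found %d" % (expected_count, len(sizes)))
    if sizes:
        median = sizes[len(sizes) // 2]
        # A truncated recording shows up as a file far from the median size.
        outliers = [s for s in sizes if abs(s - median) > tolerance * median]
        if outliers:
            reasons.append("%d files differ from the median size by more than %d%%"
                           % (len(outliers), round(tolerance * 100)))
    return (not reasons, reasons)
```

Calling <code>check_recordings("/path/to/recordings", 1000)</code> after a 1000-call run would return <code>True</code> with an empty list only if all 1000 recordings are present and similar in size. It does not detect a crash by itself; that still has to be checked separately.<br><br>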

I've compiled with debug flags on, and I've set all *_DEBUG flags to 9 (I

have also run the tests after a recompile with debug flags off/0, and

that didn't make any difference). I've done all the ulimit commands that

were in the last few emails on this list. I'm running on FC6 on a Dell

2850 with dual 3.6 GHz Xeons (/proc/cpuinfo shows 4 processors) and 4 GB of
RAM. I've set max-sessions to 3000 and Session Rate to 100. 'top' is

showing freeswitch at 60-80% on the receiver during this test.<br><br>

<b>ISSUE: <br>

</b>My problem right now is on the receiver. I wouldn't normally care,
because I'm most interested in the origination capacity of FreeSWITCH,
but with my receiver crashing so quickly, I can't push the originator
very hard. I set up my originating engine to make 1000 total calls at 10

call originations per second, each call lasting 10 seconds (which results

in about 120 simultaneous channels in use), and I get a seg fault on the

receiver within about 500 calls or 50 seconds. If I run a test with 1000

total calls at 6 call originations per second, it will work, but if I run

an overnight test with 20,000 total calls at 6 call originations per

second, the receiver will sometimes seg-fault at around 15,000 calls, and

sometimes it will not. Interestingly though, if I do a very short but

very high rate test of 100 total calls at 50 call originations per

second, that will usually work. But 500 total calls at 50 calls per

second will always seg-fault.<br><br>
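To compare these scenarios by the number of channels in use rather than by call rate alone, a rough estimate can be made with Little's law (average calls in progress = origination rate × call duration). The sketch below is an idealized model, not a measurement: it assumes evenly spaced originations and a fixed 10-second call, and the function names are made up for illustration. Real calls also spend time in setup and teardown, which may be why the observed figure of about 120 channels is higher than the 100 this model gives for 10 originations per second; that explanation is a guess.<br><br>

```python
def steady_state_channels(calls_per_second, hold_seconds):
    """Little's law: average calls in progress = arrival rate * hold time."""
    return calls_per_second * hold_seconds


def peak_channels(total_calls, calls_per_second, hold_seconds):
    """Peak concurrency for a finite test; a short test may end before
    it ever reaches the steady-state level."""
    return min(total_calls, steady_state_channels(calls_per_second, hold_seconds))


if __name__ == "__main__":
    # (total calls, originations per second, hold time in seconds)
    scenarios = [(1000, 10, 10), (20000, 6, 10), (100, 50, 10),
                 (500, 50, 10), (1000, 10, 2)]
    for total, rate, hold in scenarios:
        print("%5d calls at %2d/s, %2d s hold: peak of about %d channels"
              % (total, rate, hold, peak_channels(total, rate, hold)))
```

Under this model the 100-call burst at 50 per second never exceeds about 100 channels, because every call has been placed within 2 seconds, while the 500-call run at the same rate climbs to about 500. That would be consistent with the crashes depending on the number of simultaneous channels, though the model alone does not prove it.<br><br>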

Just so you know... I'm shooting for the holy grail of stable operation

at 100 call originations per second. I know people have reported much

better results than I'm getting. Is something in my setup bad?<br><br>

Here's the core dump backtrace. I added some debug printf's in

session_destroy right before the call to destroy_speech_engine, and it

looks like the jss has been trampled - for example, jss-&gt;flags is

always 0 for all my successful calls, but right before it seg-faults,

jss-&gt;flags is some large random number. This happens every single

time.<br><br>

Program terminated with signal 11, Segmentation fault.<br>

#0&nbsp; 0x00000000 in ?? ()<br>

(gdb) bt<br>

#0&nbsp; 0x00000000 in ?? ()<br>

#1&nbsp; 0x40040437 in switch_core_codec_destroy (codec=0x54ece168) at

src/switch_core_codec.c:245<br>

#2&nbsp; 0x40ee778b in destroy_speech_engine (jss=0x51206538) at

mod_spidermonkey.c:1652<br>

#3&nbsp; 0x40eeaa70 in session_destroy (cx=0x549c9920, obj=0x4eeac7f0) at

mod_spidermonkey.c:2723<br>

#4&nbsp; 0x417d1aa7 in js_FinalizeObject (cx=0x549c9920, obj=0x4eeac7f0)

at src/jsobj.c:2168<br>

#5&nbsp; 0x417b04d9 in js_GC (cx=0x549c9920, gcflags=0) at

src/jsgc.c:1856<br>

#6&nbsp; 0x417af6ad in js_ForceGC (cx=0x549c9920, gcflags=0) at

src/jsgc.c:1508<br>

#7&nbsp; 0x417830fd in js_DestroyContext (cx=0x549c9920,

gcmode=JS_FORCE_GC) at src/jscntxt.c:285<br>

#8&nbsp; 0x417727ac in JS_DestroyContext (cx=0x549c9920) at

src/jsapi.c:956<br>

#9&nbsp; 0x40eec3c9 in js_parse_and_execute (session=0x464d9678,

input_code=0x9e31458 &quot;capacity.js&quot;, ro=0x0) at

mod_spidermonkey.c:3296<br>

#10 0x40eec3f2 in js_dp_function (session=0x464d9678, data=0x9e31458

&quot;capacity.js&quot;) at mod_spidermonkey.c:3302<br>

#11 0x40044341 in switch_core_session_exec (session=0x464d9678,

application_interface=0x40f26f80, arg=0x9e31458

&quot;capacity.js&quot;)<br>

&nbsp;&nbsp;&nbsp; at src/switch_core_session.c:936<br>

#12 0x400455be in switch_core_standard_on_execute (session=0x464d9678) at

src/switch_core_state_machine.c:169<br>

#13 0x40046605 in switch_core_session_run (session=0x464d9678) at

src/switch_core_state_machine.c:406<br>

#14 0x4004381c in switch_core_session_thread (thread=0x9e31288,

obj=0x464d9678) at src/switch_core_session.c:681<br>

#15 0x4009047c in dummy_worker (opaque=0x9e31288) at

threadproc/unix/thread.c:138<br>

#16 0x007cb3db in start_thread () from /lib/libpthread.so.0<br>

#17 0x0072506e in clone () from /lib/libc.so.6<br><br>

<br>

Tom<br><br>

<br>

===============<br>

tuhl@ix.netcom.com<br>

</font></blockquote>

<x-sigsep><p></x-sigsep>

===============<br>

tuhl@ix.netcom.com</body>

</html>