<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Follow up on this:<br>
<br>
The patch Anthony provided in FS-6006 looks good. After almost a
week of running queued_events is still 0, and memory usage on the
machine looks much better. More importantly, audio quality isn't
degrading like we were seeing when the memory leak existed.<br>
<br>
There's still another memory leak regarding Sofia-SIP that seems
to occur if a 183, 200, 183 sequence occurs on an outbound call,
but I haven't fully quantified or reproduced it yet. It is far
more rare, but once I corner it I'll provide either a patch or bug
report.<br>
<br>
- David<br>
<br>
<br>
On 12/10/13 2:15 PM, David Sanders wrote:<br>
</div>
<blockquote cite="mid:52A79278.9080800@pinger.com" type="cite">
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<div class="moz-cite-prefix">Thanks for the quick turn around.
Currently trying it out to see how things look after 24 hours,
and will report back when that finishes and is analyzed.<br>
<br>
Glad to hear you were able to reproduce in a lab setup, that was
going to be my next step after documenting the issue, but looks
like you beat me to it.<br>
<br>
- David<br>
<br>
<br>
On 12/10/13 10:54 AM, Anthony Minessale wrote:<br>
</div>
<blockquote
cite="mid:CAKbxfG-=yQd88iei0+3KqQyuvh7Y3t-fvAu-uG6dCsph3DtoMg@mail.gmail.com"
type="cite">
<div dir="ltr">See latest HEAD and patch in FS-6006
<div><br>
</div>
<div>It seems in some conditions the code around state
transitions that is supposed to parse the signal data makes
it past there.</div>
<div>The new 1 line patch adds a parse_all_messages in the
heart of the state machine where it is executed every time
the state machine sets a new running state.</div>
<div><br>
</div>
<div>I was able to lab up a bye glare scenario with sipp where
I could reproduce the queued_events bug in 1 second flat
with 200 simo calls.</div>
<div>After this patch I was able to run the same test for
20000 calls without triggering any delta in the
queued_events</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Dec 9, 2013 at 9:26 PM, David
Sanders <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:dsanders@pinger.com" target="_blank">dsanders@pinger.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">I've dug
into this more after suggestions from Anthony and others
in<br>
IRC, and reproduced against 1.2.stable.<br>
<br>
I think there's a couple of leak cases, but I've got one
large one which<br>
I'll go into further.<br>
<br>
Our main use case is bridging audio, and most of the leaks
reference leg<br>
B. After tracking down some more SIP for leaked calls,
with timestamps,<br>
I can see that the leak appears to happen if events happen
nearly<br>
simultaneously on leg A and leg B. One example would be
both legs saying<br>
BYE (bye glare) at the same time. Another example I saw
was a CANCEL<br>
coming in on leg A at the same time as a 200 OK on leg B.<br>
<br>
My current theory is that any events in the signal data
queue for leg B<br>
will get leaked, and in turn leak references to the
nua_handle_t. I<br>
can't find any code which 'pumps' the signal data queue on
session<br>
destroy. To prevent the leak, I think this needs to
happen. The events<br>
don't need to be processed since the session is being
destroyed, but<br>
they should be pumped out of the signal data queue,
free'd, and their<br>
reference to the handle should be released.<br>
<br>
Logically it makes sense that most of the leaked events I
observed would<br>
be 100 BYE Trying, since simultaneous BYE is probably the
most likely<br>
case for the race condition to occur in.<br>
<br>
As per Anthony's suggestion, I added trace to where the
events are<br>
queued, and to where they are processed. Grepping the logs
I gathered<br>
from a 24 hour test run, I can see discrepancies between
queued and<br>
processed, just by event types.<br>
<br>
I've created the table below to illustrate the difference
in events<br>
queued vs processed that I'm seeing. I'm only using one
SIP profile, and<br>
I can see in memory that 'queued_events' on the struct =
87. The table<br>
adds up to -85, so there's likely one event type I didn't
notice in my<br>
trace.<br>
<br>
+------------------+--------+----------+-----------+------+<br>
| event | status | in count | out count | diff
|<br>
+------------------+--------+----------+-----------+------+<br>
| nua_i_active | 200 | 86717 | 86711 | -6 |<br>
| nua_i_invite | 100 | 77054 | 77054 | -1
|<br>
| nua_i_terminated | 200 | 32588 | 32588 | 0
|<br>
| nua_i_terminated | 487 | 29156 | 29155 | -1 |<br>
| nua_i_state | 0 | 45443 | 45443 | 0 |<br>
| nua_i_state | 180 | 73921 | 73917 | -4 |<br>
| nua_i_state | 200 | 206908 | 206898 | -10
|<br>
| nua_i_state | 487 | 29156 | 29155 | -1
|<br>
| nua_r_invite | 180 | 2411 | 2411 | 0
|<br>
| nua_r_invite | 183 | 36423 | 36418 | -5
|<br>
| nua_r_invite | 200 | 32075 | 32071 | -4 |<br>
| nua_r_invite | 408 | 1543 | 1543 | 0 |<br>
| nua_r_invite | 486 | 2479 | 2479 | 0 |<br>
| nua_r_cancel | 200 | 9099 | 9099 | 0 |<br>
| nua_r_bye | 200 | 32491 | 32491 | 0 |<br>
| nua_i_bye | 100 | 52363 | 52310 | -53|<br>
+------------------+--------+----------+-----------+------+<br>
<span class="HOEnZb"><font color="#888888"><br>
- David<br>
</font></span>
<div class="HOEnZb">
<div class="h5"><br>
<br>
On 12/6/13 12:20 PM, David Sanders wrote:<br>
> I've been trying to track down and kill a memory
leak which is occurring<br>
> in mod_sofia/Sofia-SIP. The leak was first
spotted under 1.2.0 and<br>
> confirmed to still exist in 1.2.14 (latest when
it was tested about a<br>
> month ago).<br>
><br>
> Two JIRA bugs were submitted as part of this
hunt, FS-6005 and FS-6006.<br>
> The issue improved on 1.2.0 after the patches for
those two bugs, but<br>
> showed no difference on 1.2.14.<br>
><br>
> The leak only seems to occur under heavy load
(i.e. production traffic),<br>
> but will produce several leaks within 24 hours of
a FS restart. Gory<br>
> details follow.<br>
><br>
> Things known about the leak:<br>
> * The leaked memory is a nua_handle_t struct,
and the associated<br>
> structs which are free'd properly if the
nua_handle_t doesn't leak<br>
> * The box running FS is a CentOS 6.3 machine,
Linux kernel version<br>
> 2.6.32-279<br>
> * Leak can be analyzed on an idle FS server
using GDB + a modified<br>
> version of gdb-heap (an extension for GDB for
viewing the heap)<br>
> * Once leaked memory is identified, it can
be printed with GDB as<br>
> the nua_handle_t struct and data can be gleaned
from it<br>
><br>
> After the patches mentioned above, 1.2.0 has
several handles leaked, all<br>
> with a sofia_dispatch_event_t struct in their
memory pool, indicating a<br>
> 100 BYE Trying event. Since this event was never
processed (and free'd)<br>
> it never decremented the ref count on the handle.
The handle is marked<br>
> as destroyed, so the ref count is the only thing
keeping it around.<br>
><br>
> On 1.2.14, it acted the same prior to FS-6006's
patch. After that patch<br>
> was applied, the leak mutated. The idle box lists
no calls and no<br>
> channels, but the leaked handles are not marked
as destroyed, have<br>
> nh_active_call=1, and have a ref count of 3.<br>
><br>
> Prior to FS-6006 it was clear with ref trace
enabled for mod_sofia that<br>
> the missing unref was due to the 100 BYE Trying
never being processed.<br>
> I'll have to re-enable the ref trace to see why
there are now 3<br>
> references left after FS-6006.<br>
><br>
> Here's a Valgrind entry for the leaked memory
(this is from a 1.2.14<br>
> instance):<br>
><br>
> ==28083== 22,980 (1,120 direct, 21,860 indirect)
bytes in 4 blocks are definitely lost in loss record
448 of 453<br>
> ==28083== at 0x4C2677B: calloc
(vg_replace_malloc.c:593)<br>
> ==28083== by 0xBA60CC1: su_home_new
(su_alloc.c:559)<br>
> ==28083== by 0xBA0D8F5: nh_create_handle
(nua_common.c:113)<br>
> ==28083== by 0xBA2AA2A: nua_handle (nua.c:315)<br>
> ==28083== by 0xB9AAD38: sofia_glue_do_invite
(sofia_glue.c:2357)<br>
> ==28083== by 0xB9729CC: sofia_on_init
(mod_sofia.c:108)<br>
> ==28083== by 0x5138702:
switch_core_session_run
(switch_core_state_machine.c:424)<br>
> ==28083== by 0x5134D8D:
switch_core_session_thread
(switch_core_session.c:1417)<br>
> ==28083== by 0x596C850: start_thread (in
/lib64/<a moz-do-not-send="true"
href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a>)<br>
> ==28083== by 0x6AEB11C: clone (in /lib64/<a
moz-do-not-send="true" href="http://libc-2.12.so"
target="_blank">libc-2.12.so</a>)<br>
><br>
><br>
> I've exhausted my ability to debug the issue, and
could use some help<br>
> from the FS maintainers/community in killing the
issue. The leaked<br>
> memory, while small, appears to cause rather
large performance problems<br>
> for the FS instance if allowed to accumulate.<br>
><br>
> - David<br>
><br>
> P.S. I'll be on the freeswitch-dev IRC channel if
anyone would like to<br>
> discuss more details of the leak.<br>
><br>
>
_________________________________________________________________________<br>
> Professional FreeSWITCH Consulting Services:<br>
> <a moz-do-not-send="true"
href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
> <a moz-do-not-send="true"
href="http://www.freeswitchsolutions.com"
target="_blank">http://www.freeswitchsolutions.com</a><br>
><br>
> FreeSWITCH-powered IP PBX: The CudaTel
Communication Server<br>
> <a moz-do-not-send="true"
href="http://www.cudatel.com" target="_blank">http://www.cudatel.com</a><br>
><br>
> Official FreeSWITCH Sites<br>
> <a moz-do-not-send="true"
href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
> <a moz-do-not-send="true"
href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
> <a moz-do-not-send="true"
href="http://www.cluecon.com" target="_blank">http://www.cluecon.com</a><br>
><br>
> FreeSWITCH-dev mailing list<br>
> <a moz-do-not-send="true"
href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
> <a moz-do-not-send="true"
href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
> UNSUBSCRIBE:<a moz-do-not-send="true"
href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
> <a moz-do-not-send="true"
href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
<br>
<br>
_________________________________________________________________________<br>
Professional FreeSWITCH Consulting Services:<br>
<a moz-do-not-send="true"
href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
<a moz-do-not-send="true"
href="http://www.freeswitchsolutions.com"
target="_blank">http://www.freeswitchsolutions.com</a><br>
<br>
FreeSWITCH-powered IP PBX: The CudaTel Communication
Server<br>
<a moz-do-not-send="true"
href="http://www.cudatel.com" target="_blank">http://www.cudatel.com</a><br>
<br>
Official FreeSWITCH Sites<br>
<a moz-do-not-send="true"
href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
<a moz-do-not-send="true"
href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
<a moz-do-not-send="true"
href="http://www.cluecon.com" target="_blank">http://www.cluecon.com</a><br>
<br>
FreeSWITCH-dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
<a moz-do-not-send="true"
href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
UNSUBSCRIBE:<a moz-do-not-send="true"
href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
<a moz-do-not-send="true"
href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Anthony Minessale II<br>
<br>
<div>☞ <a moz-do-not-send="true"
href="http://www.freeswitch.org/">http://www.freeswitch.org/</a>
☞ <a moz-do-not-send="true"
href="http://www.cluecon.com/">http://www.cluecon.com/</a>
☞ <a moz-do-not-send="true"
href="http://twitter.com/FreeSWITCH">http://twitter.com/FreeSWITCH</a></div>
<div>
<div>☞ <a moz-do-not-send="true"
href="http://irc.freenode.net">irc.freenode.net</a>
#freeswitch<br>
<br>
</div>
<div>ClueCon Weekly Development Call <br>
</div>
<div>☎ <a moz-do-not-send="true"
href="mailto:sip%3A888@conference.freeswitch.org">sip:888@conference.freeswitch.org</a>
☎ +19193869900 </div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>