<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Thanks for the quick turn around.
      Currently trying it out to see how things look after 24 hours, and
      will report back when that finishes and is analyzed.<br>
      <br>
      Glad to hear you were able to reproduce in a lab setup, that was
      going to be my next step after documenting the issue, but looks
      like you beat me to it.<br>
      <br>
      - David<br>
      <br>
      <br>
      On 12/10/13 10:54 AM, Anthony Minessale wrote:<br>
    </div>
    <blockquote
cite="mid:CAKbxfG-=yQd88iei0+3KqQyuvh7Y3t-fvAu-uG6dCsph3DtoMg@mail.gmail.com"
      type="cite">
      <div dir="ltr">See latest HEAD and patch in FS-6006
        <div><br>
        </div>
        <div>It seems in some conditions the code around state
          transitions that is supposed to parse the signal data makes it
          past there.</div>
        <div>The new 1 line patch adds a parse_all_messages in the heart
          of the state machine where it is executed every time the state
          machine sets a new running state.</div>
        <div><br>
        </div>
        <div>I was able to lab up a bye glare scenario with sipp where I
          could reproduce the queued_events bug in 1 second flat with
          200 simo calls.</div>
        <div>After this patch I was able to run the same test for 20000
          calls without triggering any delta in the queued_events</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Mon, Dec 9, 2013 at 9:26 PM, David
          Sanders <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:dsanders@pinger.com" target="_blank">dsanders@pinger.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">I've dug
            into this more after suggestions from Anthony and others in<br>
            IRC, and reproduced against 1.2.stable.<br>
            <br>
            I think there's a couple of leak cases, but I've got one
            large one which<br>
            I'll go into further.<br>
            <br>
            Our main use case is bridging audio, and most of the leaks
            reference leg<br>
            B. After tracking down some more SIP for leaked calls, with
            timestamps,<br>
            I can see that the leak appears to happen if events happen
            nearly<br>
            simultaneously on leg A and leg B. One example would be both
            legs saying<br>
            BYE (bye glare) at the same time. Another example I saw was
            a CANCEL<br>
            coming in on leg A at the same time as a 200 OK on leg B.<br>
            <br>
            My current theory is that any events in the signal data
            queue for leg B<br>
            will get leaked, and in turn leak references to the
            nua_handle_t. I<br>
            can't find any code which 'pumps' the signal data queue on
            session<br>
            destroy. To prevent the leak, I think this needs to happen.
            The events<br>
            don't need to be processed since the session is being
            destroyed, but<br>
            they should be pumped out of the signal data queue, free'd,
            and their<br>
            reference to the handle should be released.<br>
            <br>
            Logically it makes sense that most of the leaked events I
            observed would<br>
            be 100 BYE Trying, since simultaneous BYE is probably the
            most likely<br>
            case for the race condition to occur in.<br>
            <br>
            As per Anthony's suggestion, I added trace to where the
            events are<br>
            queued, and to where they are processed. Grepping the logs I
            gathered<br>
            from a 24 hour test run, I can see discrepancies between
            queued and<br>
            processed, just by event types.<br>
            <br>
            I've created the table below to illustrate the difference in
            events<br>
            queued vs processed that I'm seeing. I'm only using one SIP
            profile, and<br>
            I can see in memory that 'queued_events' on the struct = 87.
            The table<br>
            adds up to -85, so there's likely one event type I didn't
            notice in my<br>
            trace.<br>
            <br>
            +------------------+--------+----------+-----------+------+<br>
            | event            | status | in count | out count | diff |<br>
            +------------------+--------+----------+-----------+------+<br>
            | nua_i_active     |  200   | 86717    | 86711     |  -6 |<br>
            | nua_i_invite     |  100   | 77054    | 77054     |  -1  |<br>
            | nua_i_terminated |  200   | 32588    | 32588     |   0  |<br>
            | nua_i_terminated |  487   | 29156    | 29155     |  -1 |<br>
            | nua_i_state      |   0    | 45443    | 45443     |   0 |<br>
            | nua_i_state      |  180   | 73921    | 73917     |  -4 |<br>
            | nua_i_state      |  200   | 206908   | 206898    | -10  |<br>
            | nua_i_state      |  487   | 29156    | 29155     |  -1  |<br>
            | nua_r_invite     |  180   | 2411     | 2411      |   0  |<br>
            | nua_r_invite     |  183   | 36423    | 36418     |  -5  |<br>
            | nua_r_invite     |  200   | 32075    | 32071     |  -4 |<br>
            | nua_r_invite     |  408   | 1543     | 1543      |   0 |<br>
            | nua_r_invite     |  486   | 2479     | 2479      |   0 |<br>
            | nua_r_cancel     |  200   | 9099     | 9099      |   0 |<br>
            | nua_r_bye        |  200   | 32491    | 32491     |   0 |<br>
            | nua_i_bye        |  100   | 52363    | 52310     | -53|<br>
            +------------------+--------+----------+-----------+------+<br>
            <span class="HOEnZb"><font color="#888888"><br>
                - David<br>
              </font></span>
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                On 12/6/13 12:20 PM, David Sanders wrote:<br>
                &gt; I've been trying to track down and kill a memory
                leak which is occurring<br>
                &gt; in mod_sofia/Sofia-SIP. The leak was first spotted
                under 1.2.0 and<br>
                &gt; confirmed to still exist in 1.2.14 (latest when it
                was tested about a<br>
                &gt; month ago).<br>
                &gt;<br>
                &gt; Two JIRA bugs were submitted as part of this hunt,
                FS-6005 and FS-6006.<br>
                &gt; The issue improved on 1.2.0 after the patches for
                those two bugs, but<br>
                &gt; showed no difference on 1.2.14.<br>
                &gt;<br>
                &gt; The leak only seems to occur under heavy load (i.e.
                production traffic),<br>
                &gt; but will produce several leaks within 24 hours of a
                FS restart. Gory<br>
                &gt; details follow.<br>
                &gt;<br>
                &gt; Things known about the leak:<br>
                &gt;     * The leaked memory is a nua_handle_t struct,
                and the associated<br>
                &gt; structs which are free'd properly if the
                nua_handle_t doesn't leak<br>
                &gt;     * The box running FS is a CentOS 6.3 machine,
                Linux kernel version<br>
                &gt; 2.6.32-279<br>
                &gt;     * Leak can be analyzed on an idle FS server
                using GDB + a modified<br>
                &gt; version of gdb-heap (an extension for GDB for
                viewing the heap)<br>
                &gt;        * Once leaked memory is identified, it can
                be printed with GDB as<br>
                &gt; the nua_handle_t struct and data can be gleaned
                from it<br>
                &gt;<br>
                &gt; After the patches mentioned above, 1.2.0 has
                several handles leaked, all<br>
                &gt; with a sofia_dispatch_event_t struct in their
                memory pool, indicating a<br>
                &gt; 100 BYE Trying event. Since this event was never
                processed (and free'd)<br>
                &gt; it never decremented the ref count on the handle.
                The handle is marked<br>
                &gt; as destroyed, so the ref count is the only thing
                keeping it around.<br>
                &gt;<br>
                &gt; On 1.2.14, it acted the same prior to FS-6006's
                patch. After that patch<br>
                &gt; was applied, the leak mutated. The idle box lists
                no calls and no<br>
                &gt; channels, but the leaked handles are not marked as
                destroyed, have<br>
                &gt; nh_active_call=1, and have a ref count of 3.<br>
                &gt;<br>
                &gt; Prior to FS-6006 it was clear with ref trace
                enabled for mod_sofia that<br>
                &gt; the missing unref was due to the 100 BYE Trying
                never being processed.<br>
                &gt; I'll have to re-enable the ref trace to see why
                there are now 3<br>
                &gt; references left after FS-6006.<br>
                &gt;<br>
                &gt; Here's a Valgrind entry for the leaked memory (this
                is from a 1.2.14<br>
                &gt; instance):<br>
                &gt;<br>
                &gt; ==28083== 22,980 (1,120 direct, 21,860 indirect)
                bytes in 4 blocks are definitely lost in loss record 448
                of 453<br>
                &gt; ==28083==    at 0x4C2677B: calloc
                (vg_replace_malloc.c:593)<br>
                &gt; ==28083==    by 0xBA60CC1: su_home_new
                (su_alloc.c:559)<br>
                &gt; ==28083==    by 0xBA0D8F5: nh_create_handle
                (nua_common.c:113)<br>
                &gt; ==28083==    by 0xBA2AA2A: nua_handle (nua.c:315)<br>
                &gt; ==28083==    by 0xB9AAD38: sofia_glue_do_invite
                (sofia_glue.c:2357)<br>
                &gt; ==28083==    by 0xB9729CC: sofia_on_init
                (mod_sofia.c:108)<br>
                &gt; ==28083==    by 0x5138702: switch_core_session_run
                (switch_core_state_machine.c:424)<br>
                &gt; ==28083==    by 0x5134D8D:
                switch_core_session_thread (switch_core_session.c:1417)<br>
                &gt; ==28083==    by 0x596C850: start_thread (in /lib64/<a
                  moz-do-not-send="true"
                  href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a>)<br>
                &gt; ==28083==    by 0x6AEB11C: clone (in /lib64/<a
                  moz-do-not-send="true" href="http://libc-2.12.so"
                  target="_blank">libc-2.12.so</a>)<br>
                &gt;<br>
                &gt;<br>
                &gt; I've exhausted my ability to debug the issue, and
                could use some help<br>
                &gt; from the FS maintainers/community in killing the
                issue. The leaked<br>
                &gt; memory, while small, appears to cause rather large
                performance problems<br>
                &gt; for the FS instance if allowed to accumulate.<br>
                &gt;<br>
                &gt; - David<br>
                &gt;<br>
                &gt; P.S. I'll be on the freeswitch-dev IRC channel if
                anyone would like to<br>
                &gt; discuss more details of the leak.<br>
                &gt;<br>
                &gt;
_________________________________________________________________________<br>
                &gt; Professional FreeSWITCH Consulting Services:<br>
                &gt; <a moz-do-not-send="true"
                  href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
                &gt; <a moz-do-not-send="true"
                  href="http://www.freeswitchsolutions.com"
                  target="_blank">http://www.freeswitchsolutions.com</a><br>
                &gt;<br>
                &gt; FreeSWITCH-powered IP PBX: The CudaTel
                Communication Server<br>
                &gt; <a moz-do-not-send="true"
                  href="http://www.cudatel.com" target="_blank">http://www.cudatel.com</a><br>
                &gt;<br>
                &gt; Official FreeSWITCH Sites<br>
                &gt; <a moz-do-not-send="true"
                  href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                &gt; <a moz-do-not-send="true"
                  href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
                &gt; <a moz-do-not-send="true"
                  href="http://www.cluecon.com" target="_blank">http://www.cluecon.com</a><br>
                &gt;<br>
                &gt; FreeSWITCH-dev mailing list<br>
                &gt; <a moz-do-not-send="true"
                  href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
                &gt; <a moz-do-not-send="true"
                  href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
                  target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
                &gt; UNSUBSCRIBE:<a moz-do-not-send="true"
                  href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
                  target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
                &gt; <a moz-do-not-send="true"
                  href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                <br>
                <br>
_________________________________________________________________________<br>
                Professional FreeSWITCH Consulting Services:<br>
                <a moz-do-not-send="true"
                  href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
                <a moz-do-not-send="true"
                  href="http://www.freeswitchsolutions.com"
                  target="_blank">http://www.freeswitchsolutions.com</a><br>
                <br>
                FreeSWITCH-powered IP PBX: The CudaTel Communication
                Server<br>
                <a moz-do-not-send="true" href="http://www.cudatel.com"
                  target="_blank">http://www.cudatel.com</a><br>
                <br>
                Official FreeSWITCH Sites<br>
                <a moz-do-not-send="true"
                  href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                <a moz-do-not-send="true"
                  href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
                <a moz-do-not-send="true" href="http://www.cluecon.com"
                  target="_blank">http://www.cluecon.com</a><br>
                <br>
                FreeSWITCH-dev mailing list<br>
                <a moz-do-not-send="true"
                  href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
                <a moz-do-not-send="true"
                  href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
                  target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
                UNSUBSCRIBE:<a moz-do-not-send="true"
                  href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
                  target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
                <a moz-do-not-send="true"
                  href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr">Anthony Minessale II<br>
          <br>
          <div>☞ <a moz-do-not-send="true"
              href="http://www.freeswitch.org/">http://www.freeswitch.org/</a>
             ☞ <a moz-do-not-send="true" href="http://www.cluecon.com/">http://www.cluecon.com/</a>
             ☞ <a moz-do-not-send="true"
              href="http://twitter.com/FreeSWITCH">http://twitter.com/FreeSWITCH</a></div>
          <div>
            <div>☞ <a moz-do-not-send="true"
                href="http://irc.freenode.net">irc.freenode.net</a>
              #freeswitch<br>
              <br>
            </div>
            <div>ClueCon Weekly Development Call <br>
            </div>
            <div>☎ <a moz-do-not-send="true"
                href="mailto:sip%3A888@conference.freeswitch.org">sip:888@conference.freeswitch.org</a>
               ☎ +19193869900 </div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>