<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Follow up on this:<br>
      <br>
      The patch Anthony provided in FS-6006 looks good. After almost a
      week of running queued_events is still 0, and memory usage on the
      machine looks much better. More importantly, audio quality isn't
      degrading like we were seeing when the memory leak existed.<br>
      <br>
      There's still another memory leak regarding Sofia-SIP that seems
      to occur if a 183, 200, 183 sequence occurs on an outbound call,
      but I haven't fully quantified or reproduced it yet. It is far
      more rare, but once I corner it I'll provide either a patch or bug
      report.<br>
      <br>
      - David<br>
      <br>
      <br>
      On 12/10/13 2:15 PM, David Sanders wrote:<br>
    </div>
    <blockquote cite="mid:52A79278.9080800@pinger.com" type="cite">
      <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
      <div class="moz-cite-prefix">Thanks for the quick turn around.
        Currently trying it out to see how things look after 24 hours,
        and will report back when that finishes and is analyzed.<br>
        <br>
        Glad to hear you were able to reproduce in a lab setup, that was
        going to be my next step after documenting the issue, but looks
        like you beat me to it.<br>
        <br>
        - David<br>
        <br>
        <br>
        On 12/10/13 10:54 AM, Anthony Minessale wrote:<br>
      </div>
      <blockquote
cite="mid:CAKbxfG-=yQd88iei0+3KqQyuvh7Y3t-fvAu-uG6dCsph3DtoMg@mail.gmail.com"
        type="cite">
        <div dir="ltr">See latest HEAD and patch in FS-6006
          <div><br>
          </div>
          <div>It seems in some conditions the code around state
            transitions that is supposed to parse the signal data makes
            it past there.</div>
          <div>The new 1 line patch adds a parse_all_messages in the
            heart of the state machine where it is executed every time
            the state machine sets a new running state.</div>
          <div><br>
          </div>
          <div>I was able to lab up a bye glare scenario with sipp where
            I could reproduce the queued_events bug in 1 second flat
            with 200 simo calls.</div>
          <div>After this patch I was able to run the same test for
            20000 calls without triggering any delta in the
            queued_events</div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <br>
          <div class="gmail_quote">On Mon, Dec 9, 2013 at 9:26 PM, David
            Sanders <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:dsanders@pinger.com" target="_blank">dsanders@pinger.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">I've dug
              into this more after suggestions from Anthony and others
              in<br>
              IRC, and reproduced against 1.2.stable.<br>
              <br>
              I think there's a couple of leak cases, but I've got one
              large one which<br>
              I'll go into further.<br>
              <br>
              Our main use case is bridging audio, and most of the leaks
              reference leg<br>
              B. After tracking down some more SIP for leaked calls,
              with timestamps,<br>
              I can see that the leak appears to happen if events happen
              nearly<br>
              simultaneously on leg A and leg B. One example would be
              both legs saying<br>
              BYE (bye glare) at the same time. Another example I saw
              was a CANCEL<br>
              coming in on leg A at the same time as a 200 OK on leg B.<br>
              <br>
              My current theory is that any events in the signal data
              queue for leg B<br>
              will get leaked, and in turn leak references to the
              nua_handle_t. I<br>
              can't find any code which 'pumps' the signal data queue on
              session<br>
              destroy. To prevent the leak, I think this needs to
              happen. The events<br>
              don't need to be processed since the session is being
              destroyed, but<br>
              they should be pumped out of the signal data queue,
              free'd, and their<br>
              reference to the handle should be released.<br>
              <br>
              Logically it makes sense that most of the leaked events I
              observed would<br>
              be 100 BYE Trying, since simultaneous BYE is probably the
              most likely<br>
              case for the race condition to occur in.<br>
              <br>
              As per Anthony's suggestion, I added trace to where the
              events are<br>
              queued, and to where they are processed. Grepping the logs
              I gathered<br>
              from a 24 hour test run, I can see discrepancies between
              queued and<br>
              processed, just by event types.<br>
              <br>
              I've created the table below to illustrate the difference
              in events<br>
              queued vs processed that I'm seeing. I'm only using one
              SIP profile, and<br>
              I can see in memory that 'queued_events' on the struct =
              87. The table<br>
              adds up to -85, so there's likely one event type I didn't
              notice in my<br>
              trace.<br>
              <br>
+------------------+--------+----------+-----------+------+<br>
              | event            | status | in count | out count | diff
              |<br>
+------------------+--------+----------+-----------+------+<br>
              | nua_i_active     |  200   | 86717    | 86711     |  -6 |<br>
              | nua_i_invite     |  100   | 77054    | 77054     |  -1
               |<br>
              | nua_i_terminated |  200   | 32588    | 32588     |   0
               |<br>
              | nua_i_terminated |  487   | 29156    | 29155     |  -1 |<br>
              | nua_i_state      |   0    | 45443    | 45443     |   0 |<br>
              | nua_i_state      |  180   | 73921    | 73917     |  -4 |<br>
              | nua_i_state      |  200   | 206908   | 206898    | -10
               |<br>
              | nua_i_state      |  487   | 29156    | 29155     |  -1
               |<br>
              | nua_r_invite     |  180   | 2411     | 2411      |   0
               |<br>
              | nua_r_invite     |  183   | 36423    | 36418     |  -5
               |<br>
              | nua_r_invite     |  200   | 32075    | 32071     |  -4 |<br>
              | nua_r_invite     |  408   | 1543     | 1543      |   0 |<br>
              | nua_r_invite     |  486   | 2479     | 2479      |   0 |<br>
              | nua_r_cancel     |  200   | 9099     | 9099      |   0 |<br>
              | nua_r_bye        |  200   | 32491    | 32491     |   0 |<br>
              | nua_i_bye        |  100   | 52363    | 52310     | -53|<br>
+------------------+--------+----------+-----------+------+<br>
              <span class="HOEnZb"><font color="#888888"><br>
                  - David<br>
                </font></span>
              <div class="HOEnZb">
                <div class="h5"><br>
                  <br>
                  On 12/6/13 12:20 PM, David Sanders wrote:<br>
                  &gt; I've been trying to track down and kill a memory
                  leak which is occurring<br>
                  &gt; in mod_sofia/Sofia-SIP. The leak was first
                  spotted under 1.2.0 and<br>
                  &gt; confirmed to still exist in 1.2.14 (latest when
                  it was tested about a<br>
                  &gt; month ago).<br>
                  &gt;<br>
                  &gt; Two JIRA bugs were submitted as part of this
                  hunt, FS-6005 and FS-6006.<br>
                  &gt; The issue improved on 1.2.0 after the patches for
                  those two bugs, but<br>
                  &gt; showed no difference on 1.2.14.<br>
                  &gt;<br>
                  &gt; The leak only seems to occur under heavy load
                  (i.e. production traffic),<br>
                  &gt; but will produce several leaks within 24 hours of
                  a FS restart. Gory<br>
                  &gt; details follow.<br>
                  &gt;<br>
                  &gt; Things known about the leak:<br>
                  &gt;     * The leaked memory is a nua_handle_t struct,
                  and the associated<br>
                  &gt; structs which are free'd properly if the
                  nua_handle_t doesn't leak<br>
                  &gt;     * The box running FS is a CentOS 6.3 machine,
                  Linux kernel version<br>
                  &gt; 2.6.32-279<br>
                  &gt;     * Leak can be analyzed on an idle FS server
                  using GDB + a modified<br>
                  &gt; version of gdb-heap (an extension for GDB for
                  viewing the heap)<br>
                  &gt;        * Once leaked memory is identified, it can
                  be printed with GDB as<br>
                  &gt; the nua_handle_t struct and data can be gleaned
                  from it<br>
                  &gt;<br>
                  &gt; After the patches mentioned above, 1.2.0 has
                  several handles leaked, all<br>
                  &gt; with a sofia_dispatch_event_t struct in their
                  memory pool, indicating a<br>
                  &gt; 100 BYE Trying event. Since this event was never
                  processed (and free'd)<br>
                  &gt; it never decremented the ref count on the handle.
                  The handle is marked<br>
                  &gt; as destroyed, so the ref count is the only thing
                  keeping it around.<br>
                  &gt;<br>
                  &gt; On 1.2.14, it acted the same prior to FS-6006's
                  patch. After that patch<br>
                  &gt; was applied, the leak mutated. The idle box lists
                  no calls and no<br>
                  &gt; channels, but the leaked handles are not marked
                  as destroyed, have<br>
                  &gt; nh_active_call=1, and have a ref count of 3.<br>
                  &gt;<br>
                  &gt; Prior to FS-6006 it was clear with ref trace
                  enabled for mod_sofia that<br>
                  &gt; the missing unref was due to the 100 BYE Trying
                  never being processed.<br>
                  &gt; I'll have to re-enable the ref trace to see why
                  there are now 3<br>
                  &gt; references left after FS-6006.<br>
                  &gt;<br>
                  &gt; Here's a Valgrind entry for the leaked memory
                  (this is from a 1.2.14<br>
                  &gt; instance):<br>
                  &gt;<br>
                  &gt; ==28083== 22,980 (1,120 direct, 21,860 indirect)
                  bytes in 4 blocks are definitely lost in loss record
                  448 of 453<br>
                  &gt; ==28083==    at 0x4C2677B: calloc
                  (vg_replace_malloc.c:593)<br>
                  &gt; ==28083==    by 0xBA60CC1: su_home_new
                  (su_alloc.c:559)<br>
                  &gt; ==28083==    by 0xBA0D8F5: nh_create_handle
                  (nua_common.c:113)<br>
                  &gt; ==28083==    by 0xBA2AA2A: nua_handle (nua.c:315)<br>
                  &gt; ==28083==    by 0xB9AAD38: sofia_glue_do_invite
                  (sofia_glue.c:2357)<br>
                  &gt; ==28083==    by 0xB9729CC: sofia_on_init
                  (mod_sofia.c:108)<br>
                  &gt; ==28083==    by 0x5138702:
                  switch_core_session_run
                  (switch_core_state_machine.c:424)<br>
                  &gt; ==28083==    by 0x5134D8D:
                  switch_core_session_thread
                  (switch_core_session.c:1417)<br>
                  &gt; ==28083==    by 0x596C850: start_thread (in
                  /lib64/<a moz-do-not-send="true"
                    href="http://libpthread-2.12.so" target="_blank">libpthread-2.12.so</a>)<br>
                  &gt; ==28083==    by 0x6AEB11C: clone (in /lib64/<a
                    moz-do-not-send="true" href="http://libc-2.12.so"
                    target="_blank">libc-2.12.so</a>)<br>
                  &gt;<br>
                  &gt;<br>
                  &gt; I've exhausted my ability to debug the issue, and
                  could use some help<br>
                  &gt; from the FS maintainers/community in killing the
                  issue. The leaked<br>
                  &gt; memory, while small, appears to cause rather
                  large performance problems<br>
                  &gt; for the FS instance if allowed to accumulate.<br>
                  &gt;<br>
                  &gt; - David<br>
                  &gt;<br>
                  &gt; P.S. I'll be on the freeswitch-dev IRC channel if
                  anyone would like to<br>
                  &gt; discuss more details of the leak.<br>
                  &gt;<br>
                  &gt;
_________________________________________________________________________<br>
                  &gt; Professional FreeSWITCH Consulting Services:<br>
                  &gt; <a moz-do-not-send="true"
                    href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
                  &gt; <a moz-do-not-send="true"
                    href="http://www.freeswitchsolutions.com"
                    target="_blank">http://www.freeswitchsolutions.com</a><br>
                  &gt;<br>
                  &gt; FreeSWITCH-powered IP PBX: The CudaTel
                  Communication Server<br>
                  &gt; <a moz-do-not-send="true"
                    href="http://www.cudatel.com" target="_blank">http://www.cudatel.com</a><br>
                  &gt;<br>
                  &gt; Official FreeSWITCH Sites<br>
                  &gt; <a moz-do-not-send="true"
                    href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                  &gt; <a moz-do-not-send="true"
                    href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
                  &gt; <a moz-do-not-send="true"
                    href="http://www.cluecon.com" target="_blank">http://www.cluecon.com</a><br>
                  &gt;<br>
                  &gt; FreeSWITCH-dev mailing list<br>
                  &gt; <a moz-do-not-send="true"
                    href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
                  &gt; <a moz-do-not-send="true"
                    href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
                    target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
                  &gt; UNSUBSCRIBE:<a moz-do-not-send="true"
                    href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
                    target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
                  &gt; <a moz-do-not-send="true"
                    href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                  <br>
                  <br>
_________________________________________________________________________<br>
                  Professional FreeSWITCH Consulting Services:<br>
                  <a moz-do-not-send="true"
                    href="mailto:consulting@freeswitch.org">consulting@freeswitch.org</a><br>
                  <a moz-do-not-send="true"
                    href="http://www.freeswitchsolutions.com"
                    target="_blank">http://www.freeswitchsolutions.com</a><br>
                  <br>
                  FreeSWITCH-powered IP PBX: The CudaTel Communication
                  Server<br>
                  <a moz-do-not-send="true"
                    href="http://www.cudatel.com" target="_blank">http://www.cudatel.com</a><br>
                  <br>
                  Official FreeSWITCH Sites<br>
                  <a moz-do-not-send="true"
                    href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                  <a moz-do-not-send="true"
                    href="http://wiki.freeswitch.org" target="_blank">http://wiki.freeswitch.org</a><br>
                  <a moz-do-not-send="true"
                    href="http://www.cluecon.com" target="_blank">http://www.cluecon.com</a><br>
                  <br>
                  FreeSWITCH-dev mailing list<br>
                  <a moz-do-not-send="true"
                    href="mailto:FreeSWITCH-dev@lists.freeswitch.org">FreeSWITCH-dev@lists.freeswitch.org</a><br>
                  <a moz-do-not-send="true"
                    href="http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev"
                    target="_blank">http://lists.freeswitch.org/mailman/listinfo/freeswitch-dev</a><br>
                  UNSUBSCRIBE:<a moz-do-not-send="true"
                    href="http://lists.freeswitch.org/mailman/options/freeswitch-dev"
                    target="_blank">http://lists.freeswitch.org/mailman/options/freeswitch-dev</a><br>
                  <a moz-do-not-send="true"
                    href="http://www.freeswitch.org" target="_blank">http://www.freeswitch.org</a><br>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
          <br clear="all">
          <div><br>
          </div>
          -- <br>
          <div dir="ltr">Anthony Minessale II<br>
            <br>
            <div>☞ <a moz-do-not-send="true"
                href="http://www.freeswitch.org/">http://www.freeswitch.org/</a>
               ☞ <a moz-do-not-send="true"
                href="http://www.cluecon.com/">http://www.cluecon.com/</a>
               ☞ <a moz-do-not-send="true"
                href="http://twitter.com/FreeSWITCH">http://twitter.com/FreeSWITCH</a></div>
            <div>
              <div>☞ <a moz-do-not-send="true"
                  href="http://irc.freenode.net">irc.freenode.net</a>
                #freeswitch<br>
                <br>
              </div>
              <div>ClueCon Weekly Development Call <br>
              </div>
              <div>☎ <a moz-do-not-send="true"
                  href="mailto:sip%3A888@conference.freeswitch.org">sip:888@conference.freeswitch.org</a>
                 ☎ +19193869900 </div>
              <div><br>
              </div>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>