[Freeswitch-dev] Memory Leak in mod_sofia

David Sanders dsanders at pinger.com
Fri Dec 6 23:20:29 MSK 2013


I've been trying to track down and kill a memory leak which is occurring 
in mod_sofia/Sofia-SIP. The leak was first spotted under 1.2.0 and 
confirmed to still exist in 1.2.14 (latest when it was tested about a 
month ago).

Two JIRA bugs were submitted as part of this hunt, FS-6005 and FS-6006. 
The issue improved on 1.2.0 after the patches for those two bugs, but 
showed no difference on 1.2.14.

The leak only seems to occur under heavy load (i.e. production traffic), 
but will produce several leaks within 24 hours of a FS restart. Gory 
details follow.

Things known about the leak:
   * The leaked memory is a nua_handle_t struct plus its associated 
structs, which are freed properly when the nua_handle_t itself does not leak
   * The box running FS is a CentOS 6.3 machine, Linux kernel version 
2.6.32-279
   * Leak can be analyzed on an idle FS server using GDB + a modified 
version of gdb-heap (an extension for GDB for viewing the heap)
      * Once leaked memory is identified, it can be printed with GDB as 
the nua_handle_t struct and data can be gleaned from it

After the patches mentioned above, 1.2.0 has several handles leaked, all 
with a sofia_dispatch_event_t struct in their memory pool, indicating a 
100 BYE Trying event. Since this event was never processed (and freed), 
it never decremented the ref count on the handle. The handle is marked 
as destroyed, so the ref count is the only thing keeping it around.
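
To make the mechanism concrete, here is a minimal sketch of how a queued 
event that is never processed can pin a handle that has already been 
marked as destroyed. Every type and function name in it (demo_handle_t, 
demo_queue_event, and so on) is hypothetical and invented for 
illustration; none of it is the real Sofia-SIP or mod_sofia code, and 
the real handle carries far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ref-counted handle (NOT nua_handle_t). */
typedef struct demo_handle {
    int  refs;       /* outstanding references */
    bool destroyed;  /* owner has asked for destruction */
    bool freed;      /* set instead of calling free(), so it can be inspected */
} demo_handle_t;

/* Hypothetical stand-in for a queued dispatch event. */
typedef struct demo_event {
    demo_handle_t *nh;  /* each queued event holds one reference */
} demo_event_t;

static void demo_ref(demo_handle_t *nh)
{
    nh->refs++;
}

static void demo_unref(demo_handle_t *nh)
{
    assert(nh->refs > 0);
    if (--nh->refs == 0)
        nh->freed = true;  /* last reference gone: memory would be released */
}

/* Queueing an event takes a reference on the handle. */
static demo_event_t demo_queue_event(demo_handle_t *nh)
{
    demo_event_t ev = { nh };
    demo_ref(nh);
    return ev;
}

/* Processing an event is the only place its reference is dropped. */
static void demo_process_event(demo_event_t *ev)
{
    demo_unref(ev->nh);
    ev->nh = NULL;
}

/* Destroying marks the handle and drops the owner's reference. */
static void demo_destroy(demo_handle_t *nh)
{
    nh->destroyed = true;
    demo_unref(nh);
}

/* Returns true if the handle was reclaimed, false if it leaked. */
static bool demo_handle_is_reclaimed(bool process_event)
{
    demo_handle_t nh = { 1, false, false };   /* owner holds one reference */
    demo_event_t ev = demo_queue_event(&nh);  /* refs: 2 */

    demo_destroy(&nh);                        /* refs: 1, marked destroyed */
    if (process_event)
        demo_process_event(&ev);              /* refs: 0, reclaimed */

    return nh.freed;
}
```

Calling demo_handle_is_reclaimed(true) models the normal path, where the 
event's reference is dropped and the handle is reclaimed. Calling it with 
false models the leak: the handle is marked destroyed but one reference 
remains.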

On 1.2.14, the leak behaved the same way before FS-6006's patch. After that patch 
was applied, the leak mutated. The idle box lists no calls and no 
channels, but the leaked handles are not marked as destroyed, have 
nh_active_call=1, and have a ref count of 3.

Prior to FS-6006 it was clear with ref trace enabled for mod_sofia that 
the missing unref was due to the 100 BYE Trying never being processed. 
I'll have to re-enable the ref trace to see why there are now 3 
references left after FS-6006.
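
For anyone unfamiliar with the idea, the sketch below shows in general 
terms what a ref trace does: each ref and unref records its call site, so 
the operations that lack a matching unref can be read off the log. It is 
not mod_sofia's actual tracing code; traced_handle_t, TRACED_REF, 
TRACED_UNREF and the other names are assumptions made up for this 
example.

```c
#include <assert.h>
#include <stdio.h>

#define TRACE_MAX 64

/* One recorded ref/unref operation (hypothetical layout). */
typedef struct {
    const char *op;    /* "ref" or "unref" */
    const char *file;  /* call site */
    int line;
    int refs_after;    /* count after the operation */
} trace_entry_t;

/* Hypothetical handle that keeps a log of its own ref operations. */
typedef struct {
    int refs;
    int ntrace;
    trace_entry_t trace[TRACE_MAX];
} traced_handle_t;

static void trace_op(traced_handle_t *h, const char *op, int delta,
                     const char *file, int line)
{
    assert(h->refs + delta >= 0);  /* catch an unref without a matching ref */
    h->refs += delta;
    if (h->ntrace < TRACE_MAX) {
        trace_entry_t *e = &h->trace[h->ntrace++];
        e->op = op;
        e->file = file;
        e->line = line;
        e->refs_after = h->refs;
    }
}

/* The macros capture the caller's file and line automatically. */
#define TRACED_REF(h)   trace_op((h), "ref",   +1, __FILE__, __LINE__)
#define TRACED_UNREF(h) trace_op((h), "unref", -1, __FILE__, __LINE__)

/* Print every recorded operation with its call site. */
static void trace_dump(const traced_handle_t *h, FILE *out)
{
    for (int i = 0; i < h->ntrace; i++) {
        const trace_entry_t *e = &h->trace[i];
        fprintf(out, "%s:%d %s -> refs=%d\n",
                e->file, e->line, e->op, e->refs_after);
    }
}

/* Four refs and one unref leave three references outstanding. */
static int trace_demo(void)
{
    traced_handle_t h = { 0 };

    TRACED_REF(&h);
    TRACED_REF(&h);
    TRACED_REF(&h);
    TRACED_REF(&h);
    TRACED_UNREF(&h);

    trace_dump(&h, stderr);
    return h.refs;
}
```

With a log like this for a leaked handle, the refs that were never paired 
with an unref identify which code paths still hold it.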

Here's a Valgrind entry for the leaked memory (this is from a 1.2.14 
instance):

==28083== 22,980 (1,120 direct, 21,860 indirect) bytes in 4 blocks are definitely lost in loss record 448 of 453
==28083==    at 0x4C2677B: calloc (vg_replace_malloc.c:593)
==28083==    by 0xBA60CC1: su_home_new (su_alloc.c:559)
==28083==    by 0xBA0D8F5: nh_create_handle (nua_common.c:113)
==28083==    by 0xBA2AA2A: nua_handle (nua.c:315)
==28083==    by 0xB9AAD38: sofia_glue_do_invite (sofia_glue.c:2357)
==28083==    by 0xB9729CC: sofia_on_init (mod_sofia.c:108)
==28083==    by 0x5138702: switch_core_session_run (switch_core_state_machine.c:424)
==28083==    by 0x5134D8D: switch_core_session_thread (switch_core_session.c:1417)
==28083==    by 0x596C850: start_thread (in /lib64/libpthread-2.12.so)
==28083==    by 0x6AEB11C: clone (in /lib64/libc-2.12.so)


I've exhausted my ability to debug the issue, and could use some help 
from the FS maintainers/community in killing the issue. The leaked 
memory, while small, appears to cause rather large performance problems 
for the FS instance if allowed to accumulate.

- David

P.S. I'll be on the freeswitch-dev IRC channel if anyone would like to 
discuss more details of the leak.


