[Freeswitch-dev] Memory Leak in mod_sofia
David Sanders
dsanders at pinger.com
Fri Dec 6 23:20:29 MSK 2013
I've been trying to track down and kill a memory leak that occurs
in mod_sofia/Sofia-SIP. The leak was first spotted under 1.2.0 and
confirmed to still exist in 1.2.14 (the latest release when it was
tested about a month ago).
Two JIRA bugs were filed as part of this hunt: FS-6005 and FS-6006.
The patches for those two bugs improved the situation on 1.2.0, but
showed no improvement on 1.2.14.
The leak only seems to occur under heavy load (i.e. production traffic),
but under that load several handles leak within 24 hours of a FS
restart. Gory details follow.
Things known about the leak:
* The leaked memory is a nua_handle_t struct plus its associated
structs, which are freed properly whenever the nua_handle_t itself
doesn't leak
* The box running FS is a CentOS 6.3 machine, Linux kernel version
2.6.32-279
* The leak can be analyzed on an idle FS server using GDB plus a
modified version of gdb-heap (a GDB extension for viewing the heap)
* Once leaked memory is identified, it can be printed with GDB as
the nua_handle_t struct and data can be gleaned from it
After the patches mentioned above, 1.2.0 still leaks several handles,
all with a sofia_dispatch_event_t struct in their memory pool
indicating a 100 BYE Trying event. Because this event was never
processed (and freed), it never decremented the handle's ref count.
The handle is marked as destroyed, so the ref count is the only thing
keeping it around.
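The ref-count failure described above can be modeled in a few lines of C. This is a minimal sketch, not Sofia-SIP or mod_sofia code: handle_t, event_t, handle_ref(), handle_unref(), handle_destroy(), event_queue(), dispatch_event() and call_ends_cleanly() are hypothetical stand-ins for nua_handle_t, its reference counting and sofia_dispatch_event_t, under the assumption that a queued event holds a reference on its handle until the event is processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for nua_handle_t: only what the leak model needs. */
typedef struct handle {
    int refcount;
    bool destroyed;
} handle_t;

/* Hypothetical stand-in for sofia_dispatch_event_t: a queued event pins its handle. */
typedef struct event {
    handle_t *nh;
    int status; /* e.g. 100 for the "100 BYE Trying" event */
} event_t;

static int handles_freed = 0; /* counts handles whose memory was actually released */

static handle_t *handle_create(void) {
    handle_t *nh = calloc(1, sizeof *nh);
    if (nh == NULL) abort();
    nh->refcount = 1; /* the creator's reference */
    return nh;
}

static handle_t *handle_ref(handle_t *nh) {
    nh->refcount++;
    return nh;
}

static void handle_unref(handle_t *nh) {
    assert(nh->refcount > 0);
    if (--nh->refcount == 0) {
        free(nh);
        handles_freed++;
    }
}

/* Queuing an event takes a reference so the handle outlives the queue entry. */
static event_t *event_queue(handle_t *nh, int status) {
    event_t *ev = malloc(sizeof *ev);
    if (ev == NULL) abort();
    ev->nh = handle_ref(nh);
    ev->status = status;
    return ev;
}

/* Processing an event is the only place its reference is dropped. */
static void dispatch_event(event_t *ev) {
    handle_unref(ev->nh);
    free(ev);
}

/* Destroying marks the handle and drops the creator's reference; the memory
 * is not freed while queued events still hold references. */
static void handle_destroy(handle_t *nh) {
    nh->destroyed = true;
    handle_unref(nh);
}

/* Returns true if the handle's memory was released by the end of the call. */
static bool call_ends_cleanly(bool process_bye_trying) {
    int freed_before = handles_freed;
    handle_t *nh = handle_create();     /* refcount 1 */
    event_t *ev = event_queue(nh, 100); /* refcount 2: event queued */
    handle_destroy(nh);                 /* destroyed, refcount 1 */
    if (process_bye_trying) {
        dispatch_event(ev);             /* refcount 0: handle freed */
    }
    /* Otherwise ev and its reference are stranded: the handle stays
     * allocated, marked destroyed, with refcount 1 (the leak). */
    return handles_freed > freed_before;
}
```

Calling call_ends_cleanly(false) leaves the model in the state seen on 1.2.0 after the patches: the handle is marked destroyed, one reference is still outstanding, and nothing will ever free it.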
On 1.2.14, the leak behaved the same way before FS-6006's patch. After
that patch was applied, the leak changed character: the idle box lists
no calls and no channels, but the leaked handles are not marked as
destroyed, have nh_active_call=1, and have a ref count of 3.
Before FS-6006, enabling ref tracing for mod_sofia made it clear that
the missing unref was due to the 100 BYE Trying event never being
processed. I'll have to re-enable ref tracing to see why 3 references
remain after FS-6006.
Here's a Valgrind entry for the leaked memory (this is from a 1.2.14
instance):
==28083== 22,980 (1,120 direct, 21,860 indirect) bytes in 4 blocks are definitely lost in loss record 448 of 453
==28083== at 0x4C2677B: calloc (vg_replace_malloc.c:593)
==28083== by 0xBA60CC1: su_home_new (su_alloc.c:559)
==28083== by 0xBA0D8F5: nh_create_handle (nua_common.c:113)
==28083== by 0xBA2AA2A: nua_handle (nua.c:315)
==28083== by 0xB9AAD38: sofia_glue_do_invite (sofia_glue.c:2357)
==28083== by 0xB9729CC: sofia_on_init (mod_sofia.c:108)
==28083== by 0x5138702: switch_core_session_run (switch_core_state_machine.c:424)
==28083== by 0x5134D8D: switch_core_session_thread (switch_core_session.c:1417)
==28083== by 0x596C850: start_thread (in /lib64/libpthread-2.12.so)
==28083== by 0x6AEB11C: clone (in /lib64/libc-2.12.so)
I've exhausted my ability to debug this on my own and could use some
help from the FS maintainers/community in killing it. The leaked memory,
while small, appears to cause rather large performance problems for the
FS instance if allowed to accumulate.
- David
P.S. I'll be on the freeswitch-dev IRC channel if anyone would like to
discuss more details of the leak.