[Freeswitch-users] Hung Channels (SVN Rev 10231)
Mathieu Rene
mrene_lists at avgs.ca
Thu Mar 5 14:55:33 PST 2009
Hi,
If you suspect a bug, the place to report it is JIRA. See:
http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.
Also, can you upgrade to SVN trunk? A lot of fixes get committed
daily, so it's good to stay up to date.
Since you seem familiar with GDB, you can symlink the .gdbinit file from
the support-d/ folder into your home directory.
This gives you some FS-specific macros such as "list_sessions",
which dumps a list of UUIDs mapped to session pointers.
In your JIRA report, make sure you include the output of "thread apply
all bt" and "list_sessions" (both from GDB) and of "show channels" (this
one is run from the FS console), but PLEASE update to SVN trunk first and
test again to see if it still happens.
Also, are you using proxy/bypass media or just the default?
Math
On 5-Mar-09, at 5:38 PM, Eric Liedtke wrote:
> Greetings,
>
> I've been using FS in production on this rev (I realize it's pretty far
> behind current) and it's been running well, save 1 issue.
>
> The basic setup is an SBC with 2 GigE ports, 1 public and 1 private. I
> have 2 SIP profiles created, 1 per IP interface. This is being used to
> terminate traffic to a provider, so calls only go in 1 direction. They
> come into the private-side profile, get routed via the dialplan to the
> gateway defined in the external profile, and go on to the vendor. Pretty
> simple.
>
> I have noticed that under load (50 or so cps with ~800-900 bridged calls
> up), some channels on the public side seem to get "stuck" over time. Due
> to the nature of how this is being used, I would expect both SIP profiles
> to show the same number of channels in use any time I do a 'sofia status'
> (or at least be within a channel or 2 of each other). However, after a day
> of heavy use I had a disparity of ~250 channels. These extra channels also
> seem to put some continual load on the 'system cpu', as reported by top.
>
> Of course, due to the load on the box, I have to keep logging turned way
> down, so I've been trying to troubleshoot it as best I can.
>
> Last night I grabbed a core file and started in with GDB today. I found
> the 120 or so threads that represented real active calls at the time I
> took the core file, and I also found ~250 threads that appeared to be
> stuck in the CS_NEW state. The backtraces on all of them look the same
> (annotated below).
>
> I walked through the code path by hand, based on the backtraces, and I
> don't see how this could be happening unless it's a locking issue. But as
> far as I can tell, each session has its own mutex defined in the
> switch_core_session_t struct, so I wouldn't think they would be stepping
> on each other. I also would have expected that if it were a deadlock, it
> would stop processing calls altogether.
>
> I grabbed the commands from the .gdbinit (super handy, btw!!) and have
> been trawling through the variables to try to ascertain why these threads
> seem to be stuck, but am not having much luck even coming up with a
> scenario to try to replicate the issue.
>
> If anyone has any pointers as to where I might look next, it would be
> greatly appreciated.
>
> We will be updating to the newest release soon; however, I was hoping to
> nail down what is going on so I can systematically replicate it and verify
> by testing in the lab that it is fixed, rather than just pushing the new
> release to production and hoping.
>
> Thanks in advance for any tips/pointers anyone may have.
>
> -e
>
> bt and bt full for a single "hung" thread:
>
>
> #0 0xb7fd5410 in __kernel_vsyscall ()
> #1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> #2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> #3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> #4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
> #5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
> #6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> #7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
> (gdb) bt full
> #0 0xb7fd5410 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> No symbol table info available.
> #2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> No symbol table info available.
> #3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> No locals.
> #4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
> exception = 0 '\0'
> state = <value optimized out>
> endstate = CS_NEW
> endpoint_interface = <value optimized out>
> driver_state_handler = (const switch_state_handler_table_t *) 0xb73b1720
> application_state_handler = <value optimized out>
> thread_id = 3085554955
> env = {{__jmpbuf = {134603552, -1428248680, -1461722504,
> 9184, -1210273432, -1210014020}, __mask_was_saved = -1210034895,
> __saved_mask = {__val = {0, 3084988404, 3084937740, 3086469280,
> 9184, 1, 2976641592, 2833244792, 3086590960,
> 168036728, 3084937740, 2833244808, 3085923728, 1, 3086590960,
> 2833244840, 3086590960, 0, 134564192, 2833244840, 3085923728,
> 134564244, 3086590960, 2833244872, 3085887870, 134564240, 168036728,
> 3085458203, 3086590960, 2976606624,
> 134564192, 2833244904}}}}
> sig = <value optimized out>
> __func__ = "switch_core_session_run"
> __PRETTY_FUNCTION__ = "switch_core_session_run"
> #5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
> session = (switch_core_session_t *) 0x95fe270
> event = <value optimized out>
> event_str = 0x0
> val = <value optimized out>
> __func__ = "switch_core_session_thread"
> __PRETTY_FUNCTION__ = "switch_core_session_thread"
> #6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> No locals.
> #7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> No symbol table info available.
> #8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
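The trace shows switch_core_session_run() parked in switch_sleep(t=1000), which goes through usleep() and then nanosleep(), while endstate is still CS_NEW. The C sketch below is a hypothetical illustration of that kind of polling loop, not code from the FreeSWITCH tree: every demo_* name is made up, and it assumes the loop re-checks the session state and sleeps about 1 ms (t passed to usleep() as microseconds) whenever nothing has changed. Under that assumption, a session whose state is never advanced past CS_NEW never leaves the loop, and each such thread wakes roughly 1000 times per second, so ~250 stuck threads would add on the order of 250,000 sleep/wake cycles per second. That is one plausible, unverified explanation for the steady 'system cpu' load seen in top.

```c
#define _POSIX_C_SOURCE 200112L   /* expose nanosleep() under strict -std flags */
#include <assert.h>
#include <time.h>

/* Hypothetical states; only the naming echoes FreeSWITCH's CS_* values. */
typedef enum { DEMO_CS_NEW, DEMO_CS_INIT, DEMO_CS_HANGUP, DEMO_CS_DESTROY } demo_state_t;

/* Hypothetical session: 'state' is what some other thread asks for,
 * 'last_state' is what the session loop last acted on. */
typedef struct {
    demo_state_t state;
    demo_state_t last_state;
} demo_session_t;

/* Sleep for 'us' microseconds with nanosleep(), the call that usleep()
 * ends up in according to the backtrace. */
static void demo_sleep_us(long us)
{
    struct timespec ts;
    ts.tv_sec = us / 1000000L;
    ts.tv_nsec = (us % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/* Polling state machine: when the state changes, "handle" it; otherwise
 * sleep sleep_us microseconds and check again. Returns the number of idle
 * sleeps. max_idle exists only so this demo terminates. */
static long demo_session_run(demo_session_t *s, long sleep_us, long max_idle)
{
    long idle = 0;

    assert(s != NULL && sleep_us > 0);
    while (s->state != DEMO_CS_DESTROY && idle < max_idle) {
        if (s->state != s->last_state) {
            s->last_state = s->state;   /* a real state handler would run here */
            continue;
        }
        demo_sleep_us(sleep_us);        /* nothing changed: park and poll again */
        idle++;
    }
    return idle;
}

/* Back-of-envelope wakeups per second for n_threads threads that each
 * sleep sleep_us per idle pass (ignores timer slack and handler time). */
static long demo_wakeups_per_sec(long n_threads, long sleep_us)
{
    assert(sleep_us > 0);
    return n_threads * (1000000L / sleep_us);
}
```

For example, demo_wakeups_per_sec(250, 1000) returns 250000. Real sleeps overshoot slightly because of timer slack, so an observed rate would be somewhat lower; in the scenario described above nothing would play the role of max_idle, so a session stuck in CS_NEW would keep polling until something changes its state.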