[Freeswitch-users] Hung Channels (SVN Rev 10231)

Mathieu Rene mrene_lists at avgs.ca
Thu Mar 5 14:55:33 PST 2009


Hi,

If you suspect a bug, the place to report it is JIRA. See: http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.

Also, can you upgrade to svn trunk? A lot of fixes get committed
daily, so it's good to stay up to date.

As you seem familiar with GDB, you may want to symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions",
which dumps a list of UUIDs and their session pointers.

In your JIRA ticket, make sure you include the output of "thread apply all bt"
and "list_sessions" (both run in GDB) and "show channels" (this one is run in
the FS console), but PLEASE update to svn trunk and test again first to see if
it still happens.
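
If the "thread apply all bt" output is too long to read by eye, a small script can group the threads by their call stacks before you attach it. The following is only a rough sketch in Python, not something that ships with FS: the file name bt.txt and the function name group_threads are made up for illustration, and it assumes the usual gdb layout of a "Thread N (...)" header followed by "#n" frame lines.

```python
import re
import sys
from collections import Counter

# Thread headers look like "Thread 12 (Thread 0x... (LWP 4321)):" or
# "Thread 12 (LWP 4321):" depending on the gdb version (assumed layout).
THREAD_RE = re.compile(r"^Thread\s+(\d+)\b")
# Frames look like "#3  0xb7ee02cd in switch_sleep (t=1000) at ..."; the
# address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(")


def group_threads(bt_text, depth=6):
    """Count threads by the function names of their top `depth` frames."""
    stacks = {}
    current = None
    for line in bt_text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            stacks[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            stacks[current].append(m.group(1))
    return Counter(tuple(frames[:depth]) for frames in stacks.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python group_bt.py bt.txt
    with open(sys.argv[1]) as fh:
        counts = group_threads(fh.read())
    # Print the most common stacks first.
    for stack, n in counts.most_common():
        print(n, " <- ".join(stack))
```

A large count for one identical stack (for example a few hundred threads sleeping inside switch_core_session_run) is the kind of summary that is worth putting at the top of the ticket, with the raw output attached.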

Also, are you using proxy/bypass media or just the default?

Math

On 5-Mar-09, at 5:38 PM, Eric Liedtke wrote:

> Greetings,
>
> I've been using FS in production on this rev (I realize it's pretty far
> behind current) and it's been running well, save 1 issue.
>
> The basic setup is an SBC with 2 GigE ports, 1 public and 1 private. I have
> 2 SIP profiles created, 1 per IP interface. This is being used to
> terminate traffic to a provider, so calls only go in 1 direction. They come
> into the private-side profile, get routed via the dialplan to the gateway
> defined in the external profile, and go on to the vendor. Pretty simple.
>
> I have noticed that under load (50 or so cps with ~800-900 bridged calls up)
> some channels on the public side seem to get "stuck" over time. Due to
> the nature of how this is being used, I would expect both SIP profiles to show
> the same number of channels in use any time I do a 'sofia status' (or at least
> be within a channel or 2 of each other). However, after a day of heavy use I had
> a disparity of ~250 channels. These extra channels also seem to put some
> continual load on the 'system cpu', as reported via top.
>
> Of course due to the load on the box I have to keep logging turned way
> down. So I've been trying to troubleshoot it as best I can.
>
> Last night I grabbed a core file and started in with GDB today. I found
> the 120 or so threads that represented real active calls when I took the
> core file. I also found ~250 threads that appeared to be stuck in the
> CS_NEW state. The backtraces on all of them look the same, annotated below.
>
> I walked through the code path by hand, based on the bt's, and I don't see how
> this could be happening unless it's a locking issue. But as far as I can tell
> each session has its own mutex defined in the switch_core_session_t struct,
> so I wouldn't think they would be stepping on each other. I also would have
> expected that if it were a deadlock, it would stop processing calls altogether.
>
> I grabbed the commands from the .gdbinit (super handy btw!!) and have been trolling
> through the variables to try to ascertain something about why these threads seem to
> be stuck, but am not having much luck even coming up with a scenario to try
> to replicate the issue.
>
> If anyone has any pointers as to where I might look next it would be greatly
> appreciated.
>
> We will be updating to the newest release soon. However, I was hoping to nail down
> what is going on so I can systematically replicate it and verify by testing in the lab
> that it is fixed, rather than just pushing the new release to production and hoping.
>
> Thanks in advance for any tips/pointers anyone may have.
>
> -e
>
> ......bt and bt full for a single "hung" thread
>
>
> #0  0xb7fd5410 in __kernel_vsyscall ()
> #1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> #2  0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> #3  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> #4  0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
> #5  0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
> #6  0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> #7  0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #8  0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
> (gdb) bt full
> #0  0xb7fd5410 in __kernel_vsyscall ()
> No symbol table info available.
> #1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> No symbol table info available.
> #2  0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> No symbol table info available.
> #3  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> No locals.
> #4  0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
>        exception = 0 '\0'
>        state = <value optimized out>
>        endstate = CS_NEW
>        endpoint_interface = <value optimized out>
>        driver_state_handler = (const switch_state_handler_table_t *) 0xb73b1720
>        application_state_handler = <value optimized out>
>        thread_id = 3085554955
>        env = {{__jmpbuf = {134603552, -1428248680, -1461722504, 9184, -1210273432, -1210014020}, __mask_was_saved = -1210034895, __saved_mask = {__val = {0, 3084988404, 3084937740, 3086469280, 9184, 1, 2976641592, 2833244792, 3086590960, 168036728, 3084937740, 2833244808, 3085923728, 1, 3086590960, 2833244840, 3086590960, 0, 134564192, 2833244840, 3085923728, 134564244, 3086590960, 2833244872, 3085887870, 134564240, 168036728, 3085458203, 3086590960, 2976606624, 134564192, 2833244904}}}}
>        sig = <value optimized out>
>        __func__ = "switch_core_session_run"
>        __PRETTY_FUNCTION__ = "switch_core_session_run"
> #5  0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
>        session = (switch_core_session_t *) 0x95fe270
>        event = <value optimized out>
>        event_str = 0x0
>        val = <value optimized out>
>        __func__ = "switch_core_session_thread"
>        __PRETTY_FUNCTION__ = "switch_core_session_thread"
> #6  0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> No locals.
> #7  0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> No symbol table info available.
> #8  0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6




