[Freeswitch-users] Hung Channels (SVN Rev 10231)

Eric Liedtke e at musinghalfwit.org
Thu Mar 5 15:22:37 PST 2009


Yup, as I mentioned to Brian, I didn't want to clog JIRA with a bug that's
already been fixed, or report against a rev that's 2k+ revs behind. I was
trying to work through it as a learning exercise. And yeah, I actually added
a bunch of stuff to the list_sessions function to spit out a variety of
associated variables for each session, looking for a pattern somewhere to
clue me in to what might be happening.

No proxy or bypass media here, just defaults.

I will keep at it, and once we update the production systems, if the
problem persists I will open a bug in JIRA with all the necessary
goodies.

Thanks
-e

It seems fuzzy now but I think on Thu, Mar 05, 2009 at 05:55:33PM -0500, Mathieu Rene said:
> Hi,
> 
> If you suspect a bug, the place to report it is JIRA. See:
> http://wiki.freeswitch.org/wiki/Reporting_Bugs
> This gives the whole team a way of following up on issues.
> 
> Also, can you upgrade to svn trunk? A lot of fixes get committed
> daily, so it's good to stay up to date.
> 
> As you seem familiar with GDB, you may symlink the .gdbinit file in
> the support-d/ folder to your home directory.
> This will give you some FS-specific macros such as "list_sessions",
> which will dump a list of uuids to session pointers.
> 
> In your JIRA, make sure you include "thread apply all bt",
> "list_sessions" and "show channels" (this one is run in FS), but PLEASE
> update to svn trunk and test again to see if it still happens.
> 
> Also, are you using proxy/bypass media or just the default?
> 
> Math
> 
> On 5-Mar-09, at 5:38 PM, Eric Liedtke wrote:
> 
> > Greetings,
> >
> > I've been using FS in production on this rev (I realize it's pretty far
> > behind current) and it's been running well, save 1 issue.
> >
> > The basic setup is an SBC with 2 GigE ports, 1 public and 1 private. I have
> > 2 SIP profiles created, 1 per IP interface. This is being used to
> > terminate traffic to a provider, so calls go in only 1 direction. They come
> > into the private-side profile, get routed via the dialplan to the gateway
> > defined in the external profile, and go on to the vendor. Pretty simple.
> >
> > I have noticed that under load (50 or so cps with ~800-900 bridged calls up),
> > some channels on the public side seem to get "stuck" over time. Due to
> > the nature of how this is being used, I would expect both SIP profiles to show
> > the same number of channels in use any time I do a 'sofia status' (or at least
> > be within a channel or 2 of each other). However, after a day of heavy use I had
> > a disparity of ~250 channels. These extra channels also seem to put some
> > continual load on the 'system cpu', as reported via top.
> >
> > Of course due to the load on the box I have to keep logging turned way
> > down. So I've been trying to troubleshoot it as best I can.
> >
> > Last night I grabbed a core file and started in with GDB today. I found
> > the 120 or so threads that represented real active calls when I took the
> > core file. I also found ~250 threads that appeared to be stuck in the
> > CS_NEW state. The backtraces on all of them look the same, annotated below.
> >
> > I walked through the code path by hand, based on the bt's, and I don't see how
> > this could be happening unless it's a locking issue. But as far as I can tell,
> > each session has its own mutex defined in the switch_core_session_t struct,
> > so I wouldn't think they would be stepping on each other. I also would have
> > expected that if it were something of a deadlock nature, it would stop
> > processing calls altogether.
> >
> > I grabbed the commands from the .gdbinit (super handy btw!!) and have been
> > trolling through the variables to try to ascertain something about why these
> > threads seem to be stuck, but am not having much luck even coming up with a
> > scenario to try to replicate the issue.
> >
> > If anyone has any pointers as to where I might look next, it would be
> > greatly appreciated.
> >
> > We will be updating to the newest release soon; however, I was hoping to nail
> > down what is going on so I can systematically replicate it and verify by
> > testing in the lab that it is fixed, rather than just pushing the new release
> > to production and hoping.
> >
> > Thanks in advance for any tips/pointers anyone may have.
> >
> > -e
> >
> > ......bt and bt full for a single "hung" thread
> >
> >
> > #0  0xb7fd5410 in __kernel_vsyscall ()
> > #1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> > #2  0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> > #3  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> > #4  0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
> > #5  0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
> > #6  0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> > #7  0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> > #8  0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
> > (gdb) bt full
> > #0  0xb7fd5410 in __kernel_vsyscall ()
> > No symbol table info available.
> > #1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
> > No symbol table info available.
> > #2  0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
> > No symbol table info available.
> > #3  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
> > No locals.
> > #4  0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
> >        exception = 0 '\0'
> >        state = <value optimized out>
> >        endstate = CS_NEW
> >        endpoint_interface = <value optimized out>
> >        driver_state_handler = (const switch_state_handler_table_t *) 0xb73b1720
> >        application_state_handler = <value optimized out>
> >        thread_id = 3085554955
> >        env = {{__jmpbuf = {134603552, -1428248680, -1461722504, 9184, -1210273432, -1210014020}, __mask_was_saved = -1210034895, __saved_mask = {__val = {0, 3084988404, 3084937740, 3086469280, 9184, 1, 2976641592, 2833244792, 3086590960,
> >        168036728, 3084937740, 2833244808, 3085923728, 1, 3086590960, 2833244840, 3086590960, 0, 134564192, 2833244840, 3085923728, 134564244, 3086590960, 2833244872, 3085887870, 134564240, 168036728, 3085458203, 3086590960, 2976606624,
> >        134564192, 2833244904}}}}
> >        sig = <value optimized out>
> >        __func__ = "switch_core_session_run"
> >        __PRETTY_FUNCTION__ = "switch_core_session_run"
> > #5  0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
> >        session = (switch_core_session_t *) 0x95fe270
> >        event = <value optimized out>
> >        event_str = 0x0
> >        val = <value optimized out>
> >        __func__ = "switch_core_session_thread"
> >        __PRETTY_FUNCTION__ = "switch_core_session_thread"
> > #6  0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
> > No locals.
> > #7  0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> > No symbol table info available.
> > #8  0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
> >
> >
> > _______________________________________________
> > Freeswitch-users mailing list
> > Freeswitch-users at lists.freeswitch.org
> > http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> > UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> > http://www.freeswitch.org
> 
