[Freeswitch-users] registration fails after several hours - FS problem?
pbdlists at pinboard.com
Tue Nov 2 17:01:33 PDT 2010
Hello Anthony,
Indeed, even though I can't understand most of what is going on in this
debug output, it is helpful. Here is what I found (I still have the traces,
but can't put them anywhere public):
- just before the registrations go away, freeswitch says it got an ICMP
type 3 code 1 (no route to host):
tport_wakeup_pri(0x7f9030004520): events ERR
tport_udp_error: No route to host (113) [icmp type=3 code=1]
 reported by [188..........9]:0
nta_agent: tport: 88........1:1024: No route to host
- a tcpdump, however, shows no such ICMP packet
- routes are static and a dump of the routing table every 5 seconds
shows that the default route (used for these destinations) is there
- some more testing, capturing and searching shows a very interesting
behaviour:
- none of the network interfaces used to communicate to anywhere
outside of the box do show the ICMP reported by freeswitch, neither
the external facing interface, nor the internal facing interface
- BUT:
- on the loopback interface I do get ICMP type 3 code 1 (host
unreachable) messages
- the ICMP messages I see there are only for systems with which
freeswitch is communicating
- the ICMP messages I see there are exactly for the remote systems
which were reported down (plus for one internal registration)
- the timestamps of the ICMP messages starting and the registrations
going down match, as well as the timestamps of the ICMP messages
stopping and the registrations coming up again
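
For anyone who wants to do the same matching of log lines against a capture, here is a minimal sketch in Python. It assumes the log line format is exactly as in the excerpt above; the function name parse_tport_error and the lookup table are my own and not part of FreeSWITCH or sofia-sip. It strips the colour escape sequences and extracts the errno and the ICMP type/code from a tport_udp_error line:

```python
import re

# Colour escapes as they appear in the raw log (real ESC byte or the
# literal text "ESC" left behind by a pager or copy/paste).
ANSI = re.compile(r"(?:\x1b|ESC)\[[0-9;]*m")

# Assumed line format, taken from the excerpt above.
TPORT_ERR = re.compile(
    r"tport_udp_error: (?P<msg>.+?) \((?P<errno>\d+)\) "
    r"\[icmp type=(?P<type>\d+) code=(?P<code>\d+)\]"
)

# Some ICMP destination-unreachable (type 3) codes from RFC 792.
UNREACH_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
}


def parse_tport_error(line):
    """Return the fields of a tport_udp_error line, or None if no match."""
    match = TPORT_ERR.search(ANSI.sub("", line))
    if match is None:
        return None
    icmp_type = int(match["type"])
    icmp_code = int(match["code"])
    if icmp_type == 3:
        meaning = UNREACH_CODES.get(icmp_code, "unknown code")
    else:
        meaning = "not a destination-unreachable message"
    return {
        "message": match["msg"],
        "errno": int(match["errno"]),
        "icmp_type": icmp_type,
        "icmp_code": icmp_code,
        "meaning": meaning,
    }
```

Feeding it the second log line above gives errno 113 (EHOSTUNREACH on Linux) and "host unreachable", which is what to look for in the capture on the loopback interface.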
I didn't change much on the default config and as far as I know nothing
network related. Is it possible that I nevertheless messed up my config
somewhere, causing freeswitch to choose the loopback interface for
communicating from time to time?
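
One way to check what the kernel itself would do for a given destination is sketched below in Python. It relies on the usual trick that connect() on a UDP socket sends nothing and only performs a route lookup. The function name source_address_for and the default port 5060 are my own choices. Note that this only shows the kernel's source address selection at that moment; it says nothing about which address freeswitch has actually bound its profile to.

```python
import socket


def source_address_for(dest_ip, port=5060):
    """Return the local IP the kernel would use as source towards dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket transmits no packet; it only makes the
        # kernel resolve a route and pick a local address for the socket.
        sock.connect((dest_ip, port))
        return sock.getsockname()[0]
```

Calling this every few seconds with the address of one of the affected providers, and logging the result with a timestamp, would show whether the selected source address ever changes to 127.0.0.1 around the time the registrations go down.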
Cheers,
Kurt
On Mon, Nov 01, 2010 at 06:59:27PM -0500, Anthony Minessale wrote:
> what would help is if you can get a similar log with the siptrace on
> the profile and sofia global loglevel 9
> The key is to catch the very first time it goes wrong, possibly a
> full pcap of any network activity as well to look for more clues.
>
> This appears to be some sort of strange environmental condition or
> particular edge case that breaks the sip lib internally.
>
>
>
> On Mon, Nov 1, 2010 at 5:51 PM, <pbdlists at pinboard.com> wrote:
> > Just a quick note. Mario mentions he only sees the problems on osX. I see
> > exactly the same errors and warnings in my logs on a Linux box (Fedora 12 64-bit).
> > Sometimes it happens every couple of minutes, sometimes it goes away for 2-3
> > hours.
> >
> > The excerpt attached is from a log of a freshly compiled git checkout. What I see
> > is that if it happens, usually multiple external registrations go down, not just
> > one or just the registrations with one server/provider.
> >
> > Cheers,
> >
> > Kurt
> >
> > On Sun, Oct 31, 2010 at 12:24:35PM -0700, Mario G wrote:
> >> I have the pcap and dump to email to you and lots of new info on this serious bug (yes it's a bug on FS for osX). The pcap is 1.1M and dump is 350M, please tell me where to send them. I don't want to put them in public areas since they contain security info. Please review my steps below. I don't know FS or Linux internals but it seems a lot like a timing issue where two processes are not communicating with each other since retry messages occur but there is no SIP tracing going on. THANKS SO MUCH!
> >>
> >> LINUX
> >> 1. Setup FS on OpenSuse starting Sep 15. After basic initial problems there were serious nat/upnp problems that lasted 3 weeks. Fixed with help, but still used nat.
> >> 2. Final testing was on git 2010-10-13. Ran fine for 5 days on very old 32 bit system.
> >>
> >> OSX
> >> 3. Purchased Mac Mini and installed FS git 2010-10-23. Lasted only 3 to 17 hours. Problems looked same as nat so switched to full static.
> >> 4. With all static (-nonat) and only one DSL static connection active ITSPs go down in 5-60 minutes one by one. Still thought it was network related. Sent you traces.
> >> 5. Updated to git 10-29 but made no difference.
> >>
> >> LINUX
> >> 6. Went back to the Linux box with git 10-13 using copy of config from mac. Pure static as osX. No problems for 6 hours!
> >> 7. Copied and updated Linux to git 10-29 to be the same as Mac box. Again, no problems for 12 hours!
> >>
> >> OSX
> >> 8. Went back to the mac to provide you with pcap and dump. In about 15 minutes FS lost 2 ITSPs. Here are the messages issued during pcap/dump, NOTE the clock message, which is the first I have seen of it:
> >>
> >> 2010-10-31 11:35:00.593970 [WARNING] sofia_reg.c:387 idone Failed Registration, setting retry to 15 seconds.
> >> 2010-10-31 11:35:13.118634 [NOTICE] sofia_reg.c:342 Registering idtwo
> >> 2010-10-31 11:35:16.432236 [NOTICE] sofia_reg.c:342 Registering idone
> >> 2010-10-31 11:35:19.898319 [CRIT] switch_time.c:760 Forward Clock Skew Detected!
> >> 2010-10-31 11:35:25.440207 [WARNING] switch_scheduler.c:114 Task was executed late by 2 seconds 1 heartbeat (core)
> >> 2010-10-31 11:35:29.946329 [WARNING] sofia_reg.c:387 idtwo Failed Registration, setting retry to 15 seconds.
> >> 2010-10-31 11:35:32.147466 [WARNING] sofia_reg.c:387 idone Failed Registration, setting retry to 15 seconds.
> >>
> >> I found the instruction for PCAP and TCPDUMP here in case you need them:
> >> http://support.apple.com/kb/HT3994
> >> http://www.osxbook.com/book/bonus/chapter8/core/
> >>
> >> Note: I had the Mini set to no sleep even though it worked with Linux sleep. I found a couple of others on the web who had the same problem and one had written a script to restart FS every 4 hours. Fried (tired) right now and can't find the URL but it was from Jan 2010.
> >>
> >> One last thing to mention is that on osX using auto-nat:1.2.3.4 and some expiry parms, etc that may have triggered activity, FS worked much longer than on static. This is why I think it's timer or sync related and only on osX.
> >>
> >
> >
> >
>
>
>
> --
> Anthony Minessale II
>
> FreeSWITCH http://www.freeswitch.org/
> ClueCon http://www.cluecon.com/
> Twitter: http://twitter.com/FreeSWITCH_wire
>
> AIM: anthm
> MSN:anthony_minessale at hotmail.com
> GTALK/JABBER/PAYPAL:anthony.minessale at gmail.com
> IRC: irc.freenode.net #freeswitch
>
> FreeSWITCH Developer Conference
> sip:888 at conference.freeswitch.org
> googletalk:conf+888 at conference.freeswitch.org
> pstn:+19193869900
>
--
----------------------------------------------------------------------
: Kurt at pinboard.com http://www.pinboard.com/ business :
: http://kurt.www.pinboard.com/ private :
----------------------------------------------------------------------
: Unix and Internet Specialist :
: PGP fingerprint 5C88 FAD4 E111 F33E DA3E 2EC5 DF0E 220F FFD6 2FF1 :
----------------------------------------------------------------------