[Freeswitch-users] Fwd: strange behaviour with scheduled jobs, uuid_transfer hangs up one of the legs if freeswitch is not in the media path, lua freeswitch.Session(uuid) tries to call out if channel references by uuid does not exist any longer

Phillip Jones pjintheusa at gmail.com
Sat Aug 8 03:48:51 PDT 2009


Hi there,

Not sure whether this helps but test this without set bypass_media. In
my setup I have noticed the leg A session ends when bypass_media is
true. Call/bridge continue successfully.

Phillip Jones

On Thu, Aug 6, 2009 at 1:28 PM, Benedikt
Fraunhofer<fraunhofer.lists.freeswitch-001 at traced.net> wrote:
> Hello List, Hello *,
>
> First of all the usual excuses: sorry for the bad english and the long
> email, no native speaker and i really tried to make it shorter, but i
> guess this would result in even more "check back"s than it already
> does :)
>
> we're currently running in a weird "lockup"-scenario in our loadtests.
>
> Our setup is the following:
>
> three freeswitch servers, let's call them A(-leg), M(aster), and
> B(-leg) with the goal in mind to initiate calls on M which calls A,
> play some file, bridge to B, limit call length and play (different)
> prompts to A and B if they exceed that limit.
>
> (Note that A and B work fine, regardless of the amount of load we put on them)
> A and B are silly dialplan logic, accepting calls on a certain
> extension after a random delay and playing moh. Before calling
> playback to a localstream they call a lua script which schedules
> hangup somewhere in future (which works flawlessly)
>
> Calls are initiated on M using some hacked up loadgen-script issuing
> http requests like
>   originate [sofiaSyntaxToExtensionOn_A] 6000
> . The 6000 extension on M has the following (xml) dialplan which
> essentially does the following:
> ------
> answer()
> ...playback file...
> ...set some callerid stuff
> set bypass_media
> bridge to extension 6009 on B
> ------
> we use "execute_on_answer" on the b-leg to run a script which limits
> the length of the call (doesn't matter if it's done via "export
> nolocal" or "inlined" into the data part of the bridge application
> "{execute_on_answer=lua ...}")
>
> <action application="export" data="nolocal:execute_on_answer=lua
> lua/schedule-hangup.lua ${uuid}" />
>
> the lua script "schedula-hangup.lua" does essentially the following:
>
> ------
> api = freeswitch.API();
> local res = api:execute("sched_api", "+10 none lua
> lua/c2c-hangup-timeout.lua " .. argv[1]);
> ------
>
> the 10 seconds are just to speed up the time until it gets stuck.
>
> this is where things start to go wrong. if I comment out the call to
> the "schedule-hangup" script, everything works fine, even if it's
> under heavy load.
>
> c2c-hangup-timeout.lua does the following:
> ------------------
> local sess = argv[1];
> if(sess)
> then
>    freeswitch.consoleLog("INFO", "c2c-hangup-timeout.lua for uuid "
> .. sess .. "\n");
>
>    api = freeswitch.API();
>    local stillValid = api:execute("uuid_getvar", sess .. "
> Dummy-DoesChannelExists");
>    if(stillValid:sub(1,4) == "-ERR")
>    then
>        log("session uuid " .. sess .. " disappeared (nothing bad)");
>    else
>        -- this is important!!! Otherwise the aleg get's just hung up!
>        api:execute("uuid_media", sess);
>        api:execute("uuid_transfer", sess .. " -both timeout");
>    end
> else -- /if(sess)
>    log("called with nil session?");
> end -- /if(sess)
>
> ------------------
>
> i guess this needs some explanation:
> we get the uuid of the channel as argument in argv[1]. We don't use
>   local session = freeswitch.Session(uuid);
> since if the channel referenced by "uuid" does not exist any longer,
> freeswitch (or the lua bindings) try to interpret the uuid as an
> "originate string" and can't figure out how to call that. So we use a
> dummy api call to get some channel variable. If the channel does not
> exist any longer (A or B already hung up), we get an error message
> starting with "-ERR", otherwise the channel still exists (we get
> "_unset_" as the value, if it's not set) and we continue by getting
> freeswitch back in the media path (uuid_media) and then transferring
> both legs to an extension called "timeout" which plays some prompt and
> finally calls hangup().
>
> If we don't do the uuid_media call, one of the legs gets hung up when
> we transfer them to the extension. This looks like the following on
> the console after issuing "uuid_transfer [uuid] -both timeout"
> (extensions are not the same as in our loadgen example above)
>
>
> --------------
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1334 Hangup (*)
> sofia/internal/1000 [CS_HIBERNATE] [BLIND_TRANSFER]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer
> sofia/internal/1004 at 192.168.179.177:5060 to XML[timeout at default]
> 2009-07-23 19:57:19.865703 [INFO] mod_dialplan_xml.c:310 Processing
> BFR1004->timeout in context default
> API CALL [uuid_transfer(73812082-77b1-11de-b9f8-a10bb0eb9f69 -both
> timeout)] output:
> +OK
>
> 2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer (**)
> sofia/internal/1000 to XML[timeout at default]
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1084 Session
> 60 (sofia/internal/1000) Ended
> 2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1086 Close
> Channel sofia/internal/1000 [CS_DESTROY]
> -----------
>
> note that it first does Hangup (denoted by *, no that's not an
> asterisk :) on extension 1000 and then tries to Transfer (**) the hung
> up channel to the dial plan. this could be the same as in an earlier
> post to the list "SIP re-invite / bypass_media // Phillip Jones //
> Wed, 01 Jul 2009 13:30:53 -0700)"
>
> This is why we do not directly call sched_transfer() but call a script
> in between to do the uuid_media() call. I couldn't figure out how to
> call that directly from the xml dialplan and/or how to check if the
> channel still exists.
>
> so... after using uuid_media(), both legs are transferred without an
> (intermediate|bogus) hangup() call.
>
> This only works fine if we've few concurrent calls. There is no magic
> borderline where it starts to refuse work.
>
> Some of the Symptoms are: traffic decreased to zero as no new channels
> are successfully brought up, some of the signaling traffic is not
> ACKed or OKed, scheduled jobs are not run.
>
> if i read the output of "show channels" correctly, they're all stuck
> in different applications like hangup(), some are calling lua but most
> of them are in signaling_bridge(). Freeswitch is still responding on
> the console and there's almost no load on the machine (no busy polling
> or some other kind of running amok).
>
> if i kill one of them using uuid_kill() or kill all of them
> using"fsctl hupall" i get "Task was executed late by 866 seconds 12379
> sched_api_function (none)" messages and the usual cleanup takes place.
> As a quick hack i tried to schedule a uuid_kill() call 20 seconds
> after the scheduling call to the lua script but that job is not
> executed either.
>
> So what am I doing wrong? Is it some deadlock where uuid_media() and
> uuid_transfer()  are waiting for the other to finish?
> Or some other silly simple thing i missed?
>
> Thx in advance
>
>  Benedikt.
>
> _______________________________________________
> FreeSWITCH-users mailing list
> FreeSWITCH-users at lists.freeswitch.org
> http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
> UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
> http://www.freeswitch.org
>




More information about the FreeSWITCH-users mailing list