[Freeswitch-users] How to implement TTS barge-in using FS ESL
Christopher Rienzo
cmrienzo at gmail.com
Wed Nov 16 18:52:23 MSK 2011
Responses inline
> Now it works in my ESL app, though I am only able to do one dialogue (I
> need to add the event catching for further dialogues).
>
> I have a couple of questions here:
>
> 1. In the first try, my Nuance server somehow could not be reached
> (FS says the MRCP server is not responding in 5000ms, or
> something like that), then FS says: [WARNING] rtsp_client.c:386 ()
> Failed to Connect to RTSP Server 99.185.85.31:554,
> later FS says:
> [ERR] mod_unimrcp.c:1860 (TTS-6) SYNTHESIZER channel error!
> [ERR] switch_ivr_play_say.c:2439 Invalid TTS module!
>
> The SYNTHESIZER channel error and the Invalid TTS module error are obvious.
>
> What I don't understand is why it went to this strange address:
> 99.185.85.31:554?
>
Check your unimrcp configuration. Make sure the default TTS and ASR
profiles point to actual servers.
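For reference, the defaults are set in conf/autoload_configs/unimrcp.conf.xml.
A minimal sketch, assuming your profile file in conf/mrcp_profiles/ is named
nuance5-mrcp1-1.xml and points at your real Nuance server:

    <configuration name="unimrcp.conf" description="UniMRCP Client">
      <settings>
        <!-- must name profiles that actually exist in conf/mrcp_profiles/ -->
        <param name="default-tts-profile" value="nuance5-mrcp1-1"/>
        <param name="default-asr-profile" value="nuance5-mrcp1-1"/>
      </settings>
      <profiles>
        <X-PRE-PROCESS cmd="include" data="../mrcp_profiles/*.xml"/>
      </profiles>
    </configuration>

If the default profile is still one of the stock samples, FS will try
whatever server address that sample defines, which could explain the
unexpected 99.185.85.31:554.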
> 2. I specified TTS engine in play_and_detect_speech as
> "say:unimrcp:nuance5-mrcp1-1: the text to speak"
> It works, though I didn't specify the TTS voice.
>
> How do I specify the TTS voice? In the mrcp profile (how?)? or
> something like:
> "say:unimrcp:nuance5-mrcp1-1:Serena: the text to speak" (this
> seems not right.)
>
That won't work. Set the tts_engine variable as I explained previously, or
use say:unimrcp:<voice>:<text to speak> with the desired voice and the
correct default TTS profile defined in unimrcp.conf.xml. This is a
limitation of the say: notation. Alternatively, the voice can be defined
with the tts_voice channel variable.
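For example, a minimal inbound-ESL sketch in Python (the host, password,
uuid, profile, and voice names here are assumptions for illustration):

    import ESL

    # connect to the FreeSWITCH event socket (inbound mode)
    con = ESL.ESLconnection("127.0.0.1", "8021", "ClueCon")
    uuid = "replace-with-your-call-uuid"  # hypothetical call uuid

    # Option 1: pick the profile and voice via channel variables,
    # then say: only needs the text
    con.execute("set", "tts_engine=unimrcp:nuance5-mrcp1-1", uuid)
    con.execute("set", "tts_voice=Serena", uuid)
    con.execute("playback", "say:the text to speak", uuid)

    # Option 2: voice inside the say: string itself; the profile then
    # comes from default-tts-profile in unimrcp.conf.xml
    con.execute("playback", "say:unimrcp:Serena:the text to speak", uuid)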
> 3. The barge-in works well, thanks! Is barge-in configurable? In
> some scenarios, we might not allow barge-in.
>
If you don't want barge-in, just do "playback" (or "speak") first, then
"play_and_detect_speech" with a silence prompt.
>
> 4. How could I get the text which has been spoken to the user when
> barge-in occurs?
> Or could I get the time when barge-in occurs? If I know the barge-in
> time and the rough total time for the whole text to be spoken, I can
> figure out the spoken text by manually checking the recorded audio
> file later, which would be painful.
>
If this is necessary, you might want to use the lower-level functions
instead to watch for the begin-speaking event.
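For example, with the lower-level speak + detect_speech route (your
question 5), a rough sketch of catching the barge-in moment over inbound
ESL:

    # subscribe to speech-detection events only
    con.events("plain", "DETECTED_SPEECH")
    while True:
        e = con.recvEvent()
        if e and e.getHeader("Speech-Type") == "begin-speaking":
            # microseconds since the epoch; compare against the time
            # the prompt started to estimate how much was spoken
            barge_in_time = e.getHeader("Event-Date-Timestamp")
            break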
>
> 5. When I use the "speak" and "detect_speech" apps in ESL, I can catch
> the event DETECTED_SPEECH with speech-type begin-speaking or
> "detected-speech", then I do the recognition results processing.
>
> The new app play_and_detect_speech seems not to generate these events
> any more. The way that I can think of to get the results
> is to catch the event CHANNEL_EXECUTE_COMPLETE, then check if
> variable_current_application=play_and_detect_speech, then get
> the results from variable_detect_speech_result.
>
> Is this the proper way to get the results in an ESL app? Or will
> play_and_detect_speech later on be made consistent with detect_speech
> in terms of ASR events?
>
play_and_detect_speech is a higher-level abstraction to simplify things.
If you want more control, go back to using the ESL events. Reading
the code in mod_dptools and switch_ivr_async will give you hints about how
to do it correctly.
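That said, your CHANNEL_EXECUTE_COMPLETE approach is workable as-is; a
minimal sketch in Python (header names as they appear on the event):

    con.events("plain", "CHANNEL_EXECUTE_COMPLETE")
    while True:
        e = con.recvEvent()
        if e and e.getHeader("Application") == "play_and_detect_speech":
            # NLSML recognition result; empty on no-match or timeout
            result = e.getHeader("variable_detect_speech_result")
            break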
>
> 6. I'd like to set start-input-timers=false in the initial request, then
> start the recognition timers (start-input-timers=true)
> after the TTS finishes.
> How could I do this?
>
This is automatically done in the switch_ivr_play_and_detect_speech()
function. You just need to specify start-input-timers=false in the
beginning.
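In other words, something like this (the grammar name, prompt text, and
timeout value are placeholders):

    con.execute("play_and_detect_speech",
                "say:Please say your account number detect:unimrcp "
                "{start-input-timers=false,no-input-timeout=5000}my_grammar",
                uuid)

switch_ivr_play_and_detect_speech() then starts the input timers for you
once the prompt finishes playing.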