[Freeswitch-users] How to implement TTS barge-in using FS ESL
Christopher Rienzo
cmrienzo at gmail.com
Wed Nov 16 18:52:23 MSK 2011
Responses inline
> Now it works in my ESL app, though I am only able to do one dialogue (I
> need to add the event catching for further dialogues).
>
> I have a couple of questions here:
>
> 1. In the first try, my Nuance server somehow could not be reached
> (FS says the MRCP server is not responding in 5000ms, or
> something like that), then FS says: [WARNING] rtsp_client.c:386 ()
> Failed to Connect to RTSP Server 99.185.85.31:554,
> later FS says:
> [ERR] mod_unimrcp.c:1860 (TTS-6) SYNTHESIZER channel error!
> [ERR] switch_ivr_play_say.c:2439 Invalid TTS module!
>
> The SYNTHESIZER channel error and the Invalid TTS module error are obvious.
>
> What I don't understand is why it went to this strange address:
> 99.185.85.31:554?
>
Check your unimrcp configuration. Make sure the default TTS and ASR
profiles point to actual servers.
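For reference, the defaults are set in conf/autoload_configs/unimrcp.conf.xml.
A minimal sketch, assuming your profile file in conf/mrcp_profiles/ is named
nuance5-mrcp1-1.xml and points at your real Nuance server:

    <configuration name="unimrcp.conf" description="UniMRCP Client">
      <settings>
        <!-- must name profiles that actually exist in conf/mrcp_profiles/ -->
        <param name="default-tts-profile" value="nuance5-mrcp1-1"/>
        <param name="default-asr-profile" value="nuance5-mrcp1-1"/>
      </settings>
      <profiles>
        <X-PRE-PROCESS cmd="include" data="../mrcp_profiles/*.xml"/>
      </profiles>
    </configuration>

If the default profile is still one of the stock samples, FS will try
whatever server address that sample defines, which could explain the
unexpected 99.185.85.31:554.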
> 2. I specified TTS engine in play_and_detect_speech as
> "say:unimrcp:nuance5-mrcp1-1: the text to speak"
> It works, though I didn't specify the TTS voice.
>
> How do I specify the TTS voice? In the mrcp profile (how?)? or
> something like:
> "say:unimrcp:nuance5-mrcp1-1:Serena: the text to speak" (this
> seems not right.)
>
That won't work. Set the tts_engine variable as I explained previously, or
use say:unimrcp:<voice>:<text to speak> with the desired voice and the
correct default TTS profile defined in unimrcp.conf.xml. This is a
limitation of the say: notation. Alternatively, the voice can be defined
with the tts_voice channel variable.
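For example, a minimal inbound-ESL sketch in Python (the host, password,
uuid, profile, and voice names here are assumptions for illustration):

    import ESL

    # connect to the FreeSWITCH event socket (inbound mode)
    con = ESL.ESLconnection("127.0.0.1", "8021", "ClueCon")
    uuid = "replace-with-your-call-uuid"  # hypothetical call uuid

    # Option 1: pick the profile and voice via channel variables,
    # then say: only needs the text
    con.execute("set", "tts_engine=unimrcp:nuance5-mrcp1-1", uuid)
    con.execute("set", "tts_voice=Serena", uuid)
    con.execute("playback", "say:the text to speak", uuid)

    # Option 2: voice inside the say: string itself; the profile then
    # comes from default-tts-profile in unimrcp.conf.xml
    con.execute("playback", "say:unimrcp:Serena:the text to speak", uuid)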
> 3. The barge-in works well, thanks! Is barge-in configurable? In
> some scenarios, we might not allow barge-in.
>
If you don't want barge-in, just do "playback" (or "speak") first, then
"play_and_detect_speech" with a silence prompt.
>
> 4. How could I get the text which has been spoken to the user when
> barge-in occurs?
> Or could I get the time when barge-in occurs? If I know the barge-in
> time and the rough total time for the whole text to be spoken, I can
> figure out the spoken text by manually checking the recorded audio
> file later, which would be painful.
>
If this is necessary, you might want to use the lower-level functions
instead to watch for the begin-speaking event.
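For example, with the lower-level speak + detect_speech route (your
question 5), a rough sketch of catching the barge-in moment over inbound
ESL:

    # subscribe to speech-detection events only
    con.events("plain", "DETECTED_SPEECH")
    while True:
        e = con.recvEvent()
        if e and e.getHeader("Speech-Type") == "begin-speaking":
            # microseconds since the epoch; compare against the time
            # the prompt started to estimate how much was spoken
            barge_in_time = e.getHeader("Event-Date-Timestamp")
            break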
>
> 5. When I use the "speak" and "detect_speech" apps in ESL, I can catch
> the event DETECTED_SPEECH with speech-type begin-speaking or
> "detected-speech", then I do the recognition results processing.
>
> The new app play_and_detect_speech seems not to generate these events
> any more. The way that I can think of to get the results
> is to catch the event CHANNEL_EXECUTE_COMPLETE, then check if
> variable_current_application=play_and_detect_speech, then get
> the results from variable_detect_speech_result.
>
> Is this the proper way to get the results in an ESL app? Or will
> play_and_detect_speech later on be made consistent with detect_speech
> in terms of ASR events?
>
play_and_detect_speech is a higher-level abstraction to simplify things.
If you want more control, go back to using the ESL events. Reading
the code in mod_dptools and switch_ivr_async will give you hints about how
to do it correctly.
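That said, your CHANNEL_EXECUTE_COMPLETE approach is workable as-is; a
minimal sketch in Python (header names as they appear on the event):

    con.events("plain", "CHANNEL_EXECUTE_COMPLETE")
    while True:
        e = con.recvEvent()
        if e and e.getHeader("Application") == "play_and_detect_speech":
            # NLSML recognition result; empty on no-match or timeout
            result = e.getHeader("variable_detect_speech_result")
            break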
>
> 6. I'd like to set start-input-timers=false in the initial request, then
> start the recognition timers (start-input-timers=true)
> after the TTS finishes.
> How could I do this?
>
This is automatically done in the switch_ivr_play_and_detect_speech()
function. You just need to specify start-input-timers=false in the
beginning.
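In other words, something like this (the grammar name, prompt text, and
timeout value are placeholders):

    con.execute("play_and_detect_speech",
                "say:Please say your account number detect:unimrcp "
                "{start-input-timers=false,no-input-timeout=5000}my_grammar",
                uuid)

switch_ivr_play_and_detect_speech() then starts the input timers for you
once the prompt finishes playing.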