[Freeswitch-users] Stopping TTS when play_and_detect_speech gets MRCP START-OF-INPUT

Sun Mar 21 06:31:07 UTC 2021

On Thu, Mar 18, 2021 at 4:46 PM mayamatakeshi <mayamatakeshi at gmail.com>
wrote:

>
>
> On Thu, Mar 18, 2021 at 3:59 PM mayamatakeshi <mayamatakeshi at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Mar 18, 2021 at 12:23 PM mayamatakeshi <mayamatakeshi at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Mar 17, 2021 at 6:52 PM mayamatakeshi <mayamatakeshi at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I'm trying play_and_detect_speech with an ESL app but START-OF-INPUT
>>>> doesn't interrupt the TTS being played.
>>>> Is this expected?
>>>> (
>>>> https://freeswitch.org/confluence/display/FREESWITCH/mod_dptools%3A+play_and_detect_speech
>>>> is not clear about it)
>>>> Or maybe I need to set some channel variable for this to work.
>>>>
>>>
>>> Hi,
>>> My app is a fork of https://github.com/plivo/plivoframework
>>> I debugged FS code and found the cause of the problem:
>>> I verified that the DETECTED_SPEECH events (including the one
>>> with Speech-Type begin-speaking) were not being fired.
>>> This was because my app was not setting this variable:
>>>   https://freeswitch.org/confluence/display/FREESWITCH/fire_asr_events
>>> Then I set fire_asr_events=true
>>> but still the prompt didn't get interrupted by speech.
>>> Then I checked how FS code was fetching (dequeuing) events and I
>>> realized it checks for divert_events.
>>> Then I changed my app to send
>>>   divert_events off
>>> in the ESL socket
>>> and after that it worked and FS stopped the prompt upon speech start.
>>> However, it does not stop the prompt when a DTMF digit is received (it
>>> already behaved this way before I made the changes).
>>> So things are better, but something is still amiss.
>>>
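
The two ESL-side changes described above can be sketched as the raw text an app would write to the event socket. This is a minimal sketch, not the app's actual code: the thread does not say how the variable was set, so using `sendmsg` with the `set` application is an assumption, and the function names and the UUID argument are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an ESL "sendmsg" that runs the dialplan
 * application "set" with "name=value" on the channel identified by uuid.
 * Returns the snprintf result (the length needed, excluding the NUL), or -1
 * if name or value contains a newline, which would corrupt the headers.
 * The caller must check that the result is smaller than len. */
int build_set_variable_msg(char *buf, size_t len, const char *uuid,
                           const char *name, const char *value)
{
    assert(buf != NULL && uuid != NULL && name != NULL && value != NULL);
    if (strchr(name, '\n') != NULL || strchr(value, '\n') != NULL) {
        return -1;
    }
    return snprintf(buf, len,
                    "sendmsg %s\n"
                    "call-command: execute\n"
                    "execute-app-name: set\n"
                    "execute-app-arg: %s=%s\n\n",
                    uuid, name, value);
}

/* Hypothetical helper: the ESL command that turns event diversion off.
 * ESL commands are terminated by an empty line. */
const char *divert_events_off_cmd(void)
{
    return "divert_events off\n\n";
}
```

An app would send the `fire_asr_events=true` message and the `divert_events off` command before executing play_and_detect_speech, so that the begin-speaking event can reach the code that interrupts the prompt.
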
>>
>> OK. I found the reason:
>> I was using 'playback_terminators=any',
>> which, according to the FS code, results in:
>>   terminators = "1234567890*#"
>> But I was sending DTMF 'a', so it did not cause the prompt to be
>> terminated.
>> When I changed my test script to send '1' instead, the prompt was
>> terminated.
>> However, this also terminated the speech detection operation, as FS sent
>> STOP to the MRCP server.
>> The reason is that, unlike begin-speaking, a terminator marks the
>> operation as done (switch_ivr_async.c):
>>
>>     } else if (!strcasecmp(speech_type, "begin-speaking")) {
>>         switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_INFO,
>>                           "(%s) START OF SPEECH\n", switch_channel_get_name(channel));
>>         return SWITCH_STATUS_BREAK;
>>     }
>>
>>     if (terminators && strchr(terminators, dtmf->digit)) {
>>         switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG,
>>                           "(%s) ACCEPT TERMINATOR %c\n", switch_channel_get_name(channel),
>>                           dtmf->digit);
>>         switch_channel_set_variable_printf(channel, SWITCH_PLAYBACK_TERMINATOR_USED,
>>                                            "%c", dtmf->digit);
>>         state->result = switch_core_session_sprintf(session, "DIGIT: %c", dtmf->digit);
>>         state->done = PLAY_AND_DETECT_DONE;
>>         return SWITCH_STATUS_BREAK;
>>     }
>>
>> This might be OK for some, but in my case there is another issue: the
>> termination of the speech detection is not reported to my app via ESL.
>>
>> I am hoping to solve this by leaving DTMF collection to the MRCP server,
>> but the MRCP server I'm using (UniMRCP) currently doesn't report
>> START-OF-INPUT for DTMF (I only get RECOGNITION-COMPLETE after all digits
>> are input). My use case is to let the user either speak a number sequence
>> or dial it.
>>
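
The terminator behaviour described above can be reproduced in isolation. This is a minimal sketch under stated assumptions: `resolve_terminators`, `is_terminator` and `demo_thread_cases` are hypothetical names, only the "any" expansion reported in the thread is modelled, and the guard against a NUL digit is an addition that the quoted FS code does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Per the thread, playback_terminators=any is read as this digit set. */
#define ANY_TERMINATORS "1234567890*#"

/* Hypothetical helper: map the variable's value to the set of digits.
 * Simplification: only "any" is expanded; other values are used as-is. */
const char *resolve_terminators(const char *value)
{
    if (value == NULL || *value == '\0') {
        return NULL;
    }
    return strcmp(value, "any") == 0 ? ANY_TERMINATORS : value;
}

/* Same test as the quoted check, terminators && strchr(terminators, digit),
 * plus a guard: strchr also matches the terminating NUL, so a digit of
 * '\0' is rejected explicitly. */
bool is_terminator(const char *terminators, char digit)
{
    return terminators != NULL && digit != '\0'
        && strchr(terminators, digit) != NULL;
}

/* Usage example: the two cases reported in the thread. */
void demo_thread_cases(void)
{
    const char *t = resolve_terminators("any");
    assert(is_terminator(t, '1'));   /* '1' is in the set: prompt stops */
    assert(!is_terminator(t, 'a'));  /* 'a' is not: prompt keeps playing */
}
```

This only shows which digits count as terminators. In the quoted FS code, a matching digit also sets the operation to PLAY_AND_DETECT_DONE, which is why speech detection ends along with the prompt.
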
>
> Sorry,
> on checking again, it is only under some specific condition that UniMRCP
> does not send START-OF-INPUT.
> I'm not sure what it is yet; it may be by design, or it may follow
> something in the protocol that I am not aware of.
>

Just closing this issue and leaving some final details in case someone runs
into a similar problem:
  - I was testing with UniMRCP server 1.6.0, which under some conditions
would not send START-OF-INPUT upon DTMF detection.
  - After updating to UniMRCP server 1.7.0, the problem went away.

And in case someone needs to follow SIP/MRCP/DTMF flows, you can try my
sngrep fork. Details here:
  https://github.com/irontec/sngrep/issues/78

(it also supports UTF-8, which is particularly useful for reading the text
in MRCP SPEAK, DEFINE-GRAMMAR and RECOGNITION-COMPLETE messages)

Here is a snapshot:
[image: sngrep.dtmf_and_mrcp.png]

>
>>
>> So I will switch to using detect_speech instead of play_and_detect_speech.
>>
>>
So, using play_and_detect_speech is viable for my needs, since I can leave
DTMF handling to the UniMRCP server.
I might eventually need to implement a solution using detect_speech, for
cases such as max-number-of-digits, which the UniMRCP server doesn't
support at the moment.
But for now, play_and_detect_speech should suffice.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sngrep.dtmf_and_mrcp.png
Type: image/png
Size: 129317 bytes
Desc: not available
URL: <http://lists.freeswitch.org/pipermail/freeswitch-users/attachments/20210321/15ea52b6/attachment-0001.png>