[Freeswitch-users] Getting Started with Speech Recognition

Tue Jul 26 01:42:17 MSD 2011

I've been attempting to experiment with this part of Freeswitch, but it seems that this is an area that hasn't yet received a lot of attention with regard to docs and samples. Perhaps some of you who are experienced in this area wouldn't mind clearing up some of my confusion?

First, I have FS working fine (outside of ASR). Pocketsphinx is built and loaded. As for me, I have some experience with VXML, and generally know how a grammar works, but the last phone ASR projects that I was involved with were several years back on Tellme Studio.

As to grammars, I'm not sure, but it seems as though FS has used different formats at different times. The mod_pocketsphinx page says that JSGF is the current format. Can JSGF grammar files be loaded directly, without being compiled to another format? I ask because some of the examples, such as the Lua Directory example, talk about compiling grammars, and the format shown there doesn't look exactly the same as what I see in the JSGF examples. Is the info about compiling grammars obsolete?
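To make the question concrete, this is my understanding of what a minimal JSGF grammar looks like, sketched as a small Python helper that writes one out. The grammar name, rule name, and word list are made up for illustration, and whether mod_pocketsphinx accepts this text as-is, without a compile step, is exactly what I'm unsure about.

```python
# Hypothetical helper: builds the text of a small JSGF grammar from a word list.
# The grammar name "yesno" and the phrases are made-up examples, not taken
# from any FreeSWITCH sample.

def build_jsgf(name, phrases):
    """Return JSGF source with one public rule listing the phrases as alternatives."""
    if not phrases:
        raise ValueError("a grammar needs at least one phrase")
    alternatives = " | ".join(phrases)
    lines = [
        "#JSGF V1.0;",                          # JSGF self-identifying header
        f"grammar {name};",                     # grammar name declaration
        f"public <{name}> = {alternatives};",   # single public rule
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints a three-line grammar that could be saved as a .gram file.
    print(build_jsgf("yesno", ["yes", "no", "operator"]), end="")
```

The output is a header line, a grammar declaration, and one public rule of the form `public <yesno> = yes | no | operator;`.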

Next, I'm not clear about the use of the detect_speech command. The wiki page lists some example forms of the command, but doesn't explain the purpose of the different forms, nor their arguments.

Here are some examples:

detect_speech <mod_name> <gram_name> <gram_path> [<addr>]

From the examples, I see that "pocketsphinx" can be used for the module. I think that gram_path is the path to the grammar file, without the ".gram" suffix, and that it uses the grammar folder as a base. However, what is addr? An address? To, or for, what?

detect_speech grammaron <gram_name>

I think that this form might be used when a large .gram file has been loaded, and the code needs to enable a grammar for a specific context. However, I don't see a command for just loading a .gram file without also specifying a grammar name. If I use this grammaron command, will it search all files in the grammar folder for grammars that match the name, or is this just for enabling a grammar that I've already loaded through other means?

Maybe grammaron is used to activate a grammar that was disabled with

detect_speech grammaroff <gram_name>

But, if so, then what is the purpose of

detect_speech nogrammar <gram_name>

Does that also disable a grammar? Maybe it also removes the grammar from memory?

For:

detect_speech param <name> <value>

What are the possible parameter names and acceptable values?

detect_speech start_input_timers

What is the purpose of start_input_timers? Does this cause ASR to time out if no input is recognized? Does it cause ASR to pause for a time before beginning to recognize speech?

I've tried to write up a simple grammar with only a few terms, and have a Lua IVR script attempt to detect speech using this grammar. My input callback is never called with any sort of speech event when I speak, though, and I don't see any errors on the FS console. I must be making some incorrect assumptions about how the commands are used, or how the grammar is formatted.

Perhaps my tests aren't working because there is a problem with Pocketsphinx. I can't try the Pizza demo to verify that Pocketsphinx is working correctly, as I'm unable to download its grammar while bkw.org is offline, and no other sources for the grammar come up on a Google search.

Finally, I'd like to know if there are any compelling reasons to use one scripting language over another for speech-driven IVRs. I have experience with Lua, JavaScript, and PHP, so I could use any of them. The Freeswitch book uses Lua a lot, so that's what I've been trying so far. Perhaps I'd be better served with JavaScript?

Thanks for any pointers!

Bryan