[Freeswitch-users] FS and ASR engine
José Eduardo de C. Silva
educs13 at yahoo.com.br
Wed Jun 22 20:52:29 MSD 2011
The point with Dragon Speak is that it already has a language model built-in (and an acoustic model as well). It's not just an ASR engine.
To solve your problem, you need an ASR engine + a statistical language model (LM) + an acoustic model (AM). These three parts together can be called an ASR system, and one can be bought directly from companies like Nuance and Loquendo. But if you want to use a free ASR system, you'll have to combine a free engine + a free statistical LM + a free AM.
Two well-known free engines are Sphinx and Julius. I think you can find some free English LMs and AMs supported by these engines at http://www.voxforge.org/ and http://www.keithv.com/software/.
One problem you'll face with these free engines is the lack of MRCP support, but that can be solved fairly easily by writing a simple module for FS (for example, see mod_pocketsphinx.c, which ships with the FS source code).
I hope this helps ...
--- On Wed, 6/22/11, Hector Geraldino <Hector.Geraldino at ip-soft.net> wrote:
From: Hector Geraldino <Hector.Geraldino at ip-soft.net>
Subject: [Freeswitch-users] FS and ASR engine
To: "FreeSWITCH Users Help" <freeswitch-users at lists.freeswitch.org>
Date: Wednesday, June 22, 2011, 11:54
Hi everyone,

I want to check with you guys to see if anyone has experience integrating FreeSWITCH with an ASR engine, using the engine merely as a transcriber of the conversation. I've been playing for the past two weeks or so with Nuance Speech Server/Recognizer and pocketsphinx. Nuance is by far the better solution, but due to the lack of freely available documentation and my limited expertise in this subject, I haven't been able to achieve my goal.

The communication between FS and the ASR engine works great over MRCP; my concern is with the ASR engine itself. I want to allow the user to speak freely and get a transcription of what the user said. I don't want or need to understand the meaning of the utterances (the engine definitely doesn't need to do that), and I don't need or want to write any complex grammar or SLM to get an interpretation of the spoken phrases; I just want the plain text of what has been said. No decisions will be taken based on what the user said; the text will simply be passed to a third-party application.

I don't know whether this can be achieved without developing grammars (not suitable for open-ended dialogs) or training statistical language models. What I do recall is using Dragon Speak in MS Word for dictation, without any training or grammar development. That's exactly what I'm pursuing: a simple plain-text transcription of the spoken words.

Has anyone of you dealt with something like this, by any chance? Thanks for your help. I apologize if this is not the right place to ask this type of question.

Thanks again,
Hector
Join us at ClueCon 2011, Aug 9-11, Chicago
FreeSWITCH-users mailing list
FreeSWITCH-users at lists.freeswitch.org