[Freeswitch-users] FS and ASR engine

Wed Jun 22 20:52:29 MSD 2011

Hi Hector,

The point with Dragon Speak is that it already has a language model built in (and an acoustic model as well). It's not just an ASR engine.

To solve your problem, you need an ASR engine + a statistical language model (LM) + an acoustic model (AM). Together these three parts can be called an ASR system, and a complete one can be bought directly from companies like Nuance and Loquendo. But if you want to use a free ASR system, you'll have to combine a free engine + a free statistical LM + a free AM.
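
To make the split between the three parts concrete, here is a toy sketch in Python. It is not code for any real engine: the candidate phrases and every probability in it are invented, and `decode` and `lm_weight` are just illustrative names. The AM scores how well each candidate matches the audio, the LM scores how plausible the word sequence is, and the "engine" picks the candidate with the best combined score.

```python
import math

# Toy acoustic model: log P(audio | words) for two candidate transcriptions.
# All numbers are invented for illustration.
acoustic_logprob = {
    "recognize speech": math.log(0.40),
    "wreck a nice beach": math.log(0.45),
}

# Toy language model: log P(words). Also invented numbers.
language_logprob = {
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}


def decode(candidates, lm_weight=1.0):
    """Toy 'engine': return the candidate with the best combined AM + LM score."""
    def score(words):
        return acoustic_logprob[words] + lm_weight * language_logprob[words]
    return max(candidates, key=score)


candidates = list(acoustic_logprob)
print(decode(candidates))                 # AM and LM combined
print(decode(candidates, lm_weight=0.0))  # AM alone, LM ignored
```

With these made-up numbers the AM alone slightly prefers the wrong phrase, and it is the LM that tips the decision. That is why an engine without a suitable LM cannot do open dictation.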

Two well-known free engines are Sphinx and Julius. I think you can find some free English LMs and English AMs supported by these engines at http://www.voxforge.org/ and http://www.keithv.com/software/.

One problem you'll face with these free engines is the lack of MRCP support. But it can be solved fairly easily by writing a simple module in FS (for example, see mod_pocketsphinx.c, which comes with the FS source code).
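
As a rough picture of what such a module has to do, here is a small Python mock of the session lifecycle: open a session, feed it audio frames, check whether a result is ready, fetch it, and close. This is only an illustration and not the FreeSWITCH module API; `ToyEngine`, `AsrSession` and all method names are hypothetical, and the "engine" is a stub that only counts bytes. The real thing is written in C, so take mod_pocketsphinx.c as the actual reference.

```python
class ToyEngine:
    """Stand-in for a real decoder such as Sphinx or Julius (hypothetical stub)."""

    def __init__(self):
        self.frames = []

    def push_audio(self, frame):
        self.frames.append(frame)

    def hypothesis(self):
        # A real engine would decode the audio; this stub only reports the byte count.
        total = sum(len(f) for f in self.frames)
        return f"<{total} bytes of audio>" if total else None


class AsrSession:
    """Hypothetical glue layer: open, feed, check results, get results, close."""

    def __init__(self):
        self.engine = None

    def open(self):
        self.engine = ToyEngine()

    def feed(self, frame):
        self.engine.push_audio(frame)

    def check_results(self):
        return self.engine.hypothesis() is not None

    def get_results(self):
        text = self.engine.hypothesis()
        self.engine.frames.clear()  # start fresh for the next utterance
        return text

    def close(self):
        self.engine = None


session = AsrSession()
session.open()
for frame in (b"\x00" * 320, b"\x00" * 320):  # two 20 ms frames of 8 kHz 16-bit mono audio
    session.feed(frame)
if session.check_results():
    print(session.get_results())
session.close()
```

The point of the sketch is that the module is mostly plumbing: the call's audio goes in frame by frame, and text comes out when the engine has a hypothesis.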

I hope it helps ...
Eduardo

--- On Wed, 22/6/11, Hector Geraldino <Hector.Geraldino at ip-soft.net> wrote:

From: Hector Geraldino <Hector.Geraldino at ip-soft.net>
Subject: [Freeswitch-users] FS and ASR engine
To: "FreeSWITCH Users Help" <freeswitch-users at lists.freeswitch.org>
Date: Wednesday, 22 June 2011, 11:54

Hi everyone,

I want to check with you guys to see if anyone has experience integrating FreeSWITCH with an ASR engine, and using the engine merely as a transcriber of the conversation.

I've been playing for the past two weeks or so with Nuance Speech Server/Recognizer and pocketsphinx. Nuance is by far the better solution, but due to the lack of freely available documentation and my limited expertise in this subject, I haven't been able to achieve my goal.

The communication between FS and the ASR engine works great using MRCP; my concern is with the ASR engine itself. I want to allow the user to speak freely, and get a transcription of what the user said. I don't want or need to understand what the meaning of the utterances is (the engine definitely doesn't need to do that). I also don't need or want to write any complex grammar or SLM to get an interpretation of the spoken phrases; I just want the plain text of what has been said. No decisions will be taken based on what the user said; this information will just be passed to a third-party application.

I don't know if this can be achieved without developing grammars (not suitable for open-ended dialogs) or training statistical language models. What I do recall is using Dragon Speak in MS Word for dictation, without needing to do any training or develop grammars. That's exactly what I'm pursuing: a simple plain-text transcription of the spoken words.

Has any of you dealt with something like this by any chance?

Thanks for your help. I apologize if this is not the right place to ask this type of question.

Thanks again,
Hector
-----Inline attachment-----

_______________________________________________
Join us at ClueCon 2011, Aug 9-11, Chicago
http://www.cluecon.com 877-7-4ACLUE

FreeSWITCH-users mailing list
FreeSWITCH-users at lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org