[Freeswitch-users] FS and ASR engine
Hector.Geraldino at ip-soft.net
Wed Jun 22 20:43:53 MSD 2011
Yeah, well, the thing is that it needs to be interactive (like an IVR system), so recording the voice and sending the file to a 3rd-party service for transcription is not an option for me right now. That's why I'm using MRCP to connect FreeSWITCH to the ASR engine.
All I need is to figure out how to get the recognized text from a given ASR engine, without constraining the user to talk in a specific way (through a grammar) and, if possible, without training SLMs. My guess is that it's not possible, as I can't find any resources on the web showing this feature, but I'm still optimistic about finding an 'easy' way of doing the transcription.
From: Pehr Anderson [mailto:pehr at harqen.com]
Sent: Wednesday, June 22, 2011 12:17 PM
To: FreeSWITCH Users Help; Hector Geraldino
Subject: Re: [Freeswitch-users] FS and ASR engine
You might check out http://Nexiwave.com
They have been active at Cluecon and have a web API
that does fully hosted ASR on WAVs or MP3s.
They are innovating around running their ASR on GPU clusters,
which means their internal cost of operation is likely to be the lowest in
Good ASR is always going to be computationally intensive,
so it is helpful to have somebody managing that for you.
It's going to be a lot easier than trying to juggle your own sphinx training sets.
On Wed, Jun 22, 2011 at 9:54 AM, Hector Geraldino <Hector.Geraldino at ip-soft.net> wrote:
I want to check with you guys to see if anyone has experience integrating FreeSWITCH with an ASR engine, and using the engine merely as a transcriber of the conversation.
I've been playing for the past two weeks or so with Nuance Speech Server/Recognizer and pocketsphinx. Nuance is by far the better solution, but due to the lack of freely available documentation and my limited expertise in this subject, I haven't been able to achieve my goal.
The communication between FS and the ASR engine works great using MRCP; my concern is with the ASR engine itself. I want to allow the user to speak freely, and get a transcription of what the user said. I don't want or need to understand what the utterances mean (the engine definitely doesn't need to do that), and I don't need or want to write any complex grammar or SLM to get an interpretation of the spoken phrases. I just want the plain text of what has been said. No decisions will be made based on what the user said; this information will just be passed to a 3rd-party application.
I don't know if this can be achieved or not without developing grammars (not suitable for open-ended dialogs) or training statistical language models. What I do recall is using Dragon Speak in MS Word for dictation, without needing to do any training or develop grammars. That's exactly what I'm pursuing: a simple plain text transcription of the spoken words.
Has any of you dealt with something like this by any chance?
Thanks for your help. I apologize if this is not the right place to ask this type of question.
Join us at ClueCon 2011, Aug 9-11, Chicago
FreeSWITCH-users mailing list
FreeSWITCH-users at lists.freeswitch.org
Pehr Anderson, VP Platform Technology
HarQen - http://HarQen.com
pehr at harqen.com