Phonetic characters to speech - c#

My goal is to let my application speak less widely supported languages (for example Hokkien or Malay). My current approach is to play recorded MP3 files.
Is there a 'phonetic characters to speech' engine for .NET or any other platform?
By phonetic characters I mean something like the phonetic entries in a paper dictionary. Any ideas?

What you need is a large-vocabulary TTS engine. Microsoft has a Speech SDK that can, among other things, speak text as you type it, and there is also the Windows SAPI (Speech API; I am not sure whether the SDK and the API are the same thing). I know that they have male and female voices for English, but maybe not for other languages such as Malay (where there may not have been much of a market yet). You might want to take a look at the Festival project and the related Festvox project at CMU. They usually have a lot of voices in different languages, but some of the lesser-known ones may not be as well developed as the ones for English.
Further update:
Check out the MBROLA site. It is an open-source project for developing multilingual large-vocabulary TTS engines, and they also have a Malay extension. I do not know how good it is, though. I tried the Hindi one and feel that a lot of work still needs to be done.
Also, check out the BabelFish site. They have links to a lot of free TTS engines that should have some support for Malay.
Update 3: I do not know if this will suit your purpose, but if the amount of text the application must speak is small, you can also try concatenative speech synthesis over a limited vocabulary. Record fragments of sentences in Malay (or any other language) and pass the output of your program to your own limited-vocabulary TTS engine, which assembles the spoken output from those fragments. One example (in English): "Player X was the most valuable player." Here, "was the most valuable player" is one fixed fragment, while "Player X" can be changed at will. If it serves your purpose, this should work well.
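The limited-vocabulary idea above can be sketched roughly as follows. This is only an illustration: the class name FragmentSpeech, the "{slot}" template notation, and the ".wav" clip naming are all assumptions, not part of any library. It only decides which recorded clips to play and in what order; actual playback (for example with System.Media.SoundPlayer for WAV files) is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for limited-vocabulary concatenative speech.
// All names and the clip naming scheme are assumptions for illustration.
public static class FragmentSpeech
{
    // Turns a sentence template into an ordered list of clip file names.
    // Template parts written as "{name}" are slots filled from slotValues;
    // every other part is the name of a fixed, pre-recorded fragment.
    public static List<string> BuildPlaylist(
        IEnumerable<string> template,
        IDictionary<string, string> slotValues)
    {
        var clips = new List<string>();
        foreach (string part in template)
        {
            bool isSlot = part.Length >= 2
                && part[0] == '{'
                && part[part.Length - 1] == '}';

            if (!isSlot)
            {
                // Fixed fragment: one recording per fragment name.
                clips.Add(part + ".wav");
                continue;
            }

            // Variable fragment: look up which recording fills the slot.
            string slotName = part.Substring(1, part.Length - 2);
            string clip;
            if (!slotValues.TryGetValue(slotName, out clip))
            {
                throw new ArgumentException(
                    "No recording chosen for slot '" + slotName + "'.");
            }
            clips.Add(clip + ".wav");
        }
        return clips;
    }
}
```

For the example sentence, a template of "{player}" followed by "was_the_most_valuable_player", with the slot "player" mapped to "player_x", yields player_x.wav and then was_the_most_valuable_player.wav.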

Have you looked at the System.Speech namespaces?
In particular the System.Speech.Synthesis and System.Speech.Synthesis.TtsEngine namespaces.

Here is the VB.NET code:
'Create the prompt builder. It will hold your phonetic characters.
Dim PBuilder As New System.Speech.Synthesis.PromptBuilder
'Add your phonetic characters here. The first parameter is the written text;
'the second parameter is the pronunciation (IPA) that is actually spoken.
PBuilder.AppendTextWithPronunciation("test", "riːdɪŋ")
'Now create a synthesizer to speak the phonetic characters.
Dim SpeechSynthesizer2 As New System.Speech.Synthesis.SpeechSynthesizer
'Speak the prompt. It will say "reading", not "test".
SpeechSynthesizer2.Speak(PBuilder)
And here is the converted C# code:
// Create the prompt builder. It will hold your phonetic characters.
System.Speech.Synthesis.PromptBuilder PBuilder = new System.Speech.Synthesis.PromptBuilder();
// Add your phonetic characters here. The first parameter is the written text;
// the second parameter is the pronunciation (IPA) that is actually spoken.
PBuilder.AppendTextWithPronunciation("test", "riːdɪŋ");
// Now create a synthesizer to speak the phonetic characters.
System.Speech.Synthesis.SpeechSynthesizer SpeechSynthesizer2 = new System.Speech.Synthesis.SpeechSynthesizer();
// Speak the prompt. It will say "reading", not "test".
SpeechSynthesizer2.Speak(PBuilder);

The .NET System.Speech.Synthesis.PromptBuilder class can build prompts from SSML strings, which a SpeechSynthesizer then renders as audio. You can use these to construct sounds from raw phonemes and sampled audio. The audio is not language-dependent.
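As a rough sketch of the SSML route, the helper below builds an SSML phoneme element from a fallback text and an IPA string. The class name SsmlPhoneme and the method name Build are made up for illustration; only the phoneme element itself (with its alphabet and ph attributes) comes from SSML. The resulting string could then be handed to PromptBuilder.AppendSsmlMarkup, and output quality may still vary with the installed voice.

```csharp
using System.Security;

// Hypothetical helper; the class and method names are assumptions.
public static class SsmlPhoneme
{
    // Wraps fallbackText in an SSML <phoneme> element whose "ph" attribute
    // carries the IPA pronunciation. Both values are XML-escaped so that
    // characters such as & or " cannot break the markup.
    public static string Build(string fallbackText, string ipa)
    {
        return "<phoneme alphabet=\"ipa\" ph=\""
            + SecurityElement.Escape(ipa)
            + "\">"
            + SecurityElement.Escape(fallbackText)
            + "</phoneme>";
    }
}
```

On Windows, with a reference to System.Speech, the result of SsmlPhoneme.Build("reading", "riːdɪŋ") can be passed to AppendSsmlMarkup on a PromptBuilder before calling Speak.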

Maybe this? System.Speech.Recognition.SrgsGrammar.SrgsPhoneticAlphabet

I have tried System.Speech.Synthesis.PromptBuilder, and I have to say that the current implementation of phonetic characters is very elementary and not accurate. For example, PromptBuilder lacks speech intonation and stress emphasis within a word. It can only produce a monotone, robotic sound, which is very annoying.
My recommendation is to keep using your current approach. Using MP3 recordings to deliver messages sounds more natural and is more cost-effective, given the time required to transcribe your speech into accurate phonetic characters.

Related

How to convert a set of arabic numbers (order numbers) to speech in c#.net?

To make my question clear: I don't want to use Microsoft's System.Speech.Synthesis library, since it does not support Arabic at all.
I tried looking for other TTS engines but couldn't find anything helpful,
so I figured there should be another way without using TTS, such as playing a set of audio files corresponding to my numbers.
In short, it's a call system that calls out numbers in a queue. Can anyone with enough experience in this area show me a good place to start? Or are there good libraries for the .NET Framework that already do this?
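One way to picture the audio-file idea from the question is to read each queue number digit by digit, which sidesteps the grammar of composing full number words. The sketch below is only an assumption about how that could look: the class name QueueAnnouncer and the clip names digit_0.wav through digit_9.wav are hypothetical, and the recordings themselves would have to be made separately.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; names and clip naming scheme are assumptions.
public static class QueueAnnouncer
{
    // Returns the clips to play, in order, to read a queue number
    // one digit at a time (for example 407 as "four", "zero", "seven").
    public static List<string> ClipsForNumber(int number)
    {
        if (number < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(number), "Queue numbers must not be negative.");
        }

        var clips = new List<string>();
        // Invariant culture guarantees plain ASCII digits 0-9.
        foreach (char digit in number.ToString(CultureInfo.InvariantCulture))
        {
            clips.Add("digit_" + digit + ".wav");
        }
        return clips;
    }
}
```

Reading whole number words instead of single digits would need more recordings and language-specific ordering rules, which this sketch deliberately leaves out.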

Add words to DictationGrammar in C# Speech Recognition?

This is something that has bugged me for a while. I'm developing a C# application that uses voice control, and I was wondering whether there is any way to load a DictationGrammar and then add words to it to improve accuracy. For instance, I'm trying to use it to search Google and Bing. It barely recognizes the word 'google' and has never recognized the word 'bing'. Is there any way to do this? DictationGrammar is not very accurate at all (it keeps adding words). Currently, I'm loading the grammar like this:
PACSREC.LoadGrammarAsync(new DictationGrammar());
You can add words to the user lexicon, and the DictationGrammar will implicitly pick up those words.
Unfortunately, the Lexicon APIs aren't exposed via the System.Speech.Recognition APIs; instead, you'll have to use the SpeechLib (automation-compatible) APIs to do so. See this question for examples.

Simple Grammar for Speech Recognition

I have a program with GrammarBuilders and a Grammar that is used in a SpeechRecognitionEngine to recognize speech. Can I, rather than recognizing from audio, use the same grammar to recognize a typed command (in a string)? Something like commandGrammar.parse(commandString)?
You should be able to use SpeechRecognitionEngine.EmulateRecognize, which takes text input in place of audio for speech recognition.
I am not sure of the intended use, but if this will be used for something like a chat bot that automatically interacts with text input via IM or SMS, I think you will find grammars very cumbersome to maintain and restrictive. I would recommend something like Artificial Intelligence Markup Language (AIML) for handling text responses. It is easy to learn and very powerful. Instead of the concise grammars that ASR engines require, this language allows you to use wildcards, which are much more conducive to text input. There are even some open-source C# projects that provide libraries to work with AIML and simplify the creation of chat bots.

Detect the language of a text is english in PDF or DOC files

The requirement is that I want to identify whether the text in a PDF or DOC file is English or non-English. If I find even a single word in another language (Turkish, French, Arabic, etc.), I have to skip the whole document.
It's urgent; please give me sample code for this functionality.
Have a look at the Google Translate API, the only free service I know of that could do this for you. Otherwise, the only solution I can see is maintaining your own dictionary, but that's a different story.
I guess you could use LangId. However there are some restrictions:
To use our API in live websites or services we suggest you to apply for a free API key, using the below form. The API key expands your developing possibilities allowing you to do up till 1,000 requests per hour (~720,000 per month).
I don't think this will solve your 'single word' issue, however. I believe that if the text has 6 words in English and 4 words in another language, it will classify the text as English, since that is the language mainly used in the file. I haven't looked at the API myself, though, so there might be some solution for that.
Hope it is of use to you.
Maybe the detect function of Google's Translate API could help you:
http://code.google.com/apis/language/translate/v2/getting_started.html#language_detect
This is not possible for single words.
Is "the" an English word? Well, yes, but it's also a Danish word (meaning tea). Does the word Schadenfreude indicate a non-english text? Not necessarily, it all depends on the context.
Adding to the list of APIs that support language determination, Bing API has a call that will determine the language for an array of strings.
http://msdn.microsoft.com/en-us/library/ff512412.aspx
Hope this helps somewhat.

Question on Speech Recognition classes in .NET

Is it possible to build an application using the .NET speech recognition classes, pass in a WAV file, and have it produce a text representation of the audio? For example, this is what I'm trying to do:
We have a QA department at my office, and they have to listen to hundreds of calls a day, which is practically impossible; there are not enough people to keep up with listening to everything. What I want to do is have the audio file uploaded to our server and have the server parse it and create a transcript. It doesn't matter if it's not perfect; a rough base of a couple of dozen lines of text would be easier to skim through than listening to a 2-hour recording.
Based on a saved transcript I can implement full-text search in the database and also run checks against the transcript if someone is saying something that's a misrepresentation.
So, is it possible to create an application using the .NET speech recognition classes and just pass the WAV file to it and it spit out a rough transcript?
I've dug around MSDN on the Speech classes briefly while thinking up the idea, so I don't have that much knowledge if it's possible to be done.
If possible, I would appreciate any examples in C#. Topic 1055347 is similar to my question and was given links, the most specific of which is in C++. I'm not a C++ developer, nor have I ever gone to school for programming; I'm entirely self-taught in C#, so I would like to stay in the language that I know.
Thanks in advance!
This sounds like you've got a call center type of application. Microsoft Speech Server has a SR engine optimized for telephony (8000 Hz sample rate), which will generate much better recognitions than the desktop SR engine. However, the engine isn't really designed for transcription (although it can do it), and the transcriptions definitely need to be reviewed before further processing occurs. Microsoft Exchange Unified Communications uses the SR engine to generate transcripts of voice mail, and while it's better than nothing, it often generates amusing nonsense.
With areas like speech recognition, you are likely to find either a standalone EXE or an API in C/C++.
For the links in the other topic, you can use a tool like the P/Invoke Interop Assistant to generate C# code. The C# code acts as a wrapper around the unmanaged DLL, so you can call it from C#.
This is likely to be the best way to get the functionality you are looking for.
Yes.
I did such an application a few years ago on the Tablet PC; you can read about it at http://web.archive.org/web/20060615192119/www.devx.com/TabletPC/Article/30761 (At the time, I spoke of using Interop to access the libraries, but I believe that the programming model has remained the same, just with a managed wrapper.)
At the time, the results were very poor, but maybe for your use-case better than nothing.
How about routing the calls to Google Voice? I'm sure there are similar services. I have been amazed at its accuracy so far, plus you can click and listen to the recording if required. Google Voice will forward voice calls to SMS or email.
UPDATE: On rereading, since you are recording calls, maybe it won't work; I use it for the voice messages left.
