I'm starting to play around with the .NET speech recognition in System.Speech.Recognition. I've been able to get some very basic phrases recognized, but in the event handler I'd like to get at certain pieces of information, as shown in the pizza ordering example.
I could parse values from e.Result.Text using regex, but the pizza ordering example made use of a really handy method called AppendChoices. The beauty of this method is that you essentially associate a list of possible words with a key, and when the event handler is called (after a phrase is recognized), you can access the value by looking at Semantics[<your key string here>]. However, while Semantics is still available, I don't know how to make use of it since it seems that AppendChoices has been deprecated.
Is my only recourse to use regex in the event handler to figure out what the spoken command arguments were?
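For reference, this is the kind of thing I'm after - a sketch of how I imagine it working. SemanticResultKey looked like a possible replacement for AppendChoices, but I haven't confirmed that, and the key name "topping" is just a placeholder:

using System;
using System.Speech.Recognition;

// Sketch: associate a Choices list with a key, then read the value back
// from e.Result.Semantics in the handler. SemanticResultKey appears to be
// the candidate replacement for AppendChoices, but I'm not certain.
class SemanticsSketch
{
    static void Main()
    {
        var toppings = new Choices("pepperoni", "mushrooms", "onions");
        var builder = new GrammarBuilder("order a pizza with");
        builder.Append(new SemanticResultKey("topping", toppings));

        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(builder));
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.SpeechRecognized += (s, e) =>
                Console.WriteLine(e.Result.Semantics["topping"].Value);
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine(); // keep listening until Enter is pressed
        }
    }
}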
I'm trying to remove the automatic breaks added by the synthesis processor, to create speech files without any "linguistic pauses".
I'm using Microsoft's speech synthesis engine with the SpeechSynthesizer class in C#.
This is the output I get from SpeechSynthesizer with "This is an example why do automatic breaks occur?" wrapped in <speak> tags:
https://clyp.it/4nofhh3n
This is the output I want (achieved by using Oddcast's TTS Demo):
https://clyp.it/m55wt14u
I've read through w3.org's SSML documentation several times, which, in point 3.2.3 (the break element), notes the following:
If the element is not present between tokens, the synthesis processor is expected to automatically determine a break based on the linguistic context. In practice, the break element is most often used to override the typical automatic behavior of a synthesis processor.
This is how my voice currently behaves. I want to somehow override or turn off this functionality and have the speech be completely uninterrupted. I have tried putting a <break> element with the attributes strength="none" and time="0ms" between the words where the automatic break occurs, as the documentation suggests, and all kinds of other things, such as wrapping the whole text string in <s> tags, to no avail.
I also can't just remove the breaks in post-processing, since the voice gives the spoken words a different tone when the automatic breaks are added.
I have read through several other SSML references which, while often worded a bit differently from the w3 docs, don't explain how to concretely override the automatic breaks, which is my issue.
In my experimenting with SpeechSynthesizer, if you put a break of 50ms at the end it will be respected; anything shorter is ignored.
However, it will always treat <speak>-wrapped content as its own clause, so it will speak it as a standalone sentence/clause rather than carrying the prosody across, as in the second example. You need to send all your text in a single <speak> element (and voice) to have it treated as a single linguistic utterance.
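For illustration, a minimal sketch (the output file name is a placeholder, and I've only verified the 50ms behavior with the default voice):

using System.Speech.Synthesis;

// Sketch: the whole utterance lives in one <speak> element, with a single
// trailing 50ms break, so the engine treats it as one clause.
class SsmlSketch
{
    static void Main()
    {
        string ssml =
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
            "xml:lang=\"en-US\">" +
            "This is an example why do automatic breaks occur" +
            "<break time=\"50ms\"/>" +  // 50ms or more is respected
            "</speak>";

        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToWaveFile("output.wav");
            synth.SpeakSsml(ssml);
        }
    }
}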
I am trying to figure out a solution in C# to perform exception handling for multiple textboxes using Windows Forms.
The user should only be able to enter a one- or two-digit positive integer in these textboxes; if the user tries to enter more digits, or letters, a tooltip should appear with a warning message.
Thanks for the assistance!
For a case like this, I like to do what I call "passive error reporting". Rather than throwing exceptions, you accept every value (usually as a string) but display a message if it does not fit some criteria.
The simple approach is IDataErrorInfo. It allows you to show one error for each property (it is advantageous to use a property, or a string key into the backing storage). Its more complex sibling, INotifyDataErrorInfo, allows multiple errors per property/key.
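That said, since the question asks for a tooltip specifically, here is a minimal sketch of the passive approach with a plain ToolTip and no IDataErrorInfo plumbing. The 1-99 range is my reading of the question, and the control layout is invented:

using System;
using System.Windows.Forms;

// Sketch: validate on every keystroke and show a ToolTip warning
// instead of throwing an exception.
class ValidationForm : Form
{
    public ValidationForm()
    {
        var box = new TextBox { Left = 10, Top = 10, Width = 100 };
        var tip = new ToolTip();
        box.TextChanged += (s, e) =>
        {
            // Accept only positive integers of one or two digits.
            bool ok = int.TryParse(box.Text, out int n) && n > 0 && n <= 99;
            if (ok || box.Text.Length == 0)
                tip.Hide(box);
            else
                tip.Show("Please enter a one- or two-digit positive number.",
                         box, 0, box.Height, 2000);
        };
        Controls.Add(box);
    }

    [STAThread]
    static void Main() => Application.Run(new ValidationForm());
}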
I have an NLP (natural language processing) application running that gives me a tree of the parsed sentence; the question then is how I should proceed from there.
What is the time
\-SBAR - Subordinate clause
  |-WHNP - Wh-noun phrase
  | \-WP - Wh-pronoun
  |    \-What
  \-S - Simple declarative clause
    \-VP - Verb phrase
      |-VBZ - Verb, 3rd person singular present
      | \-is
      \-NP - Noun phrase
        |-DT - Determiner
        | \-the
        \-NN - Noun, singular or mass
           \-time
The application has a built-in JavaScript interpreter, and I was trying to turn the phrase into a simple function such as
function getReply() {
    return Resource.Time();
}
In basic terms: "what" = request = create a function, "is" would be the returned object, and "the time" would reference the time. Now, it would be easy to write a simple parser for just that, but then we also have "what is the time now", or "do you know what time it is". I need it to be able to be developed further based on the English language as the project grows. The naive version I could write today looks like the sketch below.
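(Resource.Time and getReply come from my interpreter; the brittle keyword check is exactly what I want to move beyond:)

using System;

// Naive sketch: match keywords, emit the JavaScript for my interpreter.
class NaiveMapper
{
    static string BuildReply(string question)
    {
        var q = question.ToLowerInvariant();
        if (q.Contains("what") && q.Contains("time"))
            return "function getReply() {\n    return Resource.Time();\n}";
        return null; // anything unanticipated falls through
    }

    static void Main()
    {
        Console.WriteLine(BuildReply("What is the time"));
        // "do you know what time it is" happens to match too, but only
        // because both keywords occur in it - not because it's understood.
        Console.WriteLine(BuildReply("do you know what time it is") != null);
    }
}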
The source is C# on .NET 4.5.
Thanks in advance.
As far as I can see, using dependency parse trees will be more helpful. The number of ways a question is asked is often limited (I mean the statistically significant variations are limited; there will probably be corner cases that people ordinarily do not use), and these are expressed through words like who, what, when, where, why and how.
Dependency parsing will enable you to extract the nominal subject and the direct as well as indirect objects of a query. Typically, these express the basic intent of the query. Consider the example of two equivalent queries:
What is the time?
Do you know what the time is?
Their dependency parse structures are as follows:
root(ROOT-0, What-1)
cop(What-1, is-2)
det(time-4, the-3)
nsubj(What-1, time-4)
and
aux(know-3, Do-1)
nsubj(know-3, you-2)
root(ROOT-0, know-3)
dobj(is-7, what-4)
det(time-6, the-5)
nsubj(is-7, time-6)
ccomp(know-3, is-7)
Both are what-queries, and both contain "time" as a nominal subject. The latter also contains "you" as a nominal subject, but I think expressions like "do you know", "can you please tell me", etc. can be removed based on heuristics.
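As a quick illustration, here is a sketch that scans textual triples like the ones above for nominal subjects. The "relation(governor-index, dependent-index)" string format is just the demo output shown here; real parser APIs expose these relations directly:

using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: pull the nominal subjects out of textual dependency triples,
// so queries with the same nsubj can be routed to the same handler.
class DependencyDemo
{
    static readonly Regex Triple = new Regex(@"^(\w+)\((\S+)-\d+, (\S+)-\d+\)$");

    static void Main()
    {
        var parse = new[]
        {
            "root(ROOT-0, What-1)",
            "cop(What-1, is-2)",
            "det(time-4, the-3)",
            "nsubj(What-1, time-4)"
        };

        var subjects = parse
            .Select(line => Triple.Match(line))
            .Where(m => m.Success && m.Groups[1].Value == "nsubj")
            .Select(m => m.Groups[3].Value.ToLowerInvariant());

        Console.WriteLine(string.Join(", ", subjects)); // time
    }
}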
You will find the Stanford Parser helpful for this approach. They also have an online demo, if you want to see some more examples at work.
I have a C# program that lets me use my microphone; when I speak, it executes commands and talks back. For example, when I say "What's the weather tomorrow?" it replies with tomorrow's weather.
The only problem is, I have to type out every phrase I want to say and have it pre-recorded. So if I want to ask for the weather, I HAVE to say it like I coded it, with no variations. I am wondering if there is code to change this?
I want to be able to say "What's the weather for tomorrow", "what's tomorrow's weather" or "can you tell me tomorrow's weather" and have it tell me the next day's weather, but I don't want to have to type each phrase into the code. I've seen something out there about e.Result.Alternates; is that what I need to use?
This cannot be done without involving linguistic resources. Let me explain what I mean by this.
As you may have noticed, your C# program only recognizes pre-recorded phrases, and only if you say the exact same words. (As an aside, this is quite an achievement in itself, because you can hardly say a sentence twice without altering it a bit. Small changes, e.g. in sound frequency or length, might not register with a human listener, but they matter to your program.)
Therefore, you need to incorporate some kind of linguistic resource into your program. In other words, make it "understand" facts about human language. Below are two suggestions of increasing complexity. Both approaches assume that your tool is capable of tokenizing an audio input stream in a sensible way, i.e. extracting words from it.
Pattern Matching
To avoid hard-coding the sentences like
Tell me about the weather.
What's the weather tomorrow?
Weather report!
you can instead define a pattern that matches any of those sentences:
if a sentence contains "weather", then output a weather report
This can be further refined in manifold ways, e.g.:
if a sentence contains "weather" and "tomorrow", output tomorrow's forecast.
if a sentence contains "weather" and "Bristol", output a forecast for Bristol
This kind of knowledge must be put into your program explicitly, for instance in the form of a dictionary or lookup table.
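For instance, a lookup table of required keywords might look like this sketch (the rules and reply strings are invented for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the explicit lookup table: every keyword in a rule must appear
// in the tokenized sentence; more specific rules come first.
class PatternMatcher
{
    static readonly List<(string[], string)> Rules = new List<(string[], string)>
    {
        (new[] { "weather", "tomorrow" }, "tomorrow's forecast"),
        (new[] { "weather", "bristol" },  "forecast for Bristol"),
        (new[] { "weather" },             "generic weather report"),
    };

    static string Match(string sentence)
    {
        var tokens = sentence.ToLowerInvariant()
            .Split(new[] { ' ', '?', '!', '.', ',', '\'' },
                   StringSplitOptions.RemoveEmptyEntries);
        foreach (var (keywords, reply) in Rules)
            if (keywords.All(tokens.Contains))
                return reply;
        return "no match";
    }

    static void Main()
    {
        Console.WriteLine(Match("What's the weather tomorrow?")); // tomorrow's forecast
    }
}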
Measuring Similarity
If you plan to spend more time on this, you could implement a means of finding the similarity between input sentences. There are many approaches to this as well, but a prominent one is the bag-of-words model.
In this model, each sentence is represented as a vector, with each word a dimension of that vector. For example, the sentence "I hate green apples" could be represented as
I = 1
hate = 1
green = 1
apples = 1
red = 0
you = 0
Note that the words that do not occur in this particular sentence, but in other phrases the program is likely to encounter, also represent dimensions (for example the red = 0).
The big advantage of this approach is that the similarity of vectors can be easily computed, no matter how many dimensions they have. There are several techniques for estimating similarity; one of them is cosine similarity (see for example http://en.wikipedia.org/wiki/Cosine_similarity).
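Sticking with the weather example, a sketch of both steps - building word-count vectors and comparing them - could look like this (the tokenization is deliberately crude):

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the bag-of-words comparison: each sentence becomes a
// word-count dictionary, and cosine similarity measures the overlap,
// regardless of word order.
class BagOfWordsDemo
{
    static Dictionary<string, int> Vectorize(string sentence) =>
        sentence.ToLowerInvariant()
                .Split(new[] { ' ', '?', '!', '.', ',' },
                       StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(w => w)
                .ToDictionary(g => g.Key, g => g.Count());

    static double CosineSimilarity(Dictionary<string, int> a,
                                   Dictionary<string, int> b)
    {
        // Dot product over the words the two sentences share.
        double dot = a.Keys.Intersect(b.Keys).Sum(w => (double)a[w] * b[w]);
        double magA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double magB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return dot / (magA * magB);
    }

    static void Main()
    {
        var s1 = Vectorize("What's the weather tomorrow?");
        var s2 = Vectorize("Can you tell me tomorrow's weather?");
        Console.WriteLine(CosineSimilarity(s1, s2)); // closer to 1 = more similar
    }
}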
On a more general note, there are many other considerations to be made of course.
For example, some words might be utterly irrelevant to the message you want to convey, as in the following sentence:
I want you to output a weather report.
Here, at least "I", "you", "to" and "a" could be done away with without damaging the basic semantics of the sentence. Such words are called stop words and are discarded early in many tools that perform speech-to-text analysis.
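For example, with an ad-hoc stop-word list (a real system would use a curated one):

using System;
using System.Collections.Generic;
using System.Linq;

// A tiny illustration of stop-word removal before matching.
class StopWordDemo
{
    static void Main()
    {
        var stop = new HashSet<string> { "i", "you", "to", "a" };
        var tokens = "i want you to output a weather report"
            .Split(' ')
            .Where(t => !stop.Contains(t));
        Console.WriteLine(string.Join(" ", tokens)); // want output weather report
    }
}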
Also note that we started out assuming that your program reliably identifies sound input. In reality, no tool is capable of infallibly identifying speech.
Humans tend to forget that sound comes without any cues as to where word or sentence boundaries are. This makes the so-called disambiguation of input a gargantuan task that is easily underestimated - and ambiguity is one of the hardest problems of computational linguistics in general.
The code won't be able to judge that on its own! You need to split the command into an array of words, such as:
Tomorrow
Weather
What
This way, you can compare each word with the text that is present in your program: say, the command ("what"), the type ("weather") and the time ("tomorrow").
It is better to read and understand each word than to guess. Google works the same way: it breaks the string down and compares the parts.
I'm currently working on a program that presses buttons for me. I'm working in WPF and have already finished my design in XAML; now I need some C# code.
I have a TextBox that should handle all the SendKeys input. I want to extend its functionality by providing some CMD-like arguments. The problem is, I don't know how. For example:
W{hold:500}{wait:100}{ENTER}
This is an example line that I'd enter in the textbox. I need two new functions, hold and wait.
The hold function presses and holds the previous key for the specified time (500 ms) and then releases it.
The wait function waits for the specified time (100 ms).
I know I could somehow hard-code this behavior, but then it would not be user-editable. That's why I need these arguments.
You're trying to 'parse' the text in the text box. The simplest way is to read each character of the text, one by one, looking for '{'. Once found, everything after that up until the '}' is the command. You can then process the extracted command the same way, splitting it at the ':' to get the command's parameters. Everything not within a '{}' pair is a literal key you send.
There are far more sophisticated ways of writing parsers, but for what it sounds like you are doing, the above would be a good first step to get you familiar with processing text.
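A first pass might look like this sketch (no handling of unbalanced braces or non-numeric arguments, and the actual key pressing is left out):

using System;
using System.Collections.Generic;

// Sketch of the scan described above: characters outside braces are
// literal keys; "{name:arg}" becomes a command.
class KeyScriptParser
{
    static IEnumerable<(string Kind, string Name, int Arg)> Tokenize(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                int end = input.IndexOf('}', i);
                string body = input.Substring(i + 1, end - i - 1);
                // Split "hold:500" into the command name and its argument.
                string[] parts = body.Split(':');
                int arg = parts.Length > 1 ? int.Parse(parts[1]) : 0;
                yield return ("command", parts[0], arg);
                i = end; // continue after the closing brace
            }
            else
            {
                yield return ("key", input[i].ToString(), 0);
            }
        }
    }

    static void Main()
    {
        foreach (var token in Tokenize("W{hold:500}{wait:100}{ENTER}"))
            Console.WriteLine(token);
        // (key, W, 0), (command, hold, 500), (command, wait, 100), (command, ENTER, 0)
    }
}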