My final-year project is on speech recognition, but I have no idea how to start. I will be using C#. Can anyone guide me? What should the first step be?
Thanks
You probably want to start with the Wikipedia entry on speech recognition here: http://en.wikipedia.org/wiki/Speech_recognition - at the end of that article there are a number of useful links to papers and software on the topic.
Another thing you will want to do is talk to the professor who is coordinating this project. He or she will know about other resources and can probably point you in a good direction.
Also, whenever you embark on a project you know nothing about, Google is your friend.
Speech recognition is really fuzzy pattern matching, so how about looking into artificial neural networks, as they're extremely good at pattern matching? Make sure the audio is in a nice simple format and trimmed to individual syllables or words. Train the network on these files, and then find a way to split the recordings you make into the same kind of units in code. It may be simplest to start with a very limited vocabulary (individual letters, maybe) as a proof of concept. Be prepared to run computers overnight to train the networks, and try to get access to a high-performance cluster.
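To make the "split the files you record in code" step more concrete, here is a minimal sketch of energy-based silence splitting. It is written in Python for brevity, and the same logic carries over to C#. The function name `split_on_silence`, the frame length, the energy threshold, and the minimum silence length are all illustrative assumptions, not values from any particular library; real recordings would need them tuned.

```python
def split_on_silence(samples, frame_len=4, threshold=0.1, min_silence_frames=2):
    """Return (start, end) sample-index pairs for the non-silent segments.

    samples is a list of floats (e.g. PCM scaled to [-1, 1]).
    All parameter defaults are illustrative assumptions.
    """
    segments = []
    start = None        # sample index where the current segment began
    silent_run = 0      # consecutive silent frames seen inside a segment
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude

        if energy >= threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                end = (i - silent_run + 1) * frame_len
                segments.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # The recording ended while a segment was still open.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments


# Two "words" separated by silence (toy data, not real audio).
recording = [0.0] * 8 + [1.0] * 8 + [0.0] * 8 + [1.0] * 4 + [0.0] * 4
print(split_on_silence(recording))
```

Each returned pair could then be cut out and fed to whatever classifier you train, one segment per word or letter.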
I would start by researching some libraries and reading up on these subjects:
http://www.microsoft.com/speech/evaluation/thirdparty/engines.mspx
http://www.codeproject.com/KB/audio-video/TTSinVBpackage.aspx
http://blogs.msdn.com/coding4fun/archive/2006/10/31/909044.aspx
http://www.c-sharpcorner.com/UploadFile/ssrinivas/SpeeechRecognitionusingCSharp11222005054918AM/SpeeechRecognitionusingCSharp.aspx
You can look at the .NET System.Speech.Recognition namespace:
http://msdn.microsoft.com/en-us/library/system.speech.recognition.aspx
Take a look at this MSDN article which describes the Speech libraries built into Windows Vista and Windows 7: http://msdn.microsoft.com/en-us/magazine/cc163663.aspx
Related
I am creating an Android application to recognize books in a library. The user takes a picture of a book's spine and sends it to a server, which does the image processing, looks the book up in a database, and sends the book's details back to the phone. If the book is not in the database, the server recognizes the text on the spine with OCR and sends that to the mobile application instead. I am hoping to do the image processing in C#. Recognition works by comparing the submitted image against template images stored in the database, so I need some help figuring out the best approach. I have already researched some methods, such as:
Template matching
Pattern recognition
Feature recognition
I want to know which method is recommended for images like book spines, and whether there are any good APIs for this. I have looked into OpenCV but would like to know if there are better options. I also want to know how to use OCR when recognizing the book. The application needs to be fast. Normally, when comparing two book spines (template and image), if I get 60% similarity I can assume it's the same book. So I am searching for the optimal way. Any help is appreciated.
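For readers wondering what a "60% similarity" check could look like, here is a minimal sketch using zero-mean normalized cross-correlation between two equally sized grayscale images. It is plain Python with no image library. The names `ncc` and `best_match` are made up for illustration, and treating the 60% figure as a correlation score of 0.6 is an assumption. Real spine photos would first need cropping, resizing, and alignment, which this sketch skips.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two same-size grayscale
    images given as lists of rows. Result is in [-1, 1]; 1 means identical
    up to brightness and contrast."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # flat images carry no pattern


def best_match(query, templates, threshold=0.6):
    """Return the id of the best template scoring at least `threshold`,
    or None if nothing qualifies (the caller could then fall back to OCR).
    The 0.6 default mirrors the 60% figure and is an assumption."""
    best_id, best_score = None, threshold
    for book_id, template in templates.items():
        score = ncc(query, template)
        if score >= best_score:
            best_id, best_score = book_id, score
    return best_id


# Toy 2x2 "images" standing in for spine photos.
query = [[10, 200], [10, 200]]
templates = {
    "book-a": [[12, 190], [9, 205]],    # nearly the same pattern
    "book-b": [[200, 10], [200, 10]],   # inverted pattern
}
print(best_match(query, templates))
```

Comparing against every template is linear in the size of the database, so for speed you would probably want to narrow the candidates first, for example with the feature-based methods listed above.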
While I have limited knowledge in the field of image processing, there is a library which offers such facilities: AForge.NET. That might be good as a starting reference.
EDIT: for an introductory explanation of the theory behind image processing, this may also offer some guidance: http://www.societyofrobots.com/programming_computer_vision_tutorial.shtml
I understand that you are looking for some API or "already-built" image processing library to help you with this, but this answer might help you in a way, or other people who want to pursue something like this.
There are some pretty helpful research papers (including tests from successful implementations) on this Mobile Visual Search page at Stanford. Check out the heading "Book Spine Recognition for Asset Tracking" on that page.
I am doing a project in which I have to extract nouns, adjectives, noun phrases, and verbs from text files (.doc format).
I have a corpus of around 75 such files. Searching the net, I came across POS tagging in Python using NLTK.
As my project is in C# (using Visual Studio 2008), I need code to do this in C#.
I have tried the WordNet API and even SharpNLP, but as a newbie I found them tough to integrate with my project.
Can anybody please suggest simpler code to do this, using something like a vocabulary? Please help.
Thanks.
I worked in NLP (Natural Language Processing) for an industry leader for a while and what you want to do is no trivial task. I know one of the creators of nltk and I have used it myself; it's a high quality open source tool and I'd recommend you use it (do you have a particularly compelling reason to use C#?)
POS tagging is typically implemented by training a model of language on hand-annotated data, then applying that model to new text, predicting the part of speech of each word and giving a confidence for each prediction. NLTK has tools that do this, and it also ships with some models (if I'm not mistaken).
You'll find that most tools are written in C++, Java, and Python. If you don't know any of these languages, look at this as an excellent opportunity to learn something!
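To illustrate the train-then-apply idea in the simplest possible form, here is a sketch of a most-frequent-tag baseline in plain Python. This is not how NLTK's taggers are implemented; it is a deliberately naive stand-in. The function names, the tiny training set, and the `NOUN` fallback for unknown words are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count how often each (lowercased) word carries each tag in
    hand-annotated data: a list of sentences of (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def tag(model, words, default="NOUN"):
    """Tag each word with its most frequent training tag and report the
    relative frequency of that tag as a crude confidence. Unknown words
    get `default` with confidence 0.0 (an assumption, not a standard)."""
    result = []
    for word in words:
        tag_counts = model.get(word.lower())
        if not tag_counts:
            result.append((word, default, 0.0))
        else:
            best_tag, n = tag_counts.most_common(1)[0]
            result.append((word, best_tag, n / sum(tag_counts.values())))
    return result


# Tiny made-up annotated corpus.
training = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("runs", "NOUN")],
    [("she", "PRON"), ("runs", "VERB")],
]
model = train(training)
for word, pos, confidence in tag(model, ["The", "dog", "runs", "fast"]):
    print(word, pos, round(confidence, 2))
```

A real tagger also looks at the surrounding words, which is why a trained library model will do far better than this baseline on ambiguous words.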
See Wikipedia, especially the links at the bottom, for more information and other software available to use for such tagging.
Christopher is correct in his statement that NLP implementations are no picnic. However, I've recently looked into a viable solution using OpenNLP in a .NET project with a rudimentary PoS parser. In my example I am looking for noun phrases, but it shouldn't be too difficult a task to find other fragments as well. I find the OpenNLP Tools Models for 1.5 to be sufficient for my purposes.
I realize this answer is woefully late for the questioner, but hopefully it will give others some inspiration with this difficult field to get into.
Extracting noun phrases with contextual relevance in .NET using OpenNLP
Kindly read through this article.
Easy Integration of SharpNLP with C# Visual Studio Project
In this article I give a step-by-step way of integrating SharpNLP with a C# project, along with sample code snippets that specifically address your issue: sentence splitting, tokenizing, and POS tagging.
Try it out, and I can help with any problems you encounter.
Hi all,
I need your help with studying object recognition in video, as this will be my new project at my faculty.
I have previously studied computer vision.
I just need suggestions for good books, web resources, tutorials, and anything else that will help me with the project.
The project will be in C# or MATLAB.
Thanks
Just a simple suggestion here: break the problem down into small, manageable chunks. Since you're studying object recognition, you will probably want to get most of the software off the shelf so you can focus on studying and not debugging.
If you are using MATLAB, maybe you should look at this:
http://www.mathworks.com/products/image/
For C#, I would assume there are some awesome image processing libraries now, thanks to the Xbox Kinect.
http://www.codeproject.com/Articles/148251/How-to-Successfully-Install-Kinect-on-Windows-Open.aspx
Another technology that is good for image processing is LabVIEW. If your faculty has licences and people who know it well enough to help you, it may be another option.
http://www.ni.com/labview/whatis/?nipkw=LabVIEW&nicam=OceaniaZA-VI2009&nigrp=labview&nisrc=Google&niurl=&ninet=search
It's not a simple subject, so finding specific books or resources is unlikely. As Oli said, your tutor is a specialist and should be able to point you to guides and reference material.
Check out scientific journals in the area of image processing, such as Pattern Recognition, Pattern Recognition Letters, Computer Vision and Image Understanding, and International Journal of Computer Vision, and books such as Computer Vision and Image Processing.
Is it possible to build an application using the .NET speech recognition classes, pass in a WAV file, and have it go through the file and create a text representation of it? For example, this is what I'm trying to do:
We have a QA department at my office and they have to listen to hundreds of calls a day, which is quite impossible; there are not enough people to keep up with listening to everything. What I want to do is have the audio file uploaded to our server and have the server parse it and create a transcript. It doesn't matter if it's not perfect; it just needs to be a base, since skimming a couple of dozen lines of text is easier than listening to a 2-hour recording.
Based on a saved transcript I can implement full-text search in the database and also run checks against the transcript to see whether someone is saying something that's a misrepresentation.
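As an illustration of the "run checks against the transcript" idea, a check could be as simple as scanning each line for a list of watched phrases. The sketch below is in Python for brevity and is easy to port to C#. The function name `flag_phrases` and the sample phrases are hypothetical, and a rough machine transcript will of course miss or garble some of them.

```python
import re


def flag_phrases(transcript, phrases):
    """Return (line_number, phrase) pairs for every watched phrase found
    in the transcript. Matching is case-insensitive and on word boundaries."""
    hits = []
    for lineno, line in enumerate(transcript.splitlines(), start=1):
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits


# Made-up transcript and watch list.
transcript = (
    "Hello, thanks for calling.\n"
    "This plan is Guaranteed to save money.\n"
    "No hidden fees at all."
)
print(flag_phrases(transcript, ["guaranteed", "no hidden fees"]))
```

The line numbers give a reviewer somewhere to jump to in the recording, assuming each transcript line is stored with a timestamp.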
So, is it possible to create an application using the .NET speech recognition classes and just pass the WAV file to it and it spit out a rough transcript?
I've dug around MSDN on the Speech classes briefly while thinking up the idea, so I don't know enough to tell whether it can be done.
If it is possible, I would appreciate any examples in C#. Topic 1055347 is similar to my question and was given links, the most specific of which is in C++. I'm not a C++ developer, nor have I ever gone to school for programming; I'm entirely self-taught in C#, so I would like to stay in the language that I know.
Thanks in advance!
This sounds like you've got a call-center type of application. Microsoft Speech Server has an SR engine optimized for telephony (8000 Hz sample rate), which will generate much better recognitions than the desktop SR engine. However, the engine isn't really designed for transcription (although it can do it), and the transcriptions definitely need to be reviewed before further processing occurs. Microsoft Exchange Unified Communications uses the SR engine to generate transcripts of voice mail, and while it's better than nothing, it often generates amusing nonsense.
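Since the choice of engine depends on the sample rate, it may be worth checking what your recordings actually contain before picking one. Here is a minimal sketch using Python's standard `wave` module. The helper names `wav_info` and `make_silent_wav` are made up for this example, and the in-memory silent file only stands in for a real call recording.

```python
import io
import wave


def wav_info(fileobj):
    """Read basic properties from a WAV file (a path or a file-like object)."""
    with wave.open(fileobj, "rb") as w:
        rate = w.getframerate()
        return {
            "rate": rate,
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / rate,
            "telephony": rate == 8000,  # the 8000 Hz rate mentioned above
        }


def make_silent_wav(rate, seconds):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf


print(wav_info(make_silent_wav(8000, 2)))
```

With real files you would pass the path of a recording to `wav_info` instead of the generated buffer.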
With areas like speech recognition you are likely to find either a standalone EXE or an API in C/C++.
For the links in the other topic, you can use a tool like the P/Invoke Interop Assistant to generate C# code. The C# code acts as a wrapper around the unmanaged DLL, so you can call it from C#.
This is likely to be the best way to get the functionality you are looking for.
Yes.
I did such an application a few years ago on the Tablet PC; you can read about it at http://web.archive.org/web/20060615192119/www.devx.com/TabletPC/Article/30761 (At the time, I spoke of using Interop to access the libraries, but I believe that the programming model has remained the same, just with a managed wrapper.)
At the time, the results were very poor, but maybe for your use-case better than nothing.
How about routing the calls to Google Voice? I'm sure there are similar services. I have been amazed at its accuracy so far, plus you can click and listen to the recording if required. Google Voice will forward voice calls to SMS or email.
UPDATE: On rereading, since you are recording calls this may not work; I use it for the voice messages that callers leave.
<flavor>I want to create a spelling test program for my grade schoolers that would let them enter and record their spelling words, then test them on those words throughout the week.</flavor>
What's a good Delphi API with which I could select a recording device, capture and save sound files, then play them back?
I'm also toying with doing the same project in C#, so C# Sound capture/playback API recommendations would also be appreciated.
An alternative to recording would be to use the MS Speech API with C#, enter the words via keyboard, and have it state what was keyed in.
Just a thought... Good luck on your app -- it sounds like a really cool program!
I've found New Audio Components to be quite good for Delphi.
This component set looks promising, though I've never used it myself. AudioLab 3.1 has both VCL components and .NET 2.0 components, which should allow you to use it whether you stay with Delphi or move to C#. Finally, it appears to be free for non-commercial use.
The best place to look for Delphi Components
(Audio)
http://www.torry.net/pages.php?id=167
Why not use the TMediaPlayer that comes with Delphi (in the System Tab of the Palette)?
It can record and play wave files very easily.
I was also going to suggest AudioLab.