I've seen speech recognition from input devices (obviously) and I've seen speech recognition from files (http://gotspeech.net/forums/thread/6835.aspx). However, I was wondering whether it would be possible to run speech recognition on system audio in real time. By system audio, I mean the sound that comes out of your speakers.
It would be a great tool for people who are hard of hearing: while they watch YouTube videos, the C# application could transcribe what's being said.
How could I go about doing this?
Very easily: go to the sound mixer, choose the input and enable/unmute "Stereo Mix". You should, of course, mute the mic if you don't want to record it too. Then just start recording the same way you'd record the mic, and you'll get the same feed as the speakers at digital quality.
This can be done programmatically, although it can be fiddly, especially if you want to support WinXP as well as Vista/Win7 (sound was overhauled in Vista, and I believe the APIs are significantly different, although I haven't had to use them yet).
You're almost certainly going to need to filter the sound before attempting recognition. Unless the speech recognition library you're using is designed to work in adverse conditions, music and special effects will interfere with recognition, as will multiple people speaking at the same time.
If you haven't got a super-robust library, filters that attenuate non-vocal frequencies are going to be a must. You may also need to apply volume normalisation to account for loud and quiet scenes. There are hundreds of filters that could potentially improve matching.
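As a rough illustration of that filtering and normalisation step, here is a minimal sketch in Python. The arithmetic ports directly to C#. It assumes mono float samples in [-1, 1] at an assumed 16 kHz sample rate. It uses a single band-pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook. The centre of 1 kHz and Q of 1/3 are guesses to tune; they put the -3 dB band very roughly at 300 Hz to 3.3 kHz, the classic telephone voice band.

```python
import math


def bandpass_coeffs(sample_rate, center_hz, q):
    """Band-pass biquad coefficients (RBJ Audio EQ Cookbook, 0 dB peak gain).

    Returns (b, a) normalised so that a0 == 1: b = (b0, b1, b2), a = (a1, a2).
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Filter a list of floats (Direct Form I). Filter state starts at zero on every call."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def normalise_peak(samples, target=0.9):
    """Scale so the loudest sample reaches `target`; silence is returned unchanged."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target / peak
    return [s * gain for s in samples]


# Assumed voice band: centre 1 kHz, Q = 1/3 (roughly 300 Hz to 3.3 kHz).
SAMPLE_RATE = 16000
VOICE_B, VOICE_A = bandpass_coeffs(SAMPLE_RATE, 1000.0, 1.0 / 3.0)


def clean_for_recognition(samples):
    """Attenuate non-vocal frequencies, then bring the level up to a fixed peak."""
    return normalise_peak(biquad(samples, VOICE_B, VOICE_A))
```

For a live stream you would keep x1, x2, y1 and y2 between buffers instead of resetting them on each call. Per-buffer peak normalisation will also "pump", so a slower-moving gain control is probably a better fit for real-time use.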
You may want to access the recognition API at the lowest level to get as much control as possible. You'll need to tweak it to cope with people shouting, out of breath, crying, etc. If you design for flexible low-level access from the start, it will probably save you weeks of re-architecting if you find you need it later on.
I'd suggest you look into NAudio as a starting point for audio processing.
I suspect you'll be able to get something that works under ideal conditions without too much effort, but tweaking it to work well in all eventualities may be a mammoth task. That said, it sounds like a fun project.
You could improve recognition accuracy considerably by creating genre-, user- or show-specific dictionaries. These could either be pre-generated or built automatically using a weighted feedback loop, perhaps also allowing the user to correct mistakes.
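One way such a weighted feedback loop might look, as a purely hypothetical Python sketch. It assumes the recognizer can return an n-best list of (text, confidence) pairs. The class name, the `boost` factor and the additive scoring are made up for illustration and are not part of any real speech API.

```python
from collections import Counter


class FeedbackVocabulary:
    """Hypothetical per-show vocabulary that learns from user corrections."""

    def __init__(self, boost=0.1):
        self.counts = Counter()  # how often each word appeared in a user correction
        self.boost = boost       # assumed weight per past correction

    def record_correction(self, corrected_text):
        # Every word the user typed in as a correction gains weight.
        self.counts.update(corrected_text.lower().split())

    def score(self, hypothesis, recognizer_confidence):
        # Naive additive blend of the recognizer's confidence and learned weights.
        bonus = sum(self.counts[w] for w in hypothesis.lower().split())
        return recognizer_confidence + self.boost * bonus

    def rerank(self, n_best):
        """n_best: list of (hypothesis, confidence) pairs; returns the best text."""
        return max(n_best, key=lambda h: self.score(h[0], h[1]))[0]
```

Raw counts are added without any cap here, so a few frequent corrections can dominate. A real system would probably dampen them, for example with a logarithm.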
Related
I have a professional sound card, and I want to record the signals from my guitar with C++ or C# to develop guitar effects in real time.
How can I record in real time from C++?
Does that mean I need the sound card's API?
Is this one enough?
Although it may not be as easy as using a pre-built library, you may be able to get a C++ SDK for your sound card from the manufacturer. I would start by browsing their site or contacting support.
If that isn't an option, you can also use DirectSound, which is part of the DirectX family of products. The learning curve is fairly steep, but I believe it should do just about anything you want.
One final option is to look at a favorite tool (such as Sound Forge). A number of these tools support automation, which means you can click through the app, decide what you want, then automate that sequence of events (See this as an example).
Hope that helps, best of luck!
Side note: I have developed a number of hardware interfaces, and in my experience it's best to start with an example that does at least something like what you are looking for, then modify the code from there. If a particular option doesn't have an example like this, I would probably skip it in favor of one that does.
Examples
DirectSound - Microsoft has a learning site for DirectSound, which you can find here. I also found this blog article, which has an example of recording audio with DirectSound.
Sound Forge - If you download the "Script Developers Kit", there are examples for C# in the scripts folder that should get you started. I believe this particular tool is more focused on editing and effects, but I am guessing there should be automation for recording.
To just record audio in real time, any API will be fine. Note that WASAPI is the primary API (since Vista), and legacy APIs such as the WaveIn API and DirectSound are implemented on top of WASAPI as compatibility layers.
Regular APIs assume you are okay with some processing latency/overhead, on the order of tens of milliseconds.
If you need to be faster than this and need real-time performance, for example to process data and send it back for playback as soon as possible, you need so-called exclusive-mode streams. With these you can achieve latencies on the order of a few milliseconds, which is on par with professional audio development kits.
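Those latency figures come down to simple arithmetic: a buffer of N frames at sample rate fs takes N / fs seconds to fill. Here is a small Python sketch of it. The round-trip figure is only an assumed lower bound: one capture buffer plus one playback buffer, ignoring driver, converter and processing delays. Real numbers will be higher.

```python
import math


def buffer_latency_ms(frames, sample_rate):
    """Milliseconds needed to fill (or play out) one buffer of `frames` frames."""
    return 1000.0 * frames / sample_rate


def frames_for_latency(latency_ms, sample_rate):
    """Largest buffer, in frames, that stays within a target per-buffer latency."""
    return math.floor(latency_ms * sample_rate / 1000.0)


def min_round_trip_ms(frames, sample_rate):
    """Assumed lower bound: one capture buffer plus one playback buffer."""
    return 2 * buffer_latency_ms(frames, sample_rate)
```

For example, 480 frames at 48 kHz is 10 ms per buffer. A 3 ms target means buffers of at most 144 frames, which is why the "few milliseconds" range requires small buffers and a path with little overhead.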
The Windows SDK has a few audio recording samples (C++) in \Samples\multimedia\audio.
It's probably a good idea to use a third-party library for that.
There's a multitude of options. The ones I know of are PortAudio and STK.
I like the FMOD API, which supports recording (Sound recording with FMOD library) and real-time effects.
Lately, I've been trying to setup a media center PC. I've played around with all the common media center applications like XBMC, Plex, Boxee, and WMC. But all of them have one issue or another. So I was thinking about writing my own application from scratch.
My problem is I have no experience with developing software that plays media such as videos or music. I'm also not interested in spending a huge amount of time trying to figure this out, considering all the different file formats and codecs out there. I'm really more interested in developing the database and library interface for my application and reusing someone else's control or code for actually playing the media.
One option I was thinking was to just control an existing media player externally. So for example you may browse for a video to play in my application, and then when you hit play it would fire up VideoLAN or some other popular video player.
However, I was wondering if there was an easy way to play video inside a .NET application. I'm looking for something capable of playing a wide variety of formats, such as MKV files and DVD ISOs. I'm more experienced with WinForms, but I was also thinking about using this project as an opportunity to learn WPF.
I've spent many years looking at playing video under WPF.
The short answer
There is no easy way to guarantee being able to play a variety of formats (MKV, DVD, etc.) under WPF, or under Windows for that matter.
The long answer
If you just want to run this at home and not release it, install all the codecs you need and most of the formats will play via MediaElement in WPF.
Getting all the codecs to cooperate can sometimes be frustrating.
Now moving into slightly harder territory.
If you want to play DVDs, then you need to replace MediaElement with WPF MediaKit:
http://wpfmediakit.codeplex.com/
WPF MediaKit provides a base library that gives access to the low-level DirectShow functionality.
There is already a code base for playing DVDs based on wpfmediakit.
Now moving onto the very hard territory.
If you want to distribute your application and have users be able to "just watch" most or all media formats, you need to be able to completely control their codecs. That generally means distributing the codecs with your package and building the DirectShow filter graph in code rather than letting Windows build it.
The easiest way is to use the existing .Net hooks to Microsoft's standard MediaPlayer:
http://msdn.microsoft.com/en-us/library/system.windows.media.mediaplayer.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/dd562851%28v=vs.85%29.aspx
I was trying something myself a while ago to play media in WinForms and found out there are VLC wrappers for .NET. I don't know how good they are, as I gave up, but you can try them.
Here is one of them:
http://vlcdotnet.codeplex.com/
Thanks for all the great answers. But I just found out that VLC can actually be controlled through HTTP, so I think I'm just going to use that to point an instance of VLC running with the HTTP interface at whatever file I want to play.
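Here is a minimal sketch of driving VLC over HTTP from code, written in Python for brevity. It relies on VLC's HTTP "requests" interface (`/requests/status.xml` with commands such as `in_play` and `pl_pause`). It assumes VLC was started with something like `vlc --extraintf http --http-password secret` on the default port 8080. Check the command names and options against your VLC version. The functions only build the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import pathlib
import urllib.parse
import urllib.request


def vlc_request(command, host="127.0.0.1", port=8080, password="secret", **params):
    """Build (but don't send) a request for VLC's HTTP interface.

    VLC's HTTP interface uses HTTP Basic auth with an empty user name,
    so the credentials are just ":" followed by the --http-password value.
    """
    query = urllib.parse.urlencode({"command": command, **params})
    url = f"http://{host}:{port}/requests/status.xml?{query}"
    token = base64.b64encode(f":{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def play_file(path, **kwargs):
    """Ask VLC to add `path` to its playlist and start playing it."""
    mrl = pathlib.Path(path).resolve().as_uri()  # e.g. file:///home/me/movie.mkv
    return vlc_request("in_play", input=mrl, **kwargs)
```

With VLC listening, `urllib.request.urlopen(play_file("movie.mkv"), timeout=2)` would start playback. From C#, the same URL and header can be sent with any HTTP client.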
I need to take an audio file recorded by a man and convert it to a child's voice and mix it with a background voice track.
I have searched the internet trying to find a good program to do this but I didn't find it. Is there a C# API that can help me to implement it myself?
NAudio is .NET based and has a mixing engine - might be worth looking at for your purposes.
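At its core, mixing is just summing samples. Here is a minimal Python sketch of what a mixing engine does for two tracks. It assumes both tracks are already decoded to float samples in [-1, 1] at the same sample rate and channel layout. The default gains are arbitrary placeholders. In .NET, a library such as NAudio would handle the decoding and format matching around this.

```python
def mix(voice, background, voice_gain=1.0, background_gain=0.5):
    """Mix two float sample lists; the shorter one is padded with silence."""
    length = max(len(voice), len(background))
    out = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        b = background[i] if i < len(background) else 0.0
        s = v * voice_gain + b * background_gain
        out.append(max(-1.0, min(1.0, s)))  # hard clip so the sum stays in range
    return out
```

Hard clipping is the crudest way to handle overloads. Choosing gains so that the sum stays below 1.0 will sound better.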
I think using an API and writing it yourself may fall into the "too difficult" category. I would recommend using a free multi-track audio editor like Audacity. It has a pitch-shift ability (which covers your child's voice requirement), and you can play two files on top of each other (which covers the background voice requirement).
If you still want or need an API, the keyword to look for when searching for the child's voice ability is "pitch shifting".
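To see why proper pitch shifting matters, here is a naive Python sketch. It raises the pitch by playing the samples back faster, using linear interpolation. A shift of n semitones corresponds to a frequency ratio of 2^(n/12), so +12 semitones doubles the pitch. However, this approach also shortens the recording by the same ratio, like speeding up a tape. Audacity's Change Pitch effect, and libraries like the Rubber Band one mentioned in this thread, keep the duration fixed. How many semitones sounds like a child is something to find by ear.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def naive_pitch_shift(samples, semitones):
    """Resample by linear interpolation: changes pitch AND duration together.

    +12 semitones halves the length; -12 doubles it.
    """
    ratio = semitones_to_ratio(semitones)
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i < last else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out
```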
There is some new functionality in the System.Windows.Media namespace, using the MediaPlayer class with WPF.
But what you are doing requires signal processing and is best done in C++. I don't know any good signal processing libraries in .NET, but Emgu CV is a .NET wrapper around OpenCV, which can do advanced signal processing.
What you are trying to do requires advanced signal processing and if you do not have such a background there is no easy way to do it.
Rubber Band Library by Breakfast Quay is a C++ library (released under GPL) that can change the pitch of a recording without changing the speed. It also features formant processing, which can help with changing a voice between a man/woman/child.
See http://rubberbandaudio.com/.
Requirements
I am developing a music game that requires access to the audio line-in and classes to help me analyze a MIDI file (playing the MIDI is NOT necessary for me). Secondly, I need a graphics engine that allows easy and quick development (within reason). The game's focus is not cutting edge graphics - think along the lines of Audiosurf.
Issue 1
Java provides easy-to-use, well-documented audio line-in input and MIDI file support built right into the API, which I could not find in C#. I found some resources for reading from the line-in and some MIDI helper classes, but they don't have much documentation or support and seem to be workarounds for C#'s lack of built-in support.
Issue 2
The second aspect of the game is of course the graphics engine. On the C# side, XNA seems to be the clear choice for my needs. On the Java side, I'm leaning towards JMonkeyEngine (or ogre4j as a second choice). JMonkeyEngine seems to be fine for my graphical uses but the documentation is scattered and sparse.
Deciding
Both issues are of equal importance. Also, I know the community here is predominantly .NET programmers, so please try to consider both languages if possible.
Use Processing: http://www.processing.org/
It seems that, for now, you mostly want to test and see whether your concept can actually be done (and is cool).
Processing is more or less made for this sort of thing: it is a programmatic sketchpad for audio and visuals. With very little code you can see whether your ideas hold up the way you want.
It's a subset of Java, so you could use Java inside or outside it, depending on a few factors.
Yes, you could use .NET, XNA/WPF or whatever, but to me that seems premature.
Test your ideas first.
For the .NET and audio side of things, I have written some code to read and write MIDI files and included it as part of NAudio. Have a look at MIDI File Mapper for an example of how to make use of this. NAudio also includes the capability to capture microphone input.
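To give a feel for what reading a MIDI file involves, here is a minimal Python sketch of the first step. It parses the Standard MIDI File header chunk (`MThd`) and lists the chunks that follow it, using only the chunk layout from the SMF specification. Decoding the track events (variable-length delta times, running status, meta events) is the larger part of the work. That is what the NAudio MIDI code or Java's javax.sound.midi does for you.

```python
import struct


def read_midi_header(data):
    """Parse the MThd chunk: format, number of tracks and timing division."""
    if len(data) < 14:
        raise ValueError("too short for a Standard MIDI File header")
    chunk_id, length = struct.unpack(">4sI", data[:8])
    if chunk_id != b"MThd" or length < 6:
        raise ValueError("not a Standard MIDI File")
    fmt, ntracks, division = struct.unpack(">HHH", data[8:14])
    if division & 0x8000:
        # SMPTE timing: high byte is negative frames/second, low byte is ticks/frame.
        fps = -struct.unpack(">b", data[12:13])[0]
        timing = ("smpte", fps, division & 0xFF)
    else:
        timing = ("ppq", division)  # ticks per quarter note
    return {"format": fmt, "tracks": ntracks, "timing": timing}


def track_chunks(data):
    """List (chunk_id, length) for every chunk after the header (normally MTrk)."""
    offset = 8 + struct.unpack(">I", data[4:8])[0]
    chunks = []
    while offset + 8 <= len(data):
        cid, clen = struct.unpack(">4sI", data[offset:offset + 8])
        chunks.append((cid.decode("ascii", "replace"), clen))
        offset += 8 + clen
    return chunks
```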
Does XNA provide a means of audio input from the line-in? I looked at the MSDN website but can't find anything on audio input. If it is indeed possible, a snippet of code or a tutorial website would be great.
Edit:
I need to do buffered reads from the audio line-in. I'm not so much interested in the implementation as in whether it has low latency.
Also, this will be a PC-only game.
I think all sound files need to be compiled by XACT before they can be used in XNA.
So either you get hold of DirectSound and look at the sample in:
\Samples\Managed\DirectSound\CaptureSound
...or you could interop with winmm.dll. This guy has made a small example of how to do it:
http://www.codeproject.com/KB/audio-video/cswavrec.aspx
And this guy writes some more about enumerating all sound recording devices:
http://www.codeproject.com/KB/cs/Enum_Recording_Devices.aspx
Hope it helps!
Edit:
I'm not sure what you want to do with your audio stream so this tutorial might be of interest as well:
http://nyxtom.vox.com/library/post/recording-audio-in-c.html
Edit 2:
What he said in the comment below.
If you're looking at doing a Windows-only project, you could certainly capture the incoming audio with code from outside the XNA framework and play it back the same way. Because of how the XNA content manager works, you wouldn't be able to use the regular playback methods: the content manager translates everything into .xnb files at compile time and reads them from there. Nothing keeps you from playing audio using standard Windows API calls, though. You wouldn't really have an XNA project at that point, but I don't suppose the distinction is all that important, since you're not looking to be compatible with the other platforms anyway.
To answer your question: no, you can't access the audio line-in through the XNA APIs. You'd have to look at some other library, such as PortAudio, that gives you access to features like that. But then you'd be restricted to running on Windows (i.e. no Xbox or Zune).
Disclaimer: I'm not sure whether PortAudio specifically has this functionality, as I just found it quickly via Google. I was just trying to illustrate that you'd have to use some other API.