Capture speaker output - c#

I've seen that I can capture the microphone and sound-files with elements in the Windows.Media.Audio namespace. I'm looking to capture the speaker output, though. For example, I click on something and the system sounds the alert sound - I want to be able to capture that.
Is there any way of doing that using elements in Windows.Media.Audio (instead of going more low level into Win32 calls)?

Well, even with "low level Win32 calls", you can't do any loopback recording in UWP.
This is traditionally done with WASAPI, but due to the sandboxed execution of universal applications, you can't open any capture streams on audio render devices in WASAPI.
In general, if you dive into COM APIs which have been ported to UWP, you will notice that there have been a lot of restrictions.

Related

Recording sound in real time with c++ or C#

I have a professional sound card, and I want to record the signal from a guitar with C++ or C# in order to develop real-time guitar effects.
How can I record in real time from C++?
Does that mean I need the sound card's API?
Is this one enough?
Although it may not be as easy as using a pre-built library, you may be able to get a C++ SDK for your sound card from the manufacturer. I would start by browsing their site or contacting support.
If that isn't an option, you can also use DirectSound which is part of the DirectX family of products. The learning curve is fairly steep but I believe it should do just about anything you want.
One final option is to look at a favorite tool (such as Sound Forge). A number of these tools support automation, which means you can click through the app, decide what you want, and then automate that sequence of events (see this as an example).
Hope that helps, best of luck!
Side note: I have developed a number of hardware interfaces, and in my experience it's best to start with an example that does at least something like what you are looking for, then modify the code from there. If an option doesn't come with an example like this, I would probably skip it in favor of one that does.
Examples
DirectSound - Microsoft has a learning site for DirectSound which you can find here. I also found this blog article which has an example for recording audio with DirectSound.
Sound Forge - If you download the "Script Developers Kit" there are examples for C# in the scripts folder that should get you started. I believe this particular tool is more focused on editing and effects but I am guessing there should be automation for recording.
To just record audio in real time, any API will be fine. Note that WASAPI is the primary API (since Vista), and legacy APIs like the WaveIn API and DirectSound are implemented on top of WASAPI as compatibility layers.
Regular APIs assume you are okay with some processing latency/overhead, on the order of tens of milliseconds.
If you need to be faster than this, with real-time performance such as processing data and returning it for playback as soon as possible, you need so-called exclusive mode streams. With these you can achieve latencies on the order of a few milliseconds, which is on par with professional audio development kits.
The Windows SDK has a few audio recording samples in \Samples\multimedia\audio (C++).
It's probably a good idea to use a third party library for that.
There's a multitude of options. The ones I know of are portaudio and STK.
I like the Fmod API which supports recording (Sound recording with FMOD library) and realtime effects.

Should I use DirectSound or WASAPI for my audio project?

I am starting a project where minimum requirements will be Windows 7. I'll be using NAudio as my interface to audio. I am not sure what I should be using: DirectSound or WASAPI? I am going to be doing the following:
Manipulating volume/mute on multiple USB sound cards for both speaker and the microphone.
Rerouting input from sound card 2 into the output of the sound card 2 (if that's possible).
Manipulating the audio input of the sound card with some effects.
I understand that behind the scenes DirectSound processes all the audio via WASAPI anyway and it sounds like DirectSound has joined the list of deprecated technologies.
However, my question is more at a functional level: which API will let me do what I described above?
"where minimum requirements will be Windows 7"
Certainly WASAPI - you have better control over things, the WASAPI interfaces are well made and easier to use, and there is less overhead if you need to be close to real time. There is nothing on the list that DirectSound can give you and WASAPI cannot.
The only reason to use DirectSound is if you need to support pre-Vista systems, where WASAPI is not available.

playback audio to a non-default playback device in .net

How can I play back audio to a non-default playback device in .NET? Help would be wonderful! Audio playback to the default playback device is easy. However, machines can have multiple playback devices for many reasons, and many common applications allow selecting a non-default device for playback and recording. Is there a way to do this, hopefully avoiding P/Invoke? Media Foundation or Core Audio? Thank you in advance.
KindReality,
This might be useful to you. NAudio has seemingly already wrapped up a few APIs for you (and, I imagine, handles those low-level calls so you don't have to). Scrolling down to "NAudio Features" will most likely reveal whether or not this is what you're looking for.

C# Speech Recognition from System Audio (Speaker Sound)

I've seen speech recognition from input devices (obviously) and I've seen speech recognition from files (http://gotspeech.net/forums/thread/6835.aspx). However, I was wondering whether it would be possible to run speech recognition on system audio in real time. By system audio, I mean the sound that comes out of your speakers.
It would be a great tool for those who are hard of hearing: as they watch YouTube videos, the C# application could transcribe what's being said.
How could I go about doing this?
Very easily - Go to the sound mixer, choose input and enable/unmute "Stereo Mix". You should, of course, mute the mic if you don't want to record that too. Then, just start recording the same way you'd record the mic - now you'll get the same feed as the speakers at digital quality.
This can be done programmatically, although it can be fiddly - especially if you want to support WinXP as well as Vista/Win7 (sound was overhauled in Vista and I believe the APIs are significantly different, although I haven't had to use them yet).
You're almost certainly going to need to filter the sound before attempting recognition. Unless the speech recog. library you're using is designed to work in adverse conditions, music and special effects will interfere with proper recognition as will multiple people speaking at the same time.
If you haven't got a super-robust library, filters to attenuate non-vocal frequencies are going to be a must. You may also need to apply volume normalisation to account for loud/quiet scenes - There are hundreds of filters that could potentially improve matching.
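As a rough illustration of the two filters mentioned above, here is a minimal sketch of a gentle band-pass (a first-order high-pass followed by a first-order low-pass) and a peak normaliser. The function names are hypothetical, and the choice of cut-off frequencies is left to the caller; something like the 300 to 3400 Hz telephone voice band is only an assumed starting point, not a value from this answer. A first-order filter rolls off slowly, so a real recogniser front end would likely need something steeper.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// First-order band-pass: RC-style high-pass followed by RC-style low-pass.
// Frequencies below low_cut_hz and above high_cut_hz are attenuated gently.
std::vector<float> band_pass(const std::vector<float>& in, float sample_rate_hz,
                             float low_cut_hz, float high_cut_hz) {
    const float kPi = 3.14159265358979f;
    const float dt = 1.0f / sample_rate_hz;
    const float rc_hp = 1.0f / (2.0f * kPi * low_cut_hz);
    const float rc_lp = 1.0f / (2.0f * kPi * high_cut_hz);
    const float a_hp = rc_hp / (rc_hp + dt);  // high-pass coefficient
    const float a_lp = dt / (rc_lp + dt);     // low-pass coefficient

    std::vector<float> out(in.size());
    float hp_prev_out = 0.0f, hp_prev_in = 0.0f, lp_prev = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        // High-pass stage: removes DC and low-frequency rumble.
        const float hp = a_hp * (hp_prev_out + in[i] - hp_prev_in);
        hp_prev_in = in[i];
        hp_prev_out = hp;
        // Low-pass stage: smooths away high-frequency content.
        lp_prev += a_lp * (hp - lp_prev);
        out[i] = lp_prev;
    }
    return out;
}

// Scales the samples in place so the loudest one has magnitude target_peak.
// Silent input is left untouched to avoid dividing by zero.
void normalize_peak(std::vector<float>& samples, float target_peak) {
    float peak = 0.0f;
    for (float s : samples) peak = std::max(peak, std::fabs(s));
    if (peak <= 0.0f) return;
    const float gain = target_peak / peak;
    for (float& s : samples) s *= gain;
}
```

Peak normalisation only evens out overall level per block; coping with loud and quiet scenes inside one recording would need a running gain control, which this sketch does not attempt.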
You may want to access the recognition API at the lowest level to get as much control as possible - You'll need to tweak it to cope with people shouting, breathless, crying, etc... If you start designing for flexible low-level access, it will probably save you weeks if you find you need it later on and have to re-architect.
I'd suggest you look into NAudio as a starting point for audio processing.
I suspect you'll be able to get something which works under ideal conditions without too much effort - but tweaking it to work well in all eventualities may be a mammoth task. That said, it sounds like a fun project.
You could improve recognition chance considerably by creating genre-, user- or show-specific dictionaries. These could either be pre-generated, or built automatically using a weighted feedback loop - perhaps also allowing the user to correct mistakes.

Capture devices - Mono c#

I'm looking for a way to list all capture devices (audio and video) with Mono under Linux: microphones, webcams, etc. Under Windows this is easy with DirectShow, but I couldn't find anything similar under Linux.
Of course I could list those devices with a system command line, and parse the string, but before doing this I wanted confirmation that nothing like this exists for Mono.
Thanks for your attention.
For video, take a look at V4L. My brief search of Google for Mono bindings did not turn up anything promising, but feel free to look.
For audio, most modern Linux distributions will have PulseAudio installed, which will know about all the audio capture devices in the machine automatically, and which one the user prefers to use at the time. Again, I don't know of any Mono bindings for this either.
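The fallback mentioned in the question, running a system command and parsing its output, could look roughly like the sketch below. It assumes the text comes from `arecord -l` (part of alsa-utils), whose device lines typically look like `card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]`. The struct and function names are hypothetical, the sketch is in C++ rather than Mono/C#, and it only parses text that has already been captured; it does not run the command itself.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One audio capture device as reported by a line of `arecord -l` output.
struct CaptureDevice {
    int card;               // ALSA card index
    int device;             // ALSA device index on that card
    std::string card_name;  // human-readable card name from the brackets
};

// Extracts devices from text in the assumed `arecord -l` format.
// Lines that do not start with "card N:" (headers, subdevice lines) are skipped.
std::vector<CaptureDevice> parse_arecord_list(const std::string& text) {
    static const std::regex line_re(
        R"(^card (\d+): \S+ \[([^\]]*)\], device (\d+): )");
    std::vector<CaptureDevice> devices;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        std::smatch m;
        if (std::regex_search(line, m, line_re)) {
            devices.push_back(
                {std::stoi(m[1].str()), std::stoi(m[3].str()), m[2].str()});
        }
    }
    return devices;
}
```

Parsing tool output like this is brittle: the format is not a stable interface and may differ between versions or locales, so it is best treated as a last resort.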
