I'm writing a small program to output generated sound.
My sound card is capable of a 48000 or even 192000 Hz sample rate. It's a Realtek ALC883 7.1+2 Channel High Definition Audio, and the specs can be found here.
However, DirectSound's MaxSecondarySampleRate reports a maximum value of only 20000?
I know my sound card can do better than that, but how do I configure DirectSound to take advantage of it? When I try the following:
DirectSound ds = new DirectSound(DirectSound.GetDevices().First().DriverGuid);
MessageBox.Show(ds.Capabilities
    .MaxSecondarySampleRate
    .ToString(CultureInfo.InvariantCulture));
The message box displays "20000".
It could be that your sound card is not the first device in the device list (for example, a video card with a TV output would also appear in the list). You should look at the DeviceInformation.Description property. Otherwise, it may be a problem with the driver.
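A quick way to check is to enumerate the devices and print their descriptions, a sketch assuming the same SharpDX DirectSound API used in the question:

// List every DirectSound device so the Realtek card can be picked by Description
// rather than blindly taking the first entry.
foreach (var deviceInfo in DirectSound.GetDevices())
{
    Console.WriteLine("{0} - {1}", deviceInfo.Description, deviceInfo.DriverGuid);
}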
I'm setting up a system using the GHI Electronics FEZ board (like an Arduino). I have created a way to send a frequency to a speaker and create a tone, which allowed me to make music. Yet I cannot figure out how to create tones that get a speaker to "talk", in a sense.
I have tried an existing text-to-speech NuGet package, yet it would not install within the TinyCLR environment I am using.
In C#, using a pulse width modulation (PWM) controller, I can control the desired frequency:
// PWM controller and channel driving the speaker pin
public static PwmController fezPWM = PwmController.FromName(FEZ.PwmChannel.Controller3.Id);
static PwmChannel MusicSound = fezPWM.OpenChannel(FEZ.PwmChannel.Controller3.D11);

// F4s is a frequency constant (in Hz) defined elsewhere
MusicSound.Start();
fezPWM.SetDesiredFrequency(F4s);
MusicSound.Stop();
The code above lets me start and stop the sound and set it to any desired frequency.
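For example, a single note ends up looking roughly like this (the pitch and duration values are illustrative, not my exact code):

// Rough sketch of playing one note with the same start/stop/frequency calls;
// 349.23 Hz (F4) and the 250 ms duration are placeholder values.
static void PlayNote(double frequencyHz, int durationMs)
{
    fezPWM.SetDesiredFrequency(frequencyHz);
    MusicSound.Start();
    Thread.Sleep(durationMs);   // hold the tone for the note's length
    MusicSound.Stop();
}

// e.g. PlayNote(349.23, 250);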
I expected to be able to string desired frequencies together to create a word, yet all I get are different tones.
I'm testing out the new unified speech engine on Azure, and I'm working on a piece where I'm trying to transcribe a 10-minute audio file. I've created a recognizer with CreateSpeechRecognizerWithFileInput, and I've kicked off continuous recognition with StartContinuousRecognitionAsync. I created the recognizer with detailed results enabled.
In the FinalResultsReceived event, there doesn't seem to be a way to access the audio offset in the SpeechRecognitionResult. If I do this though:
string rawResult = ea.Result.ToString(); // can get access to the raw value this way
Regex r = new Regex(@".*Offset"":(\d*),.*");
int offset = Convert.ToInt32(r.Match(rawResult)?.Groups[1]?.Value);
Then I can extract the offset. The raw result looks something like this:
ResultId:4116b361141446a98f306fdc11c3a5bd Status:Recognized Recognized text:<OK, so what's your think it went well, let's look at number number is 104-828-1198.>. Json:{"Duration":129500000,"NBest":[{"Confidence":0.887861133,"Display":"OK, so what's your think it went well, let's look at number number is 104-828-1198.","ITN":"OK so what's your think it went well let's look at number number is 104-828-1198","Lexical":"OK so what's your think it went well let's look at number number is one zero four eight two eight one one nine eight","MaskedITN":"OK so what's your think it went well let's look at number number is 104-828-1198"}],"Offset":6900000,"RecognitionStatus":"Success"}
The challenge is that the Offset is sometimes zero even for segments that start at a nonzero position in the file, so I get zeroes in the middle of a recognition stream.
I also tried submitting the same file through the batch transcription API, which gives me a different result entirely:
{
    "RecognitionStatus": "Success",
    "Offset": 531700000,
    "Duration": 91300000,
    "NBest": [{
        "Confidence": 0.87579143,
        "Lexical": "OK so what's your think it went well let's look at number number is one zero four eight two eight one",
        "ITN": "OK so what's your think it went well let's look at number number is 1048281",
        "MaskedITN": "OK so what's your think it went well let's look at number number is 1048281",
        "Display": "OK, so what's your think it went well, let's look at number number is 1048281."
    }]
},
So I have three questions on this:
Is there a supported method to get the offset of a recognized section of a file in the recognizer API? The SpeechRecognitionResult doesn't expose this, nor does the Best() extension.
Why is the offset coming back as 0 for a segment part way through the file?
What are the units for the offsets in the bulk recognition and file recognition APIs, and why are they different? They don't appear to be ms or frames, at least from what I've found in Audacity. The result I posted was from roughly 59s into the file, which is roughly 800k samples.
Chris,
Thanks for your feedback. To answer your questions:
1) The offset as well as the duration have been added to the API. The upcoming release (very soon) will allow you to access both properties. Please stay tuned.
2) This is probably due to a different recognition mode being used. We will also fix that in the next release.
3) The time unit for both APIs is 100 ns (a tick). Please also note that batch transcription uses a different model than online recognition, so the recognition results might be slightly different.
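Since these are standard .NET ticks, you can convert the Offset value from your example directly, for instance (a quick sketch):

// The JSON offsets are 100 ns ticks, so TimeSpan can convert them directly.
long offsetTicks = 6900000;                       // "Offset" from the example result above
TimeSpan offset = TimeSpan.FromTicks(offsetTicks);
Console.WriteLine(offset.TotalSeconds);           // 0.69 seconds into the file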
Sorry for the inconvenience!
Thanks,
Audio is clipping (or clicking) when trying to lower the volume of a WAV file in real time.
I've tried it on a SampleChannel, VolumeSampleProvider and WaveChannel32 instance, the source being a 32-bit WAV file.
If I try it on a WaveOut instance, the clipping doesn't occur anymore, but I don't want that because it lowers the volume of all sounds in the application.
And this only happens when I lower the volume; raising it doesn't cause clipping.
Is this a known problem or am I supposed to approach this differently?
Note: this is how the volume drops in real time over the given time span:
0.9523049
0.9246111
0.9199954
0.89384
0.8676848
0.8415294
0.8169126
0.7907572
0.7646018
0.739985
0.7122912
0.6892129
0.6630576
0.6369023
0.6122856
0.5861301
0.5599748
0.535358
0.5092026
0.4830474
0.456892
0.4322752
0.4061199
0.3799645
0.3553477
0.3276539
0.3030371
0.2784202
0.2522649
0.2261095
0.2014928
0.176876
0.149182
0.1245652
0.09841
0.07225461
0.04763785
0.02148246
0
Apparently it is a DesiredLatency and NumberOfBuffers issue on the WaveOut instance. The default values cause the problem; altered values fix it.
These are the values I used to fix the issue.
WaveOutDevice = new WaveOut(WaveCallbackInfo.NewWindow())
{
    DesiredLatency = 700,  // adjust the value to prevent sound clipping/clicking
    NumberOfBuffers = 3    // the number of buffers must be carefully selected to prevent clipping/clicking
};
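For context, here is roughly how those settings fit into a per-stream volume chain; the file name and the 0.5 volume are placeholder values, and AudioFileReader is the NAudio reader I use alongside the VolumeSampleProvider mentioned above:

// Sketch: per-stream volume control with the adjusted WaveOut settings.
var reader = new AudioFileReader("music.wav");           // reads as 32-bit float samples
var volume = new VolumeSampleProvider(reader);           // volume for this stream only
var waveOutDevice = new WaveOut(WaveCallbackInfo.NewWindow())
{
    DesiredLatency = 700,
    NumberOfBuffers = 3
};
waveOutDevice.Init(volume);
waveOutDevice.Play();

volume.Volume = 0.5f; // lowering this does not affect other sounds in the application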
What I need to do is get at the audio stream that is playing on my speakers, without any additional hardware.
If there is speaker output (say, a human voice), then I need to display some images. So how can I determine whether there is sound coming out of the speakers?
I want to use C# for this on Windows 7.
Thank you.
You can do this with WASAPI Loopback Capture. My open-source NAudio library includes a wrapper for this called WasapiLoopbackCapture. One quirk of WASAPI Loopback Capture is that you get no callbacks whatsoever when the system is playing silence, although that might not matter for you.
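A minimal sketch of what that looks like, assuming the NAudio.Wave namespace (the handler body is just a placeholder):

using NAudio.Wave;

// Captures whatever is currently being rendered to the default output device.
var capture = new WasapiLoopbackCapture();
capture.DataAvailable += (sender, e) =>
{
    // e.Buffer / e.BytesRecorded contain the samples being played back
    // (no callbacks arrive while the system is rendering silence)
};
capture.StartRecording();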
If you don't actually need to examine the values of the samples, WASAPI also allows you to monitor the volume level of a device. In NAudio you can access this with AudioMeterInformation or AudioEndpointVolume on the MMDevice (you can get this with MMDeviceEnumerator.GetDefaultAudioEndpoint for rendering).
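A sketch of that metering approach with NAudio's CoreAudioApi wrappers (the 0.05 threshold is only an illustrative value):

using NAudio.CoreAudioApi;

// Read the current output peak level of the default render (playback) device.
var enumerator = new MMDeviceEnumerator();
var device = enumerator.GetDefaultAudioEndpoint(DataFlow.Render, Role.Multimedia);
float peak = device.AudioMeterInformation.MasterPeakValue; // 0.0 when nothing is playing
bool somethingIsPlaying = peak > 0.05f;                    // threshold chosen experimentally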
You can use CSCore, which allows you to get the peak of any application and of the whole device. You can determine whether sound is being played by checking that peak value. This is an example of how to get the peak of an application, and these are two examples of how to get the peak of one particular device:
[TestMethod]
[TestCategory("CoreAudioAPI.Endpoint")]
public void CanGetAudioMeterInformationPeakValue()
{
    using (var device = Utils.GetDefaultRenderDevice())
    using (var meter = AudioMeterInformation.FromDevice(device))
    {
        Console.WriteLine(meter.PeakValue);
    }
}

[TestMethod]
[TestCategory("CoreAudioAPI.Endpoint")]
public void CanGetAudioMeterInformationChannelsPeaks()
{
    using (var device = Utils.GetDefaultRenderDevice())
    using (var meter = AudioMeterInformation.FromDevice(device))
    {
        for (int i = 0; i < meter.MeteringChannelCount; i++)
        {
            Console.WriteLine(meter[i]);
        }
    }
}
Just check whether the peak is bigger than zero or something like 0.05 (you may need to experiment with that). If the peak is bigger than a certain value, some application is playing something.
Also take a look at this: http://cscore.codeplex.com/SourceControl/latest#CSCore.Test/CoreAudioAPI/EndpointTests.cs. For the implementation of Utils.GetDefaultRenderDevice, take a look at this one: http://cscore.codeplex.com/SourceControl/latest#CSCore.Test/CoreAudioAPI/Utils.cs
The first example gets the average peak of all channel peaks and the second example gets the peaks of each channel of the output device.
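Putting the threshold suggestion together with the first test, a minimal check could look like this (the 0.05 threshold is only a starting value to tune):

using (var device = Utils.GetDefaultRenderDevice())
using (var meter = AudioMeterInformation.FromDevice(device))
{
    // If the output peak exceeds the threshold, some application is playing sound.
    bool somethingIsPlaying = meter.PeakValue > 0.05f;
}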
I would like to get accelerometer readings every 10 ms from my Windows Phone 8, but instead I observe some jitter: the spacing between readings will be 8, 10, 12, 9 ms, or the like. So approximately 10 ms, but not exactly.
I was wondering whether someone could suggest a way to get more reliable readings.
The core of my code looks like this:
var accelerometer = Windows.Devices.Sensors.Accelerometer.GetDefault();
accelerometer.ReadingChanged += accelerometer_ReadingChanged;
accelerometer.ReportInterval = 10; // request a reading every 10 ms
The phone reports a MinimumReportInterval of 10, so that should be fine. My callback just adds the numbers to a list, which I will send over the network at the end.
I am looking at the time in AccelerometerReadingChangedEventArgs.Timestamp, and that's where I see that the interval isn't always 10 ms. Here's what the times looked like in the latest measurements: 105,118,128,134,146,157,163,177,187,198,208,213,232,238,245,255,263,279,285,295,303,313,324,334,345,355,363,375,385
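For reference, the spacing is computed from consecutive Timestamp values, roughly like this (lastTimestamp is just an illustrative field name):

// Sketch of how the spacing between readings is measured from the Timestamp property.
DateTimeOffset? lastTimestamp = null;

void accelerometer_ReadingChanged(Accelerometer sender, AccelerometerReadingChangedEventArgs e)
{
    if (lastTimestamp.HasValue)
    {
        // Comes out as 8, 10, 12, 9, ... ms rather than exactly 10
        double deltaMs = (e.Reading.Timestamp - lastTimestamp.Value).TotalMilliseconds;
    }
    lastTimestamp = e.Reading.Timestamp;
}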
So: is there something I can do to get more precisely spaced measurements? Or is this just the best this particular hardware can do?
There is a great article on the Windows Phone Developer Blog which covers the accelerometer in detail.
One of the points of the article is that, yes, the stream of values can and most probably will be 'jittery', so you should implement some method of filtering. One such method is a low-pass filter.
The smoother the data after filtering, the bigger the delay will be between the actual change and the reading. In other words, if you used the accelerometer in a game as a 'steering wheel', a lot of filtering would result in the car turning late, but no filtering would probably result in a jittery car. So it's best to set it somewhere in between, depending on the use case.
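A minimal sketch of such a low-pass filter (an exponential moving average; the alpha value is just a starting point to tune):

// Exponential low-pass filter: smaller alpha = smoother output, but more lag.
const double alpha = 0.2;
double filteredX, filteredY, filteredZ;

void accelerometer_ReadingChanged(Accelerometer sender, AccelerometerReadingChangedEventArgs e)
{
    var reading = e.Reading;
    filteredX = alpha * reading.AccelerationX + (1 - alpha) * filteredX;
    filteredY = alpha * reading.AccelerationY + (1 - alpha) * filteredY;
    filteredZ = alpha * reading.AccelerationZ + (1 - alpha) * filteredZ;
}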