Audio Encoding conversion problems with PCM 32-bit to PCM 16-bit - C#

I am using C# in a Universal Windows App to talk to the Watson Speech-to-Text service.
For now, instead of sending the audio to the Watson service, I write it to a file and then read it into Audacity to confirm it is in the right format, since the Watson service wasn't returning correct responses to me; the following explains why.
For some reason, even though I create 16-bit PCM encoding properties, I can only read the buffer data as 32-bit PCM. That works well, but if I read it as 16-bit PCM it plays in slow motion, and all the speech is basically corrupted.
I don't really know what exactly needs to be done to convert from 32-bit to 16-bit, but here's what I have in my C# application:
//Creating PCM Encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;
//Initialize microphone
var microphone = await DeviceInformation.CreateFromIdAsync(MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default));
var micInputResult = await graph.CreateDeviceInputNodeAsync(MediaCategory.Speech, pcmEncoding, microphone);
//Create frame output node
frameOutputNode = graph.CreateFrameOutputNode(pcmEncoding);
//Callback function to fire when buffer is filled with data
graph.QuantumProcessed += (s, a) => ProcessFrameOutput(frameOutputNode.GetFrame());
frameOutputNode.Start();
//Make the microphone write into the frame node
micInputResult.DeviceInputNode.AddOutgoingConnection(frameOutputNode);
micInputResult.DeviceInputNode.Start();
graph.Start();
That completes the initialization step. Now, actually reading from the buffer and writing to the file only works if I use 32-bit PCM encoding with the following function (commented out is the 16-bit PCM code that results in slow-motion speech output):
private void ProcessFrameOutput(AudioFrame frame)
{
    //Making a copy of the audio frame buffer
    var audioBuffer = frame.LockBuffer(AudioBufferAccessMode.Read);
    var buffer = Windows.Storage.Streams.Buffer.CreateCopyFromMemoryBuffer(audioBuffer);
    buffer.Length = audioBuffer.Length;
    using (var dataReader = DataReader.FromBuffer(buffer))
    {
        dataReader.ByteOrder = ByteOrder.LittleEndian;
        byte[] byteData = new byte[buffer.Length];
        int pos = 0;
        while (dataReader.UnconsumedBufferLength > 0)
        {
            /*Reading Float -> Int32*/
            /*With this code I can import the raw wav file into Audacity
            using Signed 32-bit PCM encoding, and it works well*/
            var singleTmp = dataReader.ReadSingle();
            var int32Tmp = (Int32)(singleTmp * Int32.MaxValue);
            byte[] chunkBytes = BitConverter.GetBytes(int32Tmp);
            byteData[pos++] = chunkBytes[0];
            byteData[pos++] = chunkBytes[1];
            byteData[pos++] = chunkBytes[2];
            byteData[pos++] = chunkBytes[3];

            /*Reading Float -> Int16 (slow motion)*/
            /*With this code I can import the raw wav file into Audacity
            using Signed 16-bit PCM encoding, but when I play it, it's in
            slow motion*/
            //var singleTmp = dataReader.ReadSingle();
            //var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
            //byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
            //byteData[pos++] = chunkBytes[0];
            //byteData[pos++] = chunkBytes[1];
        }
        WriteBytesToFile(byteData);
    }
}
Can anyone think of a reason why this is happening? Is it because Int32 PCM is larger in size, so that when I use Int16 the data is stretched out and the sound becomes longer? Or am I not sampling it properly?
Note: I tried reading bytes directly from the buffer and then using that as raw data, but it's not encoded as PCM that way.
Reading Int16/Int32 from the buffer directly also doesn't work.
In the above example I am only using a frame output node. If I create a file output node that automatically writes to the raw file, it works really well as 16-bit PCM, so something in my callback function causes the slow motion.
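(For reference, a file output node setup along those lines might look like the following sketch; the file name, WAV profile, and quality setting here are illustrative assumptions, not the original code.)

var file = await Windows.Storage.ApplicationData.Current.LocalFolder.CreateFileAsync(
    "recording.wav", Windows.Storage.CreationCollisionOption.ReplaceExisting);
var profile = MediaEncodingProfile.CreateWav(AudioEncodingQuality.Auto);
profile.Audio = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var fileOutputResult = await graph.CreateFileOutputNodeAsync(file, profile);
micInputResult.DeviceInputNode.AddOutgoingConnection(fileOutputResult.FileOutputNode);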
Thanks

//Creating PCM Encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;
pcmEncoding does not make much sense here, since AudioGraph only supports float encoding.
byte[] byteData = new byte[buffer.Length];
It should be buffer.Length / 2, since you are converting from float data (4 bytes per sample) to Int16 data (2 bytes per sample).
/*Reading Float -> Int16 (slow motion)*/
/*With this code I can import the raw wav file into Audacity
using Signed 16-bit PCM encoding, but when I play it, it's in
slow motion*/
var singleTmp = dataReader.ReadSingle();
var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
byteData[pos++] = chunkBytes[0];
byteData[pos++] = chunkBytes[1];
This code is correct and should work. Your "slow motion" is most likely caused by the buffer size you set incorrectly before: only half of byteData gets filled with converted Int16 samples, so the other half is written out as zeros, padding every quantum with silence and roughly doubling the length of the file.
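Putting both fixes together, a minimal corrected version of the callback might look like the sketch below; it still reads floats from the graph (since AudioGraph delivers float samples) and sizes the output array at half the input length:

private void ProcessFrameOutput(AudioFrame frame)
{
    var audioBuffer = frame.LockBuffer(AudioBufferAccessMode.Read);
    var buffer = Windows.Storage.Streams.Buffer.CreateCopyFromMemoryBuffer(audioBuffer);
    buffer.Length = audioBuffer.Length;
    using (var dataReader = DataReader.FromBuffer(buffer))
    {
        dataReader.ByteOrder = ByteOrder.LittleEndian;
        // Half the size of the float buffer: 4 bytes per float in, 2 bytes per Int16 out
        byte[] byteData = new byte[buffer.Length / 2];
        int pos = 0;
        while (dataReader.UnconsumedBufferLength > 0)
        {
            var singleTmp = dataReader.ReadSingle();
            var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
            byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
            byteData[pos++] = chunkBytes[0];
            byteData[pos++] = chunkBytes[1];
        }
        WriteBytesToFile(byteData);
    }
}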
I must admit Microsoft needs someone to review their bloated APIs.

Related

NAudio playing 32 bit float IEEE

I have an application which is generating 32-bit floats (big endian). If I write these to a file and then open it in Audacity, the file plays correctly.
I am trying to play the stream using NAudio. If I create a WaveFormat of 24k samples, 32-bit and 2 channels, I do hear noise, although since it's the wrong format the stream isn't rendering properly, of course. If I create the correct format (IEEE float wave), then I don't hear anything at all. I know the samples are arriving correctly, as I can save them to disk, but I just can't play them. Can anybody see what I'm doing wrong?
Updated and solved: the little-to-big-endian change is done in the feeding routine, and a WaveFloatTo16Provider converts the format before playing.
private bool _streamActive = false;
private BufferedWaveProvider bufferedWaveProvider = null;
private WaveFloatTo16Provider waveFloatTo16Provider = null;
private WaveOut waveOut = null;
private WaveFormat waveFormat = null;

// ProcessSound is fed incoming byte packets
// (28 bytes header plus 1024 bytes audio)
// by background thread
// Data is converted from little to big endian in background thread
public void ProcessSound(byte[] rxData)
{
    // get data length
    int datalen = rxData.Length;
    // check to activate player
    if (_streamActive == true)
    {
        // add samples to buffer ('28' allows for header information)
        bufferedWaveProvider.AddSamples(rxData, 28, datalen - 28);
        return;
    }
    // start it going
    waveFormat = WaveFormat.CreateIeeeFloatWaveFormat(24000, 2);
    // create buffer to allow samples to be added
    bufferedWaveProvider = new BufferedWaveProvider(waveFormat);
    // convert from 32 bit float to 16 bit PCM
    waveFloatTo16Provider = new WaveFloatTo16Provider(bufferedWaveProvider);
    // add samples to buffer
    bufferedWaveProvider.AddSamples(rxData, 28, datalen - 28);
    // create waveOut player
    waveOut = new WaveOut();
    waveOut.Init(waveFloatTo16Provider);
    waveOut.Volume = 0.25f;
    waveOut.Play();
    // mark stream as active
    _streamActive = true;
}

How to know if audio is going dead in NAudio C#

I am trying to find out when a song is going dead (inaudible sound for a few seconds). I am using the NAudio library in C#. So far I am able to get the PCM data and plot the amplitude of the audio, and I am guessing the dead audio from this amplitude. But I am a bit confused about audio channels. The following is the piece of code I wrote.
NAudio.Wave.WaveChannel32 wave = new NAudio.Wave.WaveChannel32(new NAudio.Wave.WaveFileReader(open.FileName));
int songLength = (int)wave.Length;
byte[] songPCM = new byte[songLength];
int sampleRate = (int)wave.WaveFormat.SampleRate;
int bitsPerSample = (int)wave.WaveFormat.BitsPerSample;
int numChannels = (int)wave.WaveFormat.Channels;
wave.Read(songPCM, 0, songLength);
double[] _waveLeft = new double[songLength / 8];
double[] _waveRight = new double[songLength / 8];
System.IO.StreamWriter fileoutLeft = new System.IO.StreamWriter("E:\\LOutputSongPCM.dat", true);
System.IO.StreamWriter fileoutRight = new System.IO.StreamWriter("E:\\ROutputSongPCM.dat", true);
int h = 0;
for (int i = 0; i < songLength; i += 8)
{
    _waveLeft[h] = (double)BitConverter.ToSingle(songPCM, i);
    _waveRight[h] = (double)BitConverter.ToSingle(songPCM, i + 4);
    chart1.Series["wave"].Points.Add(_waveLeft[h]);
    //chart1.Series["wave"].Points.Add(_waveRight[h]);
    fileoutLeft.WriteLine(_waveLeft[h]);
    fileoutRight.WriteLine(_waveRight[h]);
    h++;
}
fileoutLeft.Close();
fileoutRight.Close();
Now, for this piece of code I know the audio has 2 channels, so I went through many links and threads and got confused about whether I am reading the PCM data for each channel correctly. I compared the plots of each channel and they look good (matching the original song), but I am not sure about their accuracy. Can you guide me on how to get the exact raw data for any channel, for mono, stereo and 5.1?
Thanks.
You'll find it easier to get at the samples by using the AudioFileReader class, whose Read method takes a float array. The samples are stored interleaved for multi-channel audio, so for stereo, you'll get left sample, then right sample, then another left and so on.
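As a rough illustration of that (the file name and block size are placeholders, not from the question), de-interleaving with AudioFileReader could look like this:

using (var reader = new AudioFileReader("song.wav")) // placeholder path
{
    int channels = reader.WaveFormat.Channels;
    // read about one second of interleaved float samples at a time
    var block = new float[reader.WaveFormat.SampleRate * channels];
    int samplesRead;
    while ((samplesRead = reader.Read(block, 0, block.Length)) > 0)
    {
        for (int frame = 0; frame < samplesRead / channels; frame++)
        {
            float left = block[frame * channels];                             // channel 0
            float right = channels > 1 ? block[frame * channels + 1] : left; // channel 1 (or mono)
            // inspect amplitudes here, e.g. track a running peak to spot near-silence
        }
    }
}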

Problems with playing AAC Raw data on Silverlight (Windows Phone)

I need to play AAC LC audio that comes from a live stream.
To achieve that, I've implemented a MediaStreamSource.
When I receive the first packets of the stream, I set the MediaElement's source to my MediaStreamSource.
It seems that everything works fine: OpenMediaAsync is called -> reported with ReportOpenMediaCompleted, then GetSampleAsync is called -> reported with ReportGetSampleCompleted. BUT, on the 10th call of GetSampleAsync, ReportGetSampleCompleted throws a NullReferenceException.
Here is my CodecPrivateData:
var waveFormat = new AACWaveFormat();
waveFormat.FormatTag = 0xFF;
waveFormat.Channels = 2; // my stream is always stereo
waveFormat.Frequency = 44100; // my stream is always 44kHz
waveFormat.BitsPerSample = 16; // my stream is always 16 bit
waveFormat.BlockAlign = waveFormat.Channels * waveFormat.BitsPerSample / 8; // is this the right formula?
waveFormat.AverageBytesPerSecond = waveFormat.Frequency * waveFormat.BlockAlign; // is this the right formula? It usually comes out to 176400, i.e. 1411 kbps -- is that a realistic value for sound?
waveFormat.ExtraDataSize = 2; // usually, but I read these values from the first packet of the stream
waveFormat.ExtraData = AudioSpecificConfig; // AudioSpecificConfig is usually 2 bytes long, read from the stream
The first packet of the stream is always an AACSequenceHeader, from which I read my CodecPrivateData and AudioSpecificConfig. All the rest is AACRaw.
My CodecPrivateData looks like
FF00020044AC000010B102000400100002001210.
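For reference, that hex string parses as a little-endian WAVEFORMATEX-style header followed by the two ExtraData bytes; decoding it by hand (my reading, worth double-checking) gives values consistent with the fields set above:

FF00       FormatTag             = 0x00FF
0200       Channels              = 2
44AC0000   Frequency             = 44100
10B10200   AverageBytesPerSecond = 176400
0400       BlockAlign            = 4
1000       BitsPerSample         = 16
0200       ExtraDataSize         = 2
1210       ExtraData (AudioSpecificConfig = 0x12 0x10)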
My GetSampleAsync:
protected override void GetSampleAsync(MediaStreamType mediaStreamType)
{
    // AudioStreamAttibutes is a field filled in during the OpenMediaAsync step
    var audioStreamDescription = new MediaStreamDescription(MediaStreamType.Audio, AudioStreamAttibutes);
    //using (var memoryStream = new MemoryStream(AudioPackets[0].Data))
    var memoryStream = new MemoryStream(AudioPackets[0].Data);
    // Throws NullReferenceException, although when the debugger stops I can see that all the passed params are not null!
    ReportGetSampleCompleted(new MediaStreamSample(audioStreamDescription, memoryStream, 0, AudioPackets[0].Data.Length, TimeSpan.FromMilliseconds(GetAudioSampleCalls++ * 32).Ticks, new Dictionary<MediaSampleAttributeKeys, String>()));
}
The problem here is that I don't know any timestamps, and I don't know whether that could be the problem.
And finally, what is the Data field? Data contains all the received raw AAC audio as a Byte[] that I extract from the AudioTag. (See E.4.2.2 AACAUDIODATA at http://download.macromedia.com/f4v/video_file_format_spec_v10_1.pdf)
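One thing worth checking on the timestamp front: an AAC LC frame decodes to 1024 samples, so at 44.1 kHz each frame spans roughly 23.2 ms rather than the 32 ms hard-coded above. A sketch of the arithmetic (the names here are mine, not from the question):

// Each AAC LC frame decodes to 1024 PCM samples per channel.
const int SamplesPerFrame = 1024;
const int SampleRate = 44100; // from the stream's AudioSpecificConfig

// Timestamp of the n-th frame in 100-ns ticks, as MediaStreamSample expects.
long TimestampTicks(int frameIndex)
{
    return (long)(frameIndex * (double)SamplesPerFrame / SampleRate * TimeSpan.TicksPerSecond);
}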

Playing a sound from a generated buffer in a Windows 8 app

I'm porting some C# Windows Phone 7 apps over to Windows 8.
The phone apps used an XNA SoundEffect to play arbitrary sounds from a buffer. In the simplest cases I'd just create a sine wave of the required duration and frequency. Both the duration and frequency can vary greatly, so I'd prefer not to rely on MediaElements (unless there is some way to shift the frequency of a base file, but that will only help me with single-frequency generation).
What is the equivalent of an XNA SoundEffectInstance in WinRT?
I assume I'll need to use DirectX for this, but I'm not sure how to go about it from an otherwise C#/XAML app. I've had a look at SharpDX, but it didn't seem to have the DirectSound and SecondaryBuffer classes that I assume I'd need to use.
I've made a number of assumptions above. It may be that I'm looking for the wrong classes, or there is an entirely separate way to generate arbitrary sound in a Windows 8 app.
I found an example using XAudio2 from SharpDX to play a wav file via an AudioBuffer. This seems promising; I'd just need to substitute my generated audio buffer for the native file stream.
PM> Install-Package SharpDX
PM> Install-Package SharpDX.XAudio2

public void PlaySound()
{
    XAudio2 xaudio;
    MasteringVoice masteringVoice;
    xaudio = new XAudio2();
    masteringVoice = new MasteringVoice(xaudio);
    var nativefilestream = new NativeFileStream(
        @"Assets\SpeechOn.wav",
        NativeFileMode.Open,
        NativeFileAccess.Read,
        NativeFileShare.Read);
    var soundstream = new SoundStream(nativefilestream);
    var waveFormat = soundstream.Format;
    var buffer = new AudioBuffer
    {
        Stream = soundstream.ToDataStream(),
        AudioBytes = (int)soundstream.Length,
        Flags = BufferFlags.EndOfStream
    };
    var sourceVoice = new SourceVoice(xaudio, waveFormat, true);
    // There is also support for shifting the frequency.
    sourceVoice.SetFrequencyRatio(0.5f);
    sourceVoice.SubmitSourceBuffer(buffer, soundstream.DecodedPacketsInfo);
    sourceVoice.Start();
}
The only way to generate dynamic sound in WinRT is to use XAudio2, so you should be able to do this with SharpDX.XAudio2.
Instead of using NativeFileStream, just instantiate a DataStream directly, giving it your managed buffer (or you can use an unmanaged buffer, or let DataStream instantiate one for you). The code would look like this:
// Initialization phase, keep this buffer during the life of your application
// Allocate 10s at 44.1kHz of stereo 16-bit signal
var myBufferOfSamples = new short[44100 * 10 * 2];
// Create a DataStream with a pinned managed buffer
var dataStream = DataStream.Create(myBufferOfSamples, true, true);
var buffer = new AudioBuffer
{
    Stream = dataStream,
    AudioBytes = (int)dataStream.Length,
    Flags = BufferFlags.EndOfStream
};
//...
// Fill myBufferOfSamples
//...
// PCM 44.1kHz stereo 16 bit format
var waveFormat = new WaveFormat();
XAudio2 xaudio = new XAudio2();
MasteringVoice masteringVoice = new MasteringVoice(xaudio);
var sourceVoice = new SourceVoice(xaudio, waveFormat, true);
// Submit the buffer
sourceVoice.SubmitSourceBuffer(buffer, null);
// Start playing
sourceVoice.Start();
Sample method to fill the buffer with a sine wave:
private void FillBuffer(short[] buffer, int sampleRate, double frequency)
{
    int sampleNumber = 0;
    for (int i = 0; i < buffer.Length - 1; i += 2)
    {
        double time = (double)sampleNumber / (double)sampleRate;
        short currentSample = (short)(Math.Sin(2 * Math.PI * frequency * time) * (double)short.MaxValue);
        buffer[i] = currentSample;     // left channel
        buffer[i + 1] = currentSample; // right channel
        sampleNumber++;                // one stereo frame advances time by one sample period
    }
}
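For instance, before the SubmitSourceBuffer call above you might fill the buffer with a tone (the 440 Hz here is just an illustrative value):

FillBuffer(myBufferOfSamples, 44100, 440.0); // fill with a 440 Hz sine tone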
You can also use WASAPI to play dynamically generated sound buffers in WinRT (XAudio2 isn't the only solution).
I wrote sample code for it in VB here (the C# would be essentially the same):
http://www.codeproject.com/Articles/460145/Recording-and-playing-PCM-audio-on-Windows-8-VB
I believe the NAudio maintainer is planning to translate and incorporate my sample code into NAudio for a Win8-supported version, so that will be easier to use.

Use NAudio to get Ulaw samples for RTP

I've been looking over the NAudio examples, trying to work out how I can get u-law samples suitable for packaging up as an RTP payload. I'm attempting to generate the samples from an MP3 file using the code below. Not surprisingly, since I don't really have a clue what I'm doing with NAudio, when I transmit the samples across the network to a softphone, all I get is static.
Can anyone provide any direction on how I should be getting 160-byte (8 kHz @ 20 ms) u-law samples from an MP3 file using NAudio?
private void GetAudioSamples()
{
    var pcmStream = WaveFormatConversionStream.CreatePcmStream(new Mp3FileReader("whitelight.mp3"));
    byte[] buffer = new byte[2];
    byte[] sampleBuffer = new byte[160];
    int sampleIndex = 0;
    int bytesRead = pcmStream.Read(buffer, 0, 2);
    while (bytesRead > 0)
    {
        var ulawByte = MuLawEncoder.LinearToMuLawSample(BitConverter.ToInt16(buffer, 0));
        sampleBuffer[sampleIndex++] = ulawByte;
        if (sampleIndex == 160)
        {
            m_rtpChannel.AddSample(sampleBuffer);
            sampleBuffer = new byte[160];
            sampleIndex = 0;
        }
        bytesRead = pcmStream.Read(buffer, 0, 2);
    }
    logger.Debug("Finished adding audio samples.");
}
Here are a few pointers. First of all, as long as you are using NAudio 1.5, there's no need for the additional WaveFormatConversionStream: Mp3FileReader's Read method already returns PCM.
However, you will not be getting 8 kHz out, so you need to resample it first. WaveFormatConversionStream can do this, although it uses the built-in Windows ACM sample rate conversion, which doesn't seem to filter the incoming audio well, so there could be aliasing artefacts.
Also, you should usually read bigger blocks than just two bytes at a time, as the MP3 decoder needs to decode frames one at a time (the resampler will also want to deal with bigger block sizes). I would try reading at least 20 ms worth of bytes at a time.
Your use of BitConverter.ToInt16 is correct for getting the 16-bit sample value, but bear in mind that an MP3 is likely stereo, with interleaved left and right samples. Are you sure your phone expects stereo?
Finally, I recommend making a mu-law WAV file as a first step, using WaveFileWriter. Then you can easily listen to it in Windows Media Player and check that what you are sending to your softphone is what you intended.
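A minimal sketch of that sanity check, assuming a mu-law conversion stream like the ulawStm created in the accepted code below ("check.wav" is a placeholder path):

var muLawFormat = WaveFormat.CreateMuLawFormat(8000, 1);
using (var writer = new WaveFileWriter("check.wav", muLawFormat))
{
    byte[] buffer = new byte[160];
    int bytesRead;
    while ((bytesRead = ulawStm.Read(buffer, 0, buffer.Length)) > 0)
    {
        writer.Write(buffer, 0, bytesRead); // WAV header is finalized on Dispose
    }
}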
Below is the way I eventually got it working. I do lose one of the channels from the MP3, and I guess there's some way to combine the channels as part of a conversion, but that doesn't matter for my situation (see the note after the code for one option).
The 160-byte buffer size gives me 20 ms u-law samples, which work perfectly with the SIP softphone I'm testing with.
var pcmFormat = new WaveFormat(8000, 16, 1);
var ulawFormat = WaveFormat.CreateMuLawFormat(8000, 1);
using (WaveFormatConversionStream pcmStm = new WaveFormatConversionStream(pcmFormat, new Mp3FileReader("whitelight.mp3")))
{
    using (WaveFormatConversionStream ulawStm = new WaveFormatConversionStream(ulawFormat, pcmStm))
    {
        byte[] buffer = new byte[160];
        int bytesRead = ulawStm.Read(buffer, 0, 160);
        while (bytesRead > 0)
        {
            byte[] sample = new byte[bytesRead];
            Array.Copy(buffer, sample, bytesRead);
            m_rtpChannel.AddSample(sample);
            bytesRead = ulawStm.Read(buffer, 0, 160);
        }
    }
}
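If keeping both channels mattered, one option is NAudio's StereoToMonoProvider16, which mixes the channels rather than dropping one. A sketch, assuming NAudio 1.5+ and 16-bit stereo PCM coming out of the MP3 decoder (the exact wiring into the resample step is untested):

var mp3 = new Mp3FileReader("whitelight.mp3");
// average the two channels instead of discarding one
var mono = new StereoToMonoProvider16(mp3) { LeftVolume = 0.5f, RightVolume = 0.5f };
// ...then resample the mono 16-bit PCM to 8 kHz and convert to mu-law as above.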
