Use NAudio to get u-law samples for RTP - C#

I've been looking over the NAudio examples trying to work out how I can get ulaw samples suitable for packaging up as an RTP payload. I'm attempting to generate the samples from an mp3 file using the code below. Not surprisingly, since I don't really have a clue what I'm doing with NAudio, when I transmit the samples across the network to a softphone all I get is static.
Can anyone provide any direction on how I should be getting 160-byte (8 kHz @ 20 ms) u-law samples from an MP3 file using NAudio?
private void GetAudioSamples()
{
    var pcmStream = WaveFormatConversionStream.CreatePcmStream(new Mp3FileReader("whitelight.mp3"));
    byte[] buffer = new byte[2];
    byte[] sampleBuffer = new byte[160];
    int sampleIndex = 0;
    int bytesRead = pcmStream.Read(buffer, 0, 2);

    while (bytesRead > 0)
    {
        var ulawByte = MuLawEncoder.LinearToMuLawSample(BitConverter.ToInt16(buffer, 0));
        sampleBuffer[sampleIndex++] = ulawByte;

        if (sampleIndex == 160)
        {
            m_rtpChannel.AddSample(sampleBuffer);
            sampleBuffer = new byte[160];
            sampleIndex = 0;
        }

        bytesRead = pcmStream.Read(buffer, 0, 2);
    }

    logger.Debug("Finished adding audio samples.");
}

Here are a few pointers. First of all, as long as you are using NAudio 1.5, there is no need for the additional WaveFormatConversionStream - Mp3FileReader's Read method already returns PCM.
However, you will not be getting 8kHz out, so you need to resample it first. WaveFormatConversionStream can do this, although it uses the built-in Windows ACM sample rate conversion, which doesn't seem to filter the incoming audio well, so there could be aliasing artefacts.
Also, you should read bigger blocks than just two bytes at a time, as the MP3 decoder needs to decode whole frames at a time (and the resampler will also want to deal with bigger block sizes). I would try reading at least 20 ms worth of bytes at a time.
Your use of BitConverter.ToInt16 is correct for getting the 16-bit sample value, but bear in mind that an MP3 is likely stereo, with interleaved left and right samples. Are you sure your phone expects stereo?
Finally, I recommend making a mu-law WAV file as a first step, using WaveFileWriter. Then you can easily listen to it in Windows Media Player and check that what you are sending to your softphone is what you intended.
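As a rough sketch of that sanity check (assuming NAudio 1.5-era APIs and that WaveFileWriter.CreateWaveFile is available in your version; the output file name is just a placeholder):

var pcmFormat = new WaveFormat(8000, 16, 1);
var muLawFormat = WaveFormat.CreateMuLawFormat(8000, 1);

using (var mp3Reader = new Mp3FileReader("whitelight.mp3"))
using (var pcmStream = new WaveFormatConversionStream(pcmFormat, mp3Reader))
using (var muLawStream = new WaveFormatConversionStream(muLawFormat, pcmStream))
{
    // Writes the mu-law data with a proper WAV header so it can be auditioned
    // in Windows Media Player before being packaged as RTP.
    WaveFileWriter.CreateWaveFile("whitelight-mulaw.wav", muLawStream);
}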

Below is the way I eventually got it working. I do lose one of the channels from the mp3, and I guess there's some way to combine the channels as part of a conversion, but that doesn't matter for my situation.
The 160-byte buffer size gives me 20 ms u-law samples, which work perfectly with the SIP softphone I'm testing with.
var pcmFormat = new WaveFormat(8000, 16, 1);
var ulawFormat = WaveFormat.CreateMuLawFormat(8000, 1);

using (WaveFormatConversionStream pcmStm = new WaveFormatConversionStream(pcmFormat, new Mp3FileReader("whitelight.mp3")))
{
    using (WaveFormatConversionStream ulawStm = new WaveFormatConversionStream(ulawFormat, pcmStm))
    {
        byte[] buffer = new byte[160];
        int bytesRead = ulawStm.Read(buffer, 0, 160);

        while (bytesRead > 0)
        {
            byte[] sample = new byte[bytesRead];
            Array.Copy(buffer, sample, bytesRead);
            m_rtpChannel.AddSample(sample);

            bytesRead = ulawStm.Read(buffer, 0, 160);
        }
    }
}

Related

Reading a MemoryStream containing opus audio with NAudio

I'm trying to play Opus audio files from the web, which I buffer with a MemoryStream. I'm aware of NAudio's ability to take URLs, but I need to set cookies and a user agent before I access the file, so that eliminates that option.
My latest approach was to buffer ~30 seconds of the stream, feed it to StreamMediaFoundationReader and write to the same MemoryStream when needed; however, NAudio ends up playing the initial buffered segment and stopping after that completes. What would be the correct approach for this?
Here's my current code if needed. (I have no idea how audio streaming works so please go easy on me)
bufstr.setReadStream(httpreq.GetResponse().GetResponseStream()); // bufstr is a custom class which creates a MemoryStream I can write to.
StreamMediaFoundationReader streamread = new StreamMediaFoundationReader(bufstr.getStream());
bufstr.readToStream(); // get the initial ~30 seconds of content
waveOut.Init(streamread);
waveOut.Play();
int seconds = 0;
while (waveOut.PlaybackState == PlaybackState.Playing)
{
    Thread.Sleep(1000);
    seconds++;
    if (seconds % 30 > 15) bufstr.readToStream();
}
bufstr's readToStream method
public void readToStream()
{
    int prevbufcount = totalbuffered; // I keep track of how many bytes I fetched from the remote url.
    // read around 30 seconds of content
    while (stream.CanRead && prevbufcount + (30 * (this.bitrate / 8)) > totalbuffered && totalbuffered != contentlength)
    {
        Console.Write($"Caching {prevbufcount + (30 * (this.bitrate / 8))}/{totalbuffered}\r");
        byte[] buf = new byte[1024];
        bufferedcount = stream.Read(buf, 0, 1024);
        totalbuffered += bufferedcount;
        memorystream.Write(buf, 0, bufferedcount);
    }
}
While debugging I found out that the content length I get from the server does not match the actual size of the stream, so I ended up calculating the content length from other details I get from the server.
The issue might also be a race condition, given that it stopped after I manually kept track of where I write to the memory stream.

Audio Encoding conversion problems with PCM 32-bit to PCM 16-bit

I am using C# in Universal Windows App to write a Watson Speech-to-text service.
For now, instead of using the Watson service, I write to a file and then read it in Audacity to confirm it is in the right format, since the Watson service wasn't returning correct responses to me, and the following explains why.
For some reason, when I create 16-bit PCM encoding properties and read the buffer, I am only able to read the data as 32-bit PCM, which works well, but if I read it as 16-bit PCM it is in slow motion and all the speech is basically corrupted.
I don't really know what exactly needs to be done to convert from 32-bit to 16-bit, but here's what I have in my C# application:
// Create PCM encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;

// Initialize the microphone
var microphone = await DeviceInformation.CreateFromIdAsync(MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default));
var micInputResult = await graph.CreateDeviceInputNodeAsync(MediaCategory.Speech, pcmEncoding, microphone);

// Create the frame output node
frameOutputNode = graph.CreateFrameOutputNode(pcmEncoding);

// Callback function to fire when the buffer is filled with data
graph.QuantumProcessed += (s, a) => ProcessFrameOutput(frameOutputNode.GetFrame());
frameOutputNode.Start();

// Make the microphone write into the frame node
micInputResult.DeviceInputNode.AddOutgoingConnection(frameOutputNode);
micInputResult.DeviceInputNode.Start();
graph.Start();
The initialization step is done at this stage. Now, actually reading from the buffer and writing to the file only works if I use 32-bit PCM encoding with the following function (commented out is the 16-bit PCM code that results in slow-motion speech output):
private void ProcessFrameOutput(AudioFrame frame)
{
    // Make a copy of the audio frame buffer
    var audioBuffer = frame.LockBuffer(AudioBufferAccessMode.Read);
    var buffer = Windows.Storage.Streams.Buffer.CreateCopyFromMemoryBuffer(audioBuffer);
    buffer.Length = audioBuffer.Length;

    using (var dataReader = DataReader.FromBuffer(buffer))
    {
        dataReader.ByteOrder = ByteOrder.LittleEndian;
        byte[] byteData = new byte[buffer.Length];
        int pos = 0;
        while (dataReader.UnconsumedBufferLength > 0)
        {
            /* Reading Float -> Int32 */
            /* With this code I can import the raw wav file into Audacity
               using Signed 32-bit PCM encoding, and it works well */
            var singleTmp = dataReader.ReadSingle();
            var int32Tmp = (Int32)(singleTmp * Int32.MaxValue);
            byte[] chunkBytes = BitConverter.GetBytes(int32Tmp);
            byteData[pos++] = chunkBytes[0];
            byteData[pos++] = chunkBytes[1];
            byteData[pos++] = chunkBytes[2];
            byteData[pos++] = chunkBytes[3];

            /* Reading Float -> Int16 (Slow Motion) */
            /* With this code I can import the raw wav file into Audacity
               using Signed 16-bit PCM encoding, but when I play it, it's in
               slow motion */
            //var singleTmp = dataReader.ReadSingle();
            //var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
            //byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
            //byteData[pos++] = chunkBytes[0];
            //byteData[pos++] = chunkBytes[1];
        }
        WriteBytesToFile(byteData);
    }
}
Can anyone think of a reason why this is happening? Is it because Int32 PCM is larger in size and when I use Int16, it extends it and makes the sound longer? Or am I not sampling it properly?
Note: I tried reading bytes directly from the buffer and then using that as raw data, but it's not encoded as PCM that way.
Reading Int16/Int32 from the buffer directly also doesn't work.
In the above example I am only using a frame output node. If I create a file output node that automatically writes to the raw file, it works really well as 16-bit PCM, so something is wrong in my callback function that causes it to be in slow motion.
Thanks
//Creating PCM Encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;
pcmEncoding does not make much sense here since only Float encoding is supported by AudioGraph.
byte[] byteData = new byte[buffer.Length];
It should be buffer.Length / 2, since you convert from float data with 4 bytes per sample to Int16 data with 2 bytes per sample.
/*Reading Float -> Int 16 (Slow Motion)*/
/*With this code I can import raw wav file into the Audacity
using Signed 16-bit PCM Encoding, but when I play it, it's in
a slow motion*/
var singleTmp = dataReader.ReadSingle();
var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
byteData[pos++] = chunkBytes[0];
byteData[pos++] = chunkBytes[1];
This code is correct and should work. Your "slow motion" is most likely related to the buffer size you set incorrectly before.
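Put together, a sketch of the corrected read loop could look like this (WriteBytesToFile is the question's own helper):

// The graph delivers 32-bit float samples, so each float becomes one Int16:
// the output buffer needs to be half the size of the input buffer.
byte[] byteData = new byte[buffer.Length / 2];
int pos = 0;
while (dataReader.UnconsumedBufferLength > 0)
{
    var singleTmp = dataReader.ReadSingle();               // one 32-bit float sample
    var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);    // scale to the 16-bit range
    byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
    byteData[pos++] = chunkBytes[0];
    byteData[pos++] = chunkBytes[1];
}
WriteBytesToFile(byteData);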
I must admit Microsoft needs someone to review their bloated APIs

Resampling raw audio with NAudio

I'd like to resample audio: change its sample rate from 44k to 11k. The input I've got is raw audio in bytes. It really is raw: it has no headers - if I try loading it into a WaveFileReader, I get an exception saying "Not a WAVE file - no RIFF header".
How I'm currently trying to achieve it is something like this (just a really simplified piece of code):
WaveFormat ResampleInputFormat = new WaveFormat(44100, 1);
WaveFormat ResampleOutputFormat = new WaveFormat(11025, 1);

MemoryStream ResampleInputMemoryStream = new MemoryStream();
foreach (var b in InputListOfBytes)
{
    ResampleInputMemoryStream.Write(new byte[] { b }, 0, 1);
}

RawSourceWaveStream ResampleInputWaveStream =
    new RawSourceWaveStream(ResampleInputMemoryStream, ResampleInputFormat);
WaveFormatConversionStream ResampleOutputStream =
    new WaveFormatConversionStream(ResampleOutputFormat, ResampleInputWaveStream);

byte[] bytes = new byte[2];
while (ResampleOutputStream.Read(bytes, 0, 2) > 0)
{
    OutputListOfBytes.Add(bytes[0]);
    OutputListOfBytes.Add(bytes[1]);
}
My problem is that the last loop is an infinite loop. The Read() always gets the same values and never advances in the stream. I even tried Seek()-ing to the right position after each Read(), but that doesn't seem to work either; I still always get the same values.
What am I doing wrong? And is this even the right way to resample raw audio? Thanks in advance!
First, you need to reset ResampleInputMemoryStream's position to the start. It may actually have been easier to create the memory stream based on the array:
new MemoryStream(InputListOfBytes)
Second, when reading out of the resampler, you need to read in larger chunks than two bytes at a time. Try at least a second's worth of audio (use ResampleOutputStream.WaveFormat.AverageBytesPerSecond).
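Putting both fixes together, a sketch might look like this (assuming InputListOfBytes can be turned into a byte[] with ToArray()):

WaveFormat resampleInputFormat = new WaveFormat(44100, 1);
WaveFormat resampleOutputFormat = new WaveFormat(11025, 1);

// Creating the MemoryStream from the array leaves its Position at 0,
// so the resampler reads from the start of the raw data.
using (var inputStream = new MemoryStream(InputListOfBytes.ToArray()))
using (var inputWaveStream = new RawSourceWaveStream(inputStream, resampleInputFormat))
using (var outputStream = new WaveFormatConversionStream(resampleOutputFormat, inputWaveStream))
{
    // Read about a second of audio per call rather than two bytes at a time.
    byte[] buffer = new byte[outputStream.WaveFormat.AverageBytesPerSecond];
    int bytesRead;
    while ((bytesRead = outputStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < bytesRead; i++)
        {
            OutputListOfBytes.Add(buffer[i]);
        }
    }
}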

How to perform the FFT on a wave file using NAudio

I'm working with the NAudio library and would like to perform the fast Fourier transform on a WaveStream. I saw that NAudio already has the FFT built in, but how do I use it?
I heard I have to use the SampleAggregator class.
You need to read this entire blog article to best understand the following code sample, which I lifted to ensure the sample is preserved even if the article isn't:
using (WaveFileReader reader = new WaveFileReader(fileToProcess))
{
    IWaveProvider stream32 = new Wave16toFloatProvider(reader);
    IWaveProvider streamEffect = new AutoTuneWaveProvider(stream32, autotuneSettings);
    IWaveProvider stream16 = new WaveFloatTo16Provider(streamEffect);
    using (WaveFileWriter converted = new WaveFileWriter(tempFile, stream16.WaveFormat))
    {
        // buffer length needs to be a power of 2 for FFT to work nicely
        // however, make the buffer too long and pitches aren't detected fast enough
        // successful buffer sizes: 8192, 4096, 2048, 1024
        // (some pitch detection algorithms need at least 2048)
        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            bytesRead = stream16.Read(buffer, 0, buffer.Length);
            converted.WriteData(buffer, 0, bytesRead);
        } while (bytesRead != 0 && converted.Length < reader.Length);
    }
}
But in short, once you get the WAV file created, you can use that sample as a starting point for feeding the audio into the FFT.
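For the FFT itself, here is a rough sketch using NAudio's built-in NAudio.Dsp.FastFourierTransform on one block of samples read from a WAV file, independent of the SampleAggregator plumbing from the article. The file name, block size, and use of ToSampleProvider are just assumptions for illustration, and a mono file is assumed for simplicity:

// requires: using System; using NAudio.Wave; using NAudio.Dsp;
using (var reader = new WaveFileReader("test.wav"))          // placeholder file name
{
    int fftLength = 1024;                                    // must be a power of two
    int m = (int)Math.Log(fftLength, 2.0);
    var fftBuffer = new Complex[fftLength];

    // ISampleProvider hands back 32-bit float samples regardless of the file's bit depth.
    float[] samples = new float[fftLength];
    int read = reader.ToSampleProvider().Read(samples, 0, fftLength);

    for (int i = 0; i < read; i++)
    {
        // Window the block to reduce spectral leakage, then load it into the FFT buffer.
        fftBuffer[i].X = (float)(samples[i] * FastFourierTransform.HammingWindow(i, fftLength));
        fftBuffer[i].Y = 0;
    }

    FastFourierTransform.FFT(true, m, fftBuffer);

    // Bin k corresponds to roughly k * sampleRate / fftLength Hz.
    for (int k = 0; k < fftLength / 2; k++)
    {
        double magnitude = Math.Sqrt(fftBuffer[k].X * fftBuffer[k].X + fftBuffer[k].Y * fftBuffer[k].Y);
        Console.WriteLine("{0:F0} Hz: {1:F5}", k * reader.WaveFormat.SampleRate / (double)fftLength, magnitude);
    }
}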

Display an audio waveform using C#

I've already searched at Stackoverflow and google, but haven't found what I'm looking for.
So far I've got the raw audio data (a WAV file) and I want to visualize it.
private void Form1_Load(object sender, EventArgs e)
{
    FileStream fs = new FileStream("D:\\tada.wav", FileMode.Open);
    BinaryReader reader = new BinaryReader(fs);

    char[] data = new char[4];
    long fsize;
    long wfxSize;
    long dataSize;
    WaveFormatEx wfx;

    //RIFF
    reader.Read(data, 0, 4);
    fsize = reader.ReadInt32();

    //WAVE
    reader.Read(data, 0, 4);

    //FMT
    reader.Read(data, 0, 4);
    wfxSize = reader.ReadInt32();
    byte[] wfxBuffer = new byte[wfxSize];
    reader.Read(wfxBuffer, 0, (int)wfxSize);
    wfx = new WaveFormatEx(wfxBuffer);

    //DATA
    reader.Read(data, 0, 4);
    dataSize = reader.ReadInt32();
    byte[] dataBuff = new byte[dataSize];
    reader.Read(dataBuff, 0, (int)dataSize);
    reader.Close();

    //Visualize the data...
}
I know I need to convert the raw data into samples, then find the peak for each block of samples and draw lines, but I really don't know how to do it (except for the drawing).
I see this is an old question, but in case someone is interested, here is a solution:
Use the NAudio library:
http://naudio.codeplex.com/
Here is a video tutorial on how to use NAudio to display waveforms:
http://www.youtube.com/watch?v=ZnFoVuOVrUQ
Visualize the data... Wow! You should check out the WAV file spec here and perhaps here and then re-think whether this is something you actually want to tackle. (The second link is actually a better, more streamlined overview. Take a look at the data section to see if it's something you want to work with.)
Don't get me wrong. Maybe this is exactly what you want to do, and it might be fun. You should just know what you're getting into!
Also, here's a Code Project component that you could use outright or look at for ideas.
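If you go the NAudio route, the usual approach is to read float samples and keep the min/max of each block, then draw one vertical line per block. A rough sketch follows; AudioFileReader, the block size, and the peaks list are just one way to set this up:

// requires: using System.Collections.Generic; using NAudio.Wave;
var peaks = new List<(float Min, float Max)>();

using (var reader = new AudioFileReader("D:\\tada.wav"))   // AudioFileReader exposes float samples
{
    int samplesPerColumn = 1024;            // how many samples collapse into one drawn column
    float[] buffer = new float[samplesPerColumn];
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        float min = 0f, max = 0f;
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] < min) min = buffer[i];
            if (buffer[i] > max) max = buffer[i];
        }
        peaks.Add((min, max));
    }
}

// Each entry in 'peaks' maps to one x position of the waveform: draw a vertical
// line from Min to Max, scaled to the height of the control, at that column.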
