UWP AudioGraph API - FrameOutputNode read bytes wrong - c#

I have a problem with the FrameOutputNode of the UWP Audio Graph API. I have a very simple graph that reads audio from a wav (PCM 16000Hz, 16 bit mono) file and sends it to the frame output node for processing. When processing, I need the audio to be in shorts (as they are in the raw bytes of the file). But as I read here the data can only be read as floats.
Here is my code:
var encoding = MediaEncodingProfile.CreateWav(AudioEncodingQuality.Low);
encoding.Audio = AudioEncodingProperties.CreatePcm(16000, 1, 16);
AudioGraphSettings settings = new AudioGraphSettings(AudioRenderCategory.Media);
settings.EncodingProperties = encoding.Audio;
CreateAudioGraphResult result = await AudioGraph.CreateAsync(settings);
var graph = result.Graph;
var localFolder = Windows.Storage.ApplicationData.Current.LocalFolder;
StorageFile file = await localFolder.GetFileAsync("audio.wav");
var fileInputNodeResult = await graph.CreateFileInputNodeAsync(file);
var fileInputNode = fileInputNodeResult.FileInputNode;
fileInputNode.FileCompleted += (AudioFileInputNode sender, object args) =>
{
graph.Stop();
};
frameOutputNode = graph.CreateFrameOutputNode(encoding.Audio);
fileInputNode.AddOutgoingConnection(frameOutputNode);
graph.QuantumStarted += AudioGraph_QuantumStarted;
With the following AudioGraph_QuantumStarted event handler:
private void AudioGraph_QuantumStarted(AudioGraph sender, object args)
{
AudioFrame frame = frameOutputNode.GetFrame();
ProcessFrameOutput(frame);
}
unsafe private void ProcessFrameOutput(AudioFrame frame)
{
AudioBuffer buffer = frame.LockBuffer(AudioBufferAccessMode.Read);
IMemoryBufferReference reference = buffer.CreateReference();
((IMemoryBufferByteAccess)reference).GetBuffer(out byte* dataInBytes, out uint capacityInBytes);
if (capacityInBytes > 0) {
// Read the first 20 bytes
for (int i = 0; i < 20; i++)
{
Debug.WriteLine(dataInBytes[i]);
}
}
}
The bytes I receive in the output are the following. Since the samples are returned as bytes of a float, I marked the sample boundary with a line.
0 0 0 0 | 0 0 0 184 | 0 0 128 184 | 0 0 0 184 ...
But when I read the actual bytes from the file with a byte reader:
FileStream fIn = new FileStream(@"/path/to/audio.wav", FileMode.Open);
BinaryReader br = new BinaryReader(fIn);
// Skip the first 44 bytes since they are header stuff
br.ReadBytes(44);
for (int i = 0; i < 20; i++)
{
Debug.WriteLine(br.ReadByte());
}
Then I get the actual bytes:
0 0 | 255 255 | 254 255 | 255 255 | 255 255 | 254 255 | 253 255 | 252 255 ...
Again I marked the individual samples (shorts -> two bytes) with a line.
As you can see the short bytes 255 255 somehow map to float bytes 0 0 0 184 as they reoccur. So what is that mapping? How can I retrieve the raw shorts from the floats? What do I need to do to actually read the wav file bytes?

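My question was answered here. Basically, the floats are the shorts, whose range is -32768 to 32767, rescaled to the range -1 to 1.
So given a float x in the buffer (use float* dataInFloats = (float*)dataInBytes; to reinterpret the bytes) you can calculate the corresponding short with:
f(x) = 32768 * x
This matches the bytes above: 0 0 0 184 is the float -1/32768, which corresponds to the short -1 (255 255), and 0 0 128 184 is -2/32768, the short -2 (254 255). Clamp the result to 32767 before casting in case x reaches exactly 1.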
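The float bytes quoted in the question can be decoded to check the mapping. The following is only a minimal sketch: it assumes the buffer holds little-endian IEEE 754 floats equal to the original shorts divided by 32768, and the class and method names are made up for illustration.

```csharp
using System;

static class FloatToPcm16
{
    // Converts a normalized float sample (-1..1) back to a 16-bit PCM value.
    // Assumption: the graph produced the float as short / 32768.
    public static short ToInt16Sample(float x)
    {
        float scaled = x * 32768f;
        // Clamp so that x == 1.0 does not overflow the short range.
        if (scaled > short.MaxValue) scaled = short.MaxValue;
        if (scaled < short.MinValue) scaled = short.MinValue;
        return (short)Math.Round(scaled);
    }

    // Decodes four buffer bytes into a float and converts it to a short.
    public static short FromBufferBytes(byte[] buffer, int offset)
    {
        return ToInt16Sample(BitConverter.ToSingle(buffer, offset));
    }
}
```

With the bytes from the question, FloatToPcm16.FromBufferBytes(new byte[] { 0, 0, 0, 184 }, 0) should give the short whose raw bytes are 255 255, provided the assumption about the scaling holds.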

Related

How do I read an audio file into an array in C#

I am trying to read a WAV file into a buffer array in c# but am having some problems. I am using a file stream to manage the audio file. Here is what I have...
FileStream WAVFile = new FileStream(@"test.wav", FileMode.Open);
//Buffer for the wave file...
BinaryReader WAVreader = new BinaryReader(WAVFile);
//Read information from the header.
chunkID = WAVreader.ReadInt32();
chunkSize = WAVreader.ReadInt32();
RiffFormat = WAVreader.ReadInt32();
...
channels = WAVreader.ReadInt16();
samplerate = WAVreader.ReadInt32();
byteRate = WAVreader.ReadInt32();
blockAllign = WAVreader.ReadInt16();
bitsPerSample = WAVreader.ReadInt16();
dataID = WAVreader.ReadInt32();
dataSize = WAVreader.ReadInt32();
The above is reading data from the WAV file header. Then I have this:
musicalData = WAVreader.ReadBytes(dataSize);
...to read the actual sample data but this is only 26 bytes for 60 seconds of audio. Is this correct?
How would I go about converting the byte[] array to double[]?
This code should do the trick. It converts a wave file to a normalized double array (-1 to 1), but it should be trivial to make it an int/short array instead (remove the /32768.0 bit and add 32768 instead). The right[] array will be set to null if the loaded wav file is found to be mono.
I can't claim it's completely bullet proof (potential off-by-one errors), but after creating a 65536 sample array, and creating a wave from -1 to 1, none of the samples appear to go 'through' the ceiling or floor.
// convert two bytes to one double in the range -1 to 1
static double bytesToDouble(byte firstByte, byte secondByte)
{
// convert two bytes to one short (little endian)
short s = (short)((secondByte << 8) | firstByte);
// convert to range from -1 to (just below) 1
return s / 32768.0;
}
// Returns left and right double arrays. 'right' will be null if sound is mono.
public void openWav(string filename, out double[] left, out double[] right)
{
byte[] wav = File.ReadAllBytes(filename);
// Determine if mono or stereo
int channels = wav[22]; // Forget byte 23 as 99.999% of WAVs are 1 or 2 channels
// Get past all the other sub chunks to get to the data subchunk:
int pos = 12; // First Subchunk ID from 12 to 16
// Keep iterating until we find the data chunk (i.e. 64 61 74 61 ...... (i.e. 100 97 116 97 in decimal))
while(!(wav[pos]==100 && wav[pos+1]==97 && wav[pos+2]==116 && wav[pos+3]==97))
{
pos += 4;
int chunkSize = wav[pos] + wav[pos + 1] * 256 + wav[pos + 2] * 65536 + wav[pos + 3] * 16777216;
pos += 4 + chunkSize;
}
pos += 8;
// Pos is now positioned to start of actual sound data.
int samples = (wav.Length - pos)/2; // 2 bytes per sample (16 bit sound mono)
if (channels == 2)
{
samples /= 2; // 4 bytes per sample (16 bit stereo)
}
// Allocate memory (right will be null if only mono sound)
left = new double[samples];
if (channels == 2)
{
right = new double[samples];
}
else
{
right = null;
}
// Write to double array/s:
int i=0;
while (i < samples) // 'length' was undefined; stop once every allocated sample is filled
{
left[i] = bytesToDouble(wav[pos], wav[pos + 1]);
pos += 2;
if (channels == 2)
{
right[i] = bytesToDouble(wav[pos], wav[pos + 1]);
pos += 2;
}
i++;
}
}
If you wanted to use plugins, then assuming your WAV file contains 16 bit PCM (which is the most common), you can use NAudio to read it out into a byte array, and then copy that into an array of 16 bit integers for convenience. If it is stereo, the samples will be interleaved left, right.
using (WaveFileReader reader = new WaveFileReader("myfile.wav"))
{
Assert.AreEqual(16, reader.WaveFormat.BitsPerSample, "Only works with 16 bit audio");
byte[] buffer = new byte[reader.Length];
int read = reader.Read(buffer, 0, buffer.Length);
short[] sampleBuffer = new short[read / 2];
Buffer.BlockCopy(buffer, 0, sampleBuffer, 0, read);
}
I personally try to avoid using third-party libraries as much as I can. But the option is still there if you'd like the code to look better and easier to handle.
It's been a good 10-15 years since I touched WAVE file processing, but unlike the first impression that most people get about wave files as simple fixed-size header followed by PCM encoded audio data, WAVE files are a bit more complex RIFF format files.
Instead of re-engineering RIFF file processing and its various cases, I would suggest using interop and calling the APIs that deal with the RIFF file format.
You can see how to open a file and get the data buffer (and the meta information describing that buffer) in this example. It's in C++, but it shows the use of the mmioOpen, mmioRead, mmioDescend and mmioAscend APIs that you would need to get your hands on a proper audio buffer.
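If interop is not an option, the chunk structure can also be walked by hand. This is only a rough sketch and not the mmio API mentioned above: it assumes a seekable stream containing a plain RIFF/WAVE file, and the RiffChunks class and FindChunk method are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class RiffChunks
{
    // Returns the payload of the first chunk with the given four-character id,
    // or null if the stream contains no such chunk.
    public static byte[] FindChunk(Stream stream, string id)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadInt32(); // overall RIFF size, not needed here
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
                throw new InvalidDataException("Not a RIFF/WAVE stream");

            // Each chunk is a 4-byte id, a 4-byte little-endian size, then the payload.
            while (stream.Position + 8 <= stream.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                int size = reader.ReadInt32();
                if (chunkId == id)
                    return reader.ReadBytes(size);
                // Skip the payload; chunks are padded to an even number of bytes.
                stream.Seek(size + (size & 1), SeekOrigin.Current);
            }
            return null;
        }
    }
}
```

Calling RiffChunks.FindChunk(stream, "data") skips any chunks that sit between the format chunk and the sample data, which is the case a fixed-size header reader gets wrong.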

Convert audio stream to frequency

I've managed to successfully get a stream of audio data going to an output device (speaker) using NAudio:
private void OnDataAvailable(object sender, WaveInEventArgs e)
{
var buffer = e.Buffer;
var bytesRecorded = e.BytesRecorded;
Debug.WriteLine($"Bytes {bytesRecorded}");
}
And the sample output:
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 23040
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
Bytes 19200
I then transform (FFT) this to x and y values using https://stackoverflow.com/a/20414331:
var buffer = e.Buffer;
var bytesRecorded = e.BytesRecorded;
//Debug.WriteLine($"Bytes {bytesRecorded}");
var bufferIncrement = _waveIn.WaveFormat.BlockAlign;
for (var index = 0; index < bytesRecorded; index += bufferIncrement)
{
var sample32 = BitConverter.ToSingle(buffer, index);
_sampleAggregator.Add(sample32);
}
With a sample output of:
x: -9.79634E-05, y: -9.212703E-05
x: 6.897306E-05, y: 2.489315E-05
x: 0.0002080683, y: 0.0004317867
x: -0.0001720883, y: -6.681971E-05
x: -0.0001245111, y: 0.0002880402
x: -0.0005751926, y: -0.0002682915
x: -5.280507E-06, y: 7.297558E-05
x: -0.0001143928, y: -0.0001156801
x: 0.0005231025, y: -0.000153206
x: 0.0001011164, y: 7.681748E-05
x: 0.000330695, y: 0.0002293986
Not sure if this is even possible or if I'm just misunderstanding what the stream is returning, but I'd like to get the frequency of the audio stream in order to do some stuff with Philips Hue. The x, y values above are way too small to use in the CIE colour space. Am I doing something wrong or am I completely misunderstanding what the data is in the buffer in OnDataAvailable?
Thanks!
Edit:
I've modified my OnDataAvailable code based on comments and the tutorial for the Autotune program to be the below:
private void OnDataAvailable(object sender, WaveInEventArgs e)
{
var buffer = e.Buffer;
float sample32 = 0;
for (var index = buffer.Length > 1024 ? buffer.Length - 1024 : buffer.Length; index < e.BytesRecorded; index += 2)
{
var sample = (short) ((buffer[index + 1] << 8) | buffer[index + 0]);
sample32 = sample / 32768f;
Debug.WriteLine(sample32);
LightsController.SetLights(Convert.ToByte(Math.Abs(sample32) * 255));
_sampleAggregator.Add(sample32);
}
var floats = BytesToFloats(buffer);
if (sample32 != 0.0f)
{
var pitchDetect = new FftPitchDetector(sample32);
var pitch = pitchDetect.DetectPitch(floats, floats.Length);
Debug.WriteLine($"Pitch {pitch}");
}
}
The hope is that I only use the last set of elements from the buffer, as it doesn't seem to clear itself and I'm only interested in the latest set of data available in order to get the frequency of the current audio. However, I still get an index exception occasionally when the DetectPitch method is called. Where am I going wrong? I was hoping to use the frequency to change the colour and brightness of Hue bulbs.
Use
fPeak = SamplingRate * BinNumberOfPeak / FFTLength ;
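To make the formula concrete, here is a small sketch that finds the strongest bin and applies it. It is an illustration under assumptions, not NAudio code: it uses a naive O(n²) DFT instead of a real FFT so that it stays self-contained, and the PeakFrequency class and Estimate method are invented names.

```csharp
using System;

static class PeakFrequency
{
    // Returns the frequency in Hz of the strongest non-DC bin of a naive DFT.
    public static double Estimate(float[] samples, int samplingRate)
    {
        int n = samples.Length;
        int peakBin = 1;
        double peakPower = 0.0;

        // Only bins up to n / 2 are meaningful for real-valued input.
        for (int bin = 1; bin <= n / 2; bin++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2.0 * Math.PI * bin * t / n;
                re += samples[t] * Math.Cos(angle);
                im -= samples[t] * Math.Sin(angle);
            }
            double power = re * re + im * im;
            if (power > peakPower)
            {
                peakPower = power;
                peakBin = bin;
            }
        }

        // fPeak = SamplingRate * BinNumberOfPeak / FFTLength
        return (double)samplingRate * peakBin / n;
    }
}
```

The result can only land on multiples of samplingRate / n, so a longer block gives a finer frequency resolution at the cost of more latency.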

Reading 24-bit samples from a .WAV file

I understand how to read 8-bit, 16-bit & 32-bit samples (PCM & floating-point) from a .wav file, since (conveniently) the .Net Framework has an in-built integral type for those exact sizes. But, I don't know how to read (and store) 24-bit (3 byte) samples.
How can I read 24-bit audio? Is there maybe some way I can alter my current method (below) for reading 32-bit audio to solve my problem?
private List<float> Read32BitSamples(FileStream stream, int sampleStartIndex, int sampleEndIndex)
{
var samples = new List<float>();
var bytes = ReadChannelBytes(stream, Channels.Left, sampleStartIndex, sampleEndIndex); // Reads bytes of a single channel.
if (audioFormat == WavFormat.PCM) // audioFormat determines whether to process sample bytes as PCM or floating point.
{
for (var i = 0; i < bytes.Length / 4; i++)
{
samples.Add(BitConverter.ToInt32(bytes, i * 4) / 2147483648f);
}
}
else
{
for (var i = 0; i < bytes.Length / 4; i++)
{
samples.Add(BitConverter.ToSingle(bytes, i * 4));
}
}
return samples;
}
Reading (and storing) 24-bit samples is very simple. Now, as you've rightly said, a 3 byte integral type does not exist within the framework, which means you're left with two choices: either create your own type, or pad your 24-bit samples by inserting an empty byte (0) at the start of each sample's byte array, thereby making them 32-bit samples (so you can then use an int to store/manipulate them).
I will explain and demonstrate how to do the latter (which is also, in my opinion, the simpler approach).
First we must look at how a 24-bit sample would be stored within an int,
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ MSB ~ ~ 2ndMSB ~ ~ 2ndLSB ~ ~ LSB ~ ~
24-bit sample: 11001101 01101001 01011100 00000000
32-bit sample: 11001101 01101001 01011100 00101001
MSB = Most Significant Byte, LSB = Least Significant Byte.
As you can see, the LSB of the 24-bit sample is 0, therefore all you have to do is declare a byte[] with 4 elements, then read the 3 bytes of the sample into the array (starting at element 1) so that your array looks like below (effectively bit shifting by 8 places to the left),
myArray[0]: 00000000
myArray[1]: 01011100
myArray[2]: 01101001
myArray[3]: 11001101
Once you have your byte array full you can pass it to BitConverter.ToInt32(myArray, 0);. You will then need to shift the sample by 8 places to the right to get the sample in its proper 24-bit integral representation (from -8388608 to 8388607), then divide by 8388608 to have it as a floating-point value.
So, putting that all together you should end up with something like this,
Note, I wrote the following code with the intention to be "easy-to-follow", therefore this will not be the most performant method, for a faster solution see the code below this one.
private List<float> Read24BitSamples(FileStream stream, int startIndex, int endIndex)
{
var samples = new List<float>();
var bytes = ReadChannelBytes(stream, Channels.Left, startIndex, endIndex);
var temp = new List<byte>();
var paddedBytes = new byte[bytes.Length / 3 * 4];
// Left-align our samples to 32-bit by padding a zero LSB (effectively bit shifting 8 places to the left).
for (var i = 0; i < bytes.Length; i += 3)
{
temp.Add(0); // LSB
temp.Add(bytes[i]); // 2nd LSB
temp.Add(bytes[i + 1]); // 2nd MSB
temp.Add(bytes[i + 2]); // MSB
}
// BitConverter requires collection to be an array.
paddedBytes = temp.ToArray();
temp = null;
bytes = null;
for (var i = 0; i < paddedBytes.Length / 4; i++)
{
samples.Add(BitConverter.ToInt32(paddedBytes, i * 4) / 2147483648f); // Skip the bit shift and just divide: since our sample has been shifted 8 places to the left, we need to divide by 2147483648, not 8388608.
}
return samples;
}
For a faster [1] implementation you can do the following instead,
private List<float> Read24BitSamples(FileStream stream, int startIndex, int endIndex)
{
var bytes = ReadChannelBytes(stream, Channels.Left, startIndex, endIndex);
var samples = new float[bytes.Length / 3];
for (var i = 0; i < bytes.Length; i += 3)
{
samples[i / 3] = (bytes[i] << 8 | bytes[i + 1] << 16 | bytes[i + 2] << 24) / 2147483648f;
}
return samples.ToList();
}
[1] After benchmarking the above code against the previous method, this solution is approximately 450% to 550% faster.

Convert a string to a bitmap in c#

I want to convert a string into a bitmap or something I can show in a pixelbox.
My string looks like this:
string rxstring = "010010010020020020030030030040040040050050050060060060070070070080080080090090090100100100110110110120120120130130130140140140150150150160160160";
It is no problem to erase the RGB code in the string
("01002003004005060070080090100110120130140150160");
I only need it to show, the is not important [sic]
IDE: VS2010 C#
I'm afraid the data you are getting is not a meaningful image. If you split the data into groups of three, you get the following:
010
010
010
020
020
020
030
030
030
040
040
040
050
050
050
060
060
060
070
070
070
080
080
080
090
090
090
100
100
100
110
110
110
120
120
120
130
130
130
140
140
140
150
150
150
160
160
160
If you look at that data, there's no way you can convert this to an image that would actually mean something to us. It would be a collection of 48 pixels containing a sort of gradient-like image (since the numbers above follow a pattern that is constantly increasing).
We would need more information to debug this. (Like what component is providing the data etc.)
Update
This is what I get when I convert your data to pixels (take into account that I've enlarged every pixel to 16x16).
Upon continuing review, I realized that the string you're getting isn't a byte array. This creates a square Bitmap and lets you set the values pixel by pixel.
List<string> splitBytes = new List<string>();
string byteString = "";
foreach (var chr in rxstring)
{
byteString += chr;
if (byteString.Length == 3)
{
splitBytes.Add(byteString);
byteString = "";
}
}
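var pixelCount = splitBytes.Count / 3;
// Side length of the square bitmap (4 for the 16 pixels in the sample string)
var numRows = (int)Math.Sqrt(pixelCount);
var numCols = (int)Math.Sqrt(pixelCount);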
System.Drawing.Bitmap map = new System.Drawing.Bitmap(numRows, numCols);
var curPixel = 0;
for (int y = 0; y < numCols; y++)
{
for (int x = 0; x < numRows; x++ )
{
map.SetPixel(x, y, System.Drawing.Color.FromArgb(
Convert.ToInt32(splitBytes[curPixel * 3]),
Convert.ToInt32(splitBytes[curPixel * 3 + 1]),
Convert.ToInt32(splitBytes[curPixel * 3 + 2])));
curPixel++;
}
}
//Do something with image
EDIT: Made corrections to the row/col iterations to match the image shown above.
Try converting the string to a byte array and loading it into a memory stream. Once in the stream, you should be able to convert to an image.
List<byte> splitBytes = new List<byte>();
string byteString = "";
foreach (var chr in testString)
{
byteString += chr;
if (byteString.Length == 3)
{
splitBytes.Add(Convert.ToByte(byteString));
byteString = "";
}
}
if (byteString != "")
splitBytes.AddRange(Encoding.ASCII.GetBytes(byteString));
using (var ms = new MemoryStream(splitBytes.ToArray()))
{
var img = System.Drawing.Image.FromStream(ms);
//do something with image.
}
EDIT: Added updated code. This was tested by loading an image of my own and converting the bytes into a string, then converting them back into a byte array using the above code and I successfully loaded the image from a string.
string testString = "25521625522400001607407007307000000100100100009600009600000025522500010406912010510200000007707700004200000000000800000400102600000500000000000100000000006200102700000500000000000100000000007000104000000300000000000100000200000000104900000200000000001800000000007800000000000000000000009600000000000100000000009600000000000108009710511011604607806908403211805104605304604904900025521900006700000200100100200100100200200200200200200200200300500300300300300300600400400300500700600700700700600700700800901100900800801000800700701001301001001101201201201200700901401501301201401101201201225521900006700100200200200300300300600300300601200800700801201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201201225519200001700800000400000400300103400000201700100301700125519600003100000000100500100100100100100100000000000000000000000000100200300400500600700800901001125519600018101600000200100300300200400300500500400400000000112500100200300000401700501803304906500601908109700703411302005012914516100803506617719302108220924003605109811413000901002202302402502603703803904004104205205305405505605705806706806907007107207307408308408508608708808909009910010110210310410510611511611711811912012112213113213313413513613713814614714814915015115215315416216316416516616716816917017817918018118218318418518619419519619719819920020120221021121221321421521621721822522622722822923023123223323424124224324424524624724824925025519600003100100000300100100100100100100100100100000000000000000000100200300400500600700800901001125519600018101700000200100200400400300400700500400400000100211900000100200301700400503304900601806508100709711301903405012900802006614516117719300903505108224002109811420901002203605222503724102302402502603803904004104205305405505605705806706806907007107207307408308408508608708808909009910010110210310410510611511611711811912012112213013113213313413513613713814614714814915015115215
3154162163164165166167168169170178179180181182183184185186194195196197198199200201202210211212213214215216217218226227228229230231232233234242243244245246247248249250255218000012003001000002017003017000063000252225248089251085248195193031007060033030133127054137107121166121143107103121116176043069052182202085076167111238224143056234193152252204073040162128063255217";
EDIT: Added a sample string of the image I used to test the above code.

Getting PCM values of WAV files

I have a mono .wav file (16-bit, 44.1 kHz) and I'm using the code below. If I'm not wrong, this should give me an output of values between -1 and 1 which I can apply an FFT to (to be converted to a spectrogram later on). However, my output is nowhere near -1 and 1.
This is a portion of my output
7.01214599609375
17750.2552337646
8308.42733764648
0.000274658203125
1.00001525878906
0.67291259765625
1.3458251953125
16.0000305175781
24932
758.380676269531
0.0001068115234375
This is the code which I got from another post
Edit 1:
public static Double[] prepare(String wavePath, out int SampleRate)
{
Double[] data;
byte[] wave;
byte[] sR = new byte[4];
System.IO.FileStream WaveFile = System.IO.File.OpenRead(wavePath);
wave = new byte[WaveFile.Length];
data = new Double[(wave.Length - 44) / 4];//shifting the headers out of the PCM data;
WaveFile.Read(wave, 0, Convert.ToInt32(WaveFile.Length));//read the wave file into the wave variable
/***********Converting and PCM accounting***************/
for (int i = 0; i < data.Length; i += 2)
{
data[i] = BitConverter.ToInt16(wave, i) / 32768.0;
}
/**************assigning sample rate**********************/
for (int i = 24; i < 28; i++)
{
sR[i - 24] = wave[i];
}
SampleRate = BitConverter.ToInt16(sR, 0);
return data;
}
Edit 2: I'm getting output with 0s for every 2nd number
0.009002685546875
0
0.009613037109375
0
0.0101318359375
0
0.01080322265625
0
0.01190185546875
0
0.01312255859375
0
0.014068603515625
If your samples are 16 bits (which appears to be the case), then you want to work with Int16. Each 2 bytes of the sample data is a signed 16-bit integer in the range -32768 .. 32767, inclusive.
If you want to convert a signed Int16 to a floating point value from -1 to 1, then you have to divide by Int16.MaxValue + 1 (which is equal to 32768). So, your code becomes:
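// data must hold one entry per sample: new Double[(wave.Length - 44) / 2] for 16-bit mono
for (int i = 0; i < data.Length; i++)
{
// skip the 44-byte header and advance 2 bytes per sample
data[i] = BitConverter.ToInt16(wave, 44 + i * 2) / 32768.0;
}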
We use 32768 here because the values are signed.
So -32768/32768 will give -1.0, and 32767/32768 gives 0.999969482421875.
If you used 65536.0, then your values would only be in the range -0.5 .. 0.5.
