What is actually contained in the data chunk of a WAV file? - c#

For example, take the case of a stereo WAV file with a sample rate of 44100 Hz and a bit depth of 16 bits.
Exactly how are the 16 bits divided up?
In the audio clip that I was using, the first 4 bytes had data about the first audio channel; the next 4 bits, I have no idea what they are (even when replaced with 0, there is no effect on the final audio file).
The next 4 bytes had data about the second audio channel; again, the next 4 bits, I have no idea what they are (even when replaced with 0, there is no effect on the final audio file).
So I would like to figure out what those 4 bits are.

A WAV File contains several chunks.
The FMT chunk specifies the format of the audio data.
The actual audio data are within the data chunk.
That depends on the actual format. But let's assume the following format as an example:
PCM, 16 bit, 2 channels, with a sample rate of 44100 Hz.
Audio data is represented as samples. In this case each sample takes 16 bits = 2 bytes.
If we have multiple channels (in this example 2 = stereo), they are interleaved like this:
left sample, right sample, left sample, right sample, ...
Since each sample takes 2 bytes (16 bits), we get something like this:
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | ...
 left sample    | right sample   | left sample    | right sample   | ...
Each second of audio contains 44100 samples for EACH channel.
So in total, one second of audio takes 44100 * (16 / 8) * 2 = 176400 bytes.
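As a minimal C# sketch (assuming the header and other chunks have already been skipped and the hypothetical file "audio-payload.raw" below holds only the payload of the data chunk: 16 bit PCM, 2 channels, little endian), the interleaved samples can be pulled apart like this:

using System;
using System.IO;

// Sketch only: split an interleaved 16 bit stereo PCM payload into two channels.
byte[] data = File.ReadAllBytes("audio-payload.raw");

int frameCount = data.Length / 4;              // 2 bytes per sample * 2 channels
short[] left = new short[frameCount];
short[] right = new short[frameCount];

for (int i = 0; i < frameCount; i++)
{
    // BitConverter reads little endian on typical x86/x64 machines
    left[i] = BitConverter.ToInt16(data, i * 4);       // bytes 1-2 of the frame
    right[i] = BitConverter.ToInt16(data, i * 4 + 2);  // bytes 3-4 of the frame
}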

A WAV format audio file typically starts with a 44 byte header followed by the payload, which is the uncompressed raw PCM audio data. In the payload area, as you walk across the PCM data, each sample (point on the audio curve) contains data for all channels. The header tells you the number of channels. For stereo with a bit depth of 16, you will see two bytes (16 bits == bit depth) for a given channel immediately followed by the two bytes of the next channel, and so on.
For a given channel, a given set of bytes (2 bytes in your case) can appear in two possible layouts determined by the choice of endianness: 1st byte followed by 2nd byte, or the reverse. The ordering matters; the header also tells you which endianness is in use. Typically WAV format is little endian.
Each channel will generate its own audio curve.
In your code, to convert PCM data into a usable audio curve data point, you must combine all bytes of a given sample for a given channel into a single value. Typically it's an integer and not floating point; again the header defines which, and if it's an integer it could be signed or unsigned. Little endian means that as you read the file the first (left most) byte becomes the least significant byte, followed by each subsequent byte which becomes the next most significant byte.
In pseudo code:

int mydatapoint    // allocate your integer audio curve data point

// step 0
mydatapoint = most-significant-byte
// stop here for a bit depth of 8; if the bit depth is greater than 8 bits,
// left shift to make room for the following byte, if any

// step 1
mydatapoint = mydatapoint << 8    // shove the data to the left by 8 bits,
                                  // which effectively jacks up its value
                                  // and leaves the right most 8 bits empty

// step 2
mydatapoint = mydatapoint OR next-most-significant-byte    // bit wise OR

Now repeat steps 1 & 2 for each subsequent byte of PCM data, going from most significant to least significant (for little endian that means walking the bytes of each sample in reverse file order). This is essential for any bit depth beyond 16, so for 24 bit or 32 bit audio you will need to combine 3 or 4 bytes of PCM data into your single integer output audio curve data point.
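As a concrete C# sketch of the same idea (assuming signed little endian PCM, the most common case; the helper names here are made up for illustration):

// Sketch: combine little endian PCM bytes into a signed integer sample value.
// 16 bit case: 2 bytes per sample, first byte in the file is least significant.
short ToSample16(byte[] pcm, int offset)
{
    return (short)((pcm[offset + 1] << 8) | pcm[offset]);
}

// 24 bit case: 3 bytes per sample, sign-extended into a 32 bit int.
int ToSample24(byte[] pcm, int offset)
{
    int value = (pcm[offset + 2] << 16) | (pcm[offset + 1] << 8) | pcm[offset];
    if ((value & 0x800000) != 0)
        value |= unchecked((int)0xFF000000);   // sign extend from 24 to 32 bits
    return value;
}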
Why are we doing this bit shifting nonsense?
The level of audio fidelity when converting from analog to digital is driven by how accurately you record the audio curve. Analog audio is a continuous curve, but to become digital it must be sampled into discrete points along that curve. Two factors determine the fidelity when sampling the analog curve to create its digital representation: the left to right distance along the analog audio curve is determined by the sample rate, and the up and down distance along the audio curve is determined by the bit depth. A higher sample rate gives you more samples per second, and a greater bit depth gives you more vertical points to approximate the instantaneous height of the analog audio curve.
bit depth 8  == 2^8  == 256 distinct vertical values to record curve height
bit depth 16 == 2^16 == 65536 distinct vertical values to record curve height
So to more accurately record the height of our analog audio curve digitally, we want to be as granular as possible, so the resulting audio curve is as smooth as possible and not jagged. Jagged is what would happen if we only allocated 2 bits, which gives 2^2 == 4 distinct values; try connecting the dots when your audio curve only has 4 vertical values to choose from on your plot. The bit shifting is simply building up a single integer value from several bytes of data: numbers greater than 255 cannot fit into one byte and so must be spread across multiple bytes of PCM data.
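Purely as an illustration of that granularity point (not part of the WAV format itself), here is a small sketch that quantizes a normalized curve height at a chosen bit depth:

// Illustrative sketch: quantize a normalized sample (-1.0 .. 1.0) at a given bit depth.
int Quantize(double height, int bitDepth)
{
    int levels = 1 << bitDepth;                  // 2^bitDepth distinct vertical values
    int maxValue = levels / 2 - 1;               // e.g. 32767 for 16 bit
    return (int)Math.Round(height * maxValue);   // fewer bits => coarser steps
}

// Quantize(0.75, 16) => 24575 out of 65536 levels (smooth curve)
// Quantize(0.75, 2)  => 1 out of only 4 levels (jagged curve)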
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

Related

How to change wav bits per sample from 16 to 32 in .NET?

I want to change a source WAV file's bits per sample from 16 to 32.
To do this I update:
- raw data: convert each 2 bytes to a short, then cast to float, then to a byte array (in place of the initial two bytes)
- data length field, by doubling it
- bits per sample, by doubling it
- byte rate field, just by doubling it
- block align parameter, by doubling it
After saving the WAV file a lot of noise appears and the sound becomes very loud.
What am I doing wrong?
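For comparison, a minimal sketch of how the per-sample step is often done when the target is a 32 bit IEEE float WAV (an assumption here; note the short is normalized into -1.0 .. 1.0 rather than cast directly, and the fmt chunk's audio format code would change from 1 (PCM) to 3 (IEEE float) as well):

// Sketch only: convert 16 bit PCM sample bytes to 32 bit IEEE float sample bytes.
byte[] Convert16BitTo32BitFloat(byte[] pcm16)
{
    int sampleCount = pcm16.Length / 2;
    byte[] float32 = new byte[sampleCount * 4];

    for (int i = 0; i < sampleCount; i++)
    {
        short sample = BitConverter.ToInt16(pcm16, i * 2);
        float normalized = sample / 32768f;   // scale into -1.0 .. 1.0
        Buffer.BlockCopy(BitConverter.GetBytes(normalized), 0, float32, i * 4, 4);
    }
    return float32;
}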

Reading and writing non-standard floating point values

I'm working with a binary file (3d model file for an old video game) in C#. The file format isn't officially documented, but some of it has been reverse-engineered by the game's community.
I'm having trouble understanding how to read/write the 4-byte floating point values. I was provided this explanation by a member of the community:
For example, the bytes EE 62 ED FF represent the value -18.614.
The bytes are little endian ordered. The first 2 bytes represent the decimal part of the value, and the last 2 bytes represent the whole part of the value.
For the decimal part, 62 EE converted to decimal is 25326. This represents the fraction out of 65536 (i.e. 25326/65536). Thus, divide 25326 by 65536 and you'll get 0.386.
For the whole part, FF ED converted to decimal is 65517. 65517 represents the whole number -19 (which is 65517 - 65536).
This makes the value -19 + .386 = -18.614.
This explanation mostly makes sense, but I'm confused by 2 things:
Does the magic number 65536 have any significance?
BinaryWriter.Write(-18.613f) writes the bytes as 79 E9 94 C1, so my assumption is the binary file I'm working with uses its own proprietary method of storing 4-byte floating point values (i.e. I can't use C#'s float interchangably and will need to encode/decode the values first)?
Firstly, this isn't a floating point number, it's a fixed point number.
Note: a fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number reserved for the fractional part.
Does the magic number 65536 have any significance?
It's the number of distinct values an unsigned 16 bit number can hold, or 2^16. Yes, it's significant, because the number you are working with is 2 x 16 bit values encoded for its integral and fractional components.
so my assumption is the binary file I'm working with uses its own proprietary method of storing 4-byte floating point values
Nope, wrong again: floating point values in .NET adhere to the IEEE Standard for Floating-Point Arithmetic (IEEE 754) technical standard.
When you use BinaryWriter.Write(float), it basically just shifts the bits into bytes and writes them to the Stream:
uint TmpValue = *(uint *)&value;
_buffer[0] = (byte) TmpValue;
_buffer[1] = (byte) (TmpValue >> 8);
_buffer[2] = (byte) (TmpValue >> 16);
_buffer[3] = (byte) (TmpValue >> 24);
OutStream.Write(_buffer, 0, 4);
If you want to read and write this special value, you will need to do the same sort of thing: read and write the raw bytes and convert them yourself.
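A minimal sketch of such a conversion, assuming the layout described in the question (little endian, fractional 16 bits first, then a signed 16 bit whole part):

// Sketch: decode/encode the 16.16-style fixed point value described above.
float ReadFixedPoint(byte[] bytes, int offset)
{
    ushort fraction = BitConverter.ToUInt16(bytes, offset);     // EE 62 -> 25326
    short whole = BitConverter.ToInt16(bytes, offset + 2);      // ED FF -> -19
    return whole + fraction / 65536f;                           // -19 + 0.386 = -18.614
}

byte[] WriteFixedPoint(float value)
{
    short whole = (short)Math.Floor(value);                     // -19 for -18.614
    ushort fraction = (ushort)((value - whole) * 65536f);       // 0.386 * 65536
    byte[] result = new byte[4];
    BitConverter.GetBytes(fraction).CopyTo(result, 0);
    BitConverter.GetBytes(whole).CopyTo(result, 2);
    return result;
}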
This should be a built-in value format unique to the game.
It is more like a fraction value,
where 62 EE represents the fraction part of the value and FF ED represents the whole number part of the value.
The whole number part is easy to understand, so I'm not going to explain it.
The explanation of the fraction part is:
For every 2 bytes, there are 65536 possibilities (0 ~ 65535).
256 x 256 = 65536
hence the magic number 65536.
And the game itself must have a built-in algorithm to divide the first 2 bytes by 65536.
Choosing any other number would waste memory space or reduce the accuracy of the values that can be represented.
Of course, it all depends on what kind of accuracy the game wishes to present.

How can I store 12 bit values in a ushort?

I've got a stream coming from a camera that is set to a 12 bit pixel format.
My question is: how can I store the pixel values in an array?
Before, I was taking pictures with a 16 bit pixel format and stored the values in a ushort array. Now I have changed to 12 bit, and I get the same full size image displayed as four images on the screen next to one another.
When I have the camera set to an 8 bit pixel format I store the data in a byte array, but what should I use when it is set to 12 bit?
Following on from my comment, we can process the incoming stream in 3-byte "chunks", each of which gives 2 pixels:
// for a "chunk" of incoming array a[0], a[1], a[2]
ushort pixel1 = (ushort)((a[0] << 4) | (a[1] >> 4));
ushort pixel2 = (ushort)(((a[1] & 0x0F) << 8) | a[2]);
(Assuming big-endian packing, i.e. the high bits of each pixel come first.)
The smallest memory size you can allocate is one byte (8 bits). That means that if you need 12 bits of data to store one pixel in your frame array, you should use a ushort and leave the extra 4 bits alone. That's why it's more efficient to design this kind of thing around powers of two (1, 2, 4, 8, 16, 32, 64, 128, etc.).
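Putting the two answers together, here is a sketch that unpacks a whole 12 bit packed buffer into a ushort array (assuming the same 3-bytes-per-2-pixels, high-bits-first packing as above; real cameras use various packed formats, so check the camera documentation):

// Sketch: unpack 12 bit packed pixels (3 bytes per 2 pixels) into a ushort array.
ushort[] Unpack12Bit(byte[] packed)
{
    int pixelCount = packed.Length / 3 * 2;
    ushort[] pixels = new ushort[pixelCount];

    for (int i = 0, p = 0; i + 2 < packed.Length; i += 3, p += 2)
    {
        pixels[p] = (ushort)((packed[i] << 4) | (packed[i + 1] >> 4));
        pixels[p + 1] = (ushort)(((packed[i + 1] & 0x0F) << 8) | packed[i + 2]);
    }
    return pixels;
}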

NAudio playback sample values much smaller than recorded values

When I record with NAudio using waveIn, the buffer values that I get are large; my chart needs to be scaled to around 20,000 to display the samples correctly. When I replay this audio from the recorded wave file and capture samples from the waveOut using a SampleChannel sample provider, the values are tiny.
The file format says the values it is giving me have a blockAlign of 8 with 32 bits per sample (float?) and 2 channels. Does this mean that 2x4 floats should be combined in some way to create each channel's value?
I notice the floats in the supplied buffer arrays are discrete: they are multiples of 3.05175781E-05 in float format.
I'm at a bit of a loss as to what to do here. Do I need to process the floats that a waveout sampleProvider creates?
With waveIn, you're likely recording 16 bit samples, so they are short or Int16 values in the range -32768 to 32767.
When you deal with floating point (float or Single) samples, they are normalised into the range -1.0 to 1.0. That's why your recorded chart values are on the order of 20,000 while the floats coming out of the SampleChannel are tiny, and why those floats come in multiples of 1/32768 (about 3.05175781E-05): they are just the original 16 bit samples divided by 32768.
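As a small sketch of moving between the two representations (assuming the source material is 16 bit integer samples):

// Sketch: convert between 16 bit integer samples and normalized float samples.
float ShortToFloat(short sample) => sample / 32768f;              // -1.0 .. 1.0

short FloatToShort(float sample) =>
    (short)Math.Clamp(sample * 32768f, short.MinValue, short.MaxValue);

// e.g. a recorded value of 20000 plays back as 20000 / 32768 = 0.61 through a
// SampleChannel, stepping in multiples of 1/32768 = 3.05175781E-05.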

Calculate Difference between 2 Time Spans DSP

This might be a broad question, but I would like to see answers and discuss this thread with SO users.
So far I gather that an audio file (WAV) has a sample rate, usually 44100 or 48000 (I've mostly seen these 2), and from that we can determine that a single second of the file (second 00:00:01) has exactly 44100 integer values, which means we have an int[] here; so if an audio file's duration is 5 seconds it has 5 * 44100 integers (5 seconds' worth of samples).
So my question is: how can we calculate the difference (or similarity) of content between two time spans, like Audio1.wav and Audio2.wav at 00:00:01, with the same sample rate?
There are a couple of assumptions in your reasoning:
1. The file is the raw uncompressed (PCM encoded) data.
2. There is only one channel (mono).
It's better to start by reading some format descriptions and sample implementations, then search for some audio comparison algorithms (1, 2, 3).
Linked Q: Compare two spectogram to find the offset where they match algorithm
One way to do this would be to resample the signal from 44100 Hz to 48000 Hz, so both signals have the same sample rate, and then perform a cross-correlation. The shape of the cross-correlation could be a measure of similarity. You could look at the height of the peak, or the ratio of energy in the peak to the total energy.
Note however that when the signal repeats itself, you will get multiple cross-correlation peaks.
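As a rough sketch of the cross-correlation idea (assuming both signals have already been resampled to the same rate and converted to float arrays; a real implementation would normally use an FFT for speed):

// Sketch: brute-force cross-correlation of two equal-rate signals.
// The height of the peak, relative to total energy, can serve as a similarity measure.
double[] CrossCorrelate(float[] a, float[] b)
{
    double[] result = new double[a.Length + b.Length - 1];
    for (int lag = -(b.Length - 1); lag < a.Length; lag++)
    {
        double sum = 0;
        for (int i = 0; i < b.Length; i++)
        {
            int j = i + lag;
            if (j >= 0 && j < a.Length)
                sum += a[j] * b[i];
        }
        result[lag + b.Length - 1] = sum;
    }
    return result;
}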
