When I record with NAudio using waveIn, the buffer values I get are large; my chart needs to be scaled to around 20,000 to display the samples correctly. When I replay this audio from the recorded wave file and capture samples from the waveOut using a SampleChannel sample provider, the values are tiny.
The file format says the values it is giving me have a blockAlign of 8 with 32 bits per sample (float?) and 2 channels. Does this mean that two 4-byte floats should be combined in some way to create each channel's value?
I notice the floats in the supplied buffer arrays are discrete: they are multiples of 3.05175781E-05 in float format.
I'm at a bit of a loss as to what to do here. Do I need to process the floats that a waveOut sample provider creates?
With waveIn, you're likely recording 16-bit samples, so they are short (Int16) values in the range -32768 to 32767.
When you deal with floating-point (float or Single) samples, they are normalised into the range -1.0 to 1.0.
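A minimal sketch of the relationship between the two scales, assuming 16-bit input (note that 1/32768 = 3.05175781E-05, exactly the step size observed in the question):

    // 16-bit PCM sample -> normalized float in [-1.0, 1.0)
    short pcm = -12345;
    float normalized = pcm / 32768f;   // -0.376739502...

    // normalized float -> the 16-bit scale the chart was built for
    short chartValue = (short)(normalized * 32768f);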
Related
I have some animation data (x, y, z), represented as 2-byte structures written in little endian. I know they should be 4-byte floating point values, so I have to unpack them. I collected a few sample values as precisely as possible (they don't represent the exactly packed values, but are very close to them) and roughly divided the packed values into a few ranges.
Sample values (Little Endian):
0.048879981 - 0x0046
0.056879997 - 0x0047
0.253880024 - 0x0050
0.313879967 - 0x0051
0.623880029 - 0x0055
1.003879905 - 0x0058
-0.066120029 - 0x00C8
-0.1561199428 - 0x00CD
-0.8691199871 - 0x00D7
Ranges:
0x0000 : zero
[0x0000,0x0014] : invisible changes (increasing probably)
[0x0014, ....] : increasing (visible)
0x0080 : zero, probably the point of sign change
[0x0080,0x00B0] : invisible changes (decreasing probably)
[0x00B0, ....] : decreasing (visible)
There are gaps (....) at the ends of the ranges because it is hard to check them correctly, but I assume such large values lying close to these ends aren't used in practice.
Also, there appears to be symmetry between the positive and negative ranges; for example, I tested 0x0058, which gave 1.003879905, and 0x00D8, which gave a value close to -1.003879905 but not exactly. Maybe that is because of the slight offset observed after 0x0080: visible decreasing starts from 0x00B0, but it should be about 0x0094 if the entire range were equally symmetric. A slight measurement inaccuracy might be the cause as well.
So, how do I get a function in C# that will convert the source data to a 4-byte floating point value?
Some initial comments based on the information in the question so far:
byte[] buffer = new byte[4]; is a bad approach because it addresses bytes individually while the other code manipulates bits using shifts within words, and C# does not define endianness. Simply use an unsigned 32-bit integer for all the work; the code will actually be simpler.
The code does not handle subnormal values properly. If num2 is zero and num3 is not zero, the significand (num3) must be shifted and the exponent (num2) must be adjusted.
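The exact packed format in the question isn't pinned down, and the num2/num3 code being commented on isn't shown, but for reference, here is what a decoder along those lines looks like when the source is IEEE 754 half precision (binary16): all the work happens in an unsigned 32-bit integer, and subnormals are handled by shifting the significand and adjusting the exponent. Treat it as a sketch of the technique, not a confirmed answer for this particular data:

    using System;

    static float HalfToSingle(ushort half)
    {
        uint sign = (uint)(half >> 15) & 0x1u;
        uint exp  = (uint)(half >> 10) & 0x1Fu;
        uint mant = (uint)(half & 0x3FF);

        uint bits;
        if (exp == 0 && mant == 0)
        {
            bits = sign << 31;                                 // signed zero
        }
        else if (exp == 0)
        {
            // subnormal: normalize the significand and adjust the exponent
            int shift = 0;
            while ((mant & 0x400) == 0) { mant <<= 1; shift++; }
            mant &= 0x3FF;                                     // drop the implicit leading 1
            bits = (sign << 31) | ((uint)(113 - shift) << 23) | (mant << 13);
        }
        else if (exp == 31)
        {
            bits = (sign << 31) | 0x7F800000u | (mant << 13);  // infinity / NaN
        }
        else
        {
            bits = (sign << 31) | ((exp + 112) << 23) | (mant << 13);  // 112 = 127 - 15
        }
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }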
I want to change a source WAV file's bits per sample from 16 to 32.
To do this I update:
raw data: convert each 2 bytes to a short, then cast to float, then to a byte array instead of the initial two bytes (see the sketch after this list)
the data length field, by doubling it
bits per sample, by doubling it
the byte rate field, by doubling it
the block align parameter, by doubling it
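In code, the raw-data step described in the list above is roughly this (a sketch with hypothetical names; source is the input byte array and i the current offset):

    // per sample: 2 source bytes -> short -> float -> 4 output bytes
    short s = BitConverter.ToInt16(source, i);      // read the 16-bit sample
    float f = s;                                    // plain cast, no scaling
    byte[] widened = BitConverter.GetBytes(f);      // 4 bytes replace the original 2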
After saving the WAV file, a lot of noise appears and the sound becomes very loud.
What am I doing wrong?
For example, take the case of a stereo WAV file with a sample rate of 44100 and a bit depth of 16 bits.
Exactly how are the 16 bits divided up?
In the audio clip I was using, the first 4 bytes had data about the first audio channel; the next 4 bits I have no idea about (even when replaced with 0, there is no effect on the final audio file).
The next 4 bytes had data about the second audio channel; again, the next 4 bits I have no idea about (even when replaced with 0, there is no effect on the final audio file).
So I would like to figure out what those 4 bits are.
A WAV File contains several chunks.
The FMT chunk specifies the format of the audio data.
The actual audio data are within the data chunk.
It depends on the actual format, but let's assume the following format as an example:
PCM, 16 bit, 2 channels with a samplerate of 44100Hz.
Audio data is represented as samples. In this case each sample takes 16 bits = 2 Bytes.
If we have multiple channels (in this example 2 = stereo), it will look like this:
left sample, right sample, left sample, right sample, ...
Since each sample takes 2 Bytes (16 bits), we get something like this:
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | ...
  left sample   |  right sample   |  left sample   |  right sample   | ...
Each second of audio contains 44100 samples for EACH channel.
So in total, one second of audio takes 44100 * ( 16 / 8 ) * 2 = 176,400 Bytes.
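A small sketch of de-interleaving such a stream into per-channel arrays (SplitStereo is a hypothetical helper; it assumes the raw bytes of the data chunk are already in memory, and relies on BitConverter reading in the platform's byte order, which is little endian on common platforms, matching WAV):

    using System;

    // data: raw bytes of the data chunk (PCM, 16 bit, 2 channels, little endian)
    static void SplitStereo(byte[] data, out short[] left, out short[] right)
    {
        int frames = data.Length / 4;   // one frame = 4 bytes (2 per channel)
        left = new short[frames];
        right = new short[frames];
        for (int i = 0; i < frames; i++)
        {
            left[i]  = BitConverter.ToInt16(data, i * 4);      // bytes 1-2 of the frame
            right[i] = BitConverter.ToInt16(data, i * 4 + 2);  // bytes 3-4 of the frame
        }
    }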
A WAV format audio file starts with a 44-byte header followed by the payload, which is the uncompressed raw PCM audio data ... in the payload area, as you walk across the PCM data, each sample (point on the audio curve) will contain data for all channels ... the header will tell you the number of channels ... for stereo using a bit depth of 16, you will see two bytes (16 bits == bit depth) for a given channel immediately followed by the two bytes of the next channel, etc.
For a given channel, a given set of bytes (2 bytes in your case) will appear in two possible layouts determined by the choice of endianness ... 1st byte followed by 2nd byte ... the byte ordering implied by the endianness is important here ... the header also tells you which endianness you are using ... typically WAV format is little endian
each channel will generate its own audio curve
in your code, to convert from PCM data into a usable audio curve data point, you must combine all bytes of a given sample for a given channel into a single value ... typically it's integer and not floating point; again, the header defines which ... if integer, it could be signed or unsigned ... little endian means that as you read the file, the first (left-most) byte will become the least significant byte, followed by each subsequent byte, which becomes the next most significant byte
in pseudo code:

    int mydatapoint   // allocate your integer audio curve data point

    // step 0
    mydatapoint = most-significant-byte
    // stop here for a bit depth of 8 ... if you have a bit depth greater than
    // 8 bits, now left shift this to make room for the following byte, if any

    // step 1
    mydatapoint = mydatapoint << 8   // shove data to the left by 8 bits,
                                     // which effectively jacks up its value
                                     // and leaves the right-most 8 bits empty

    // step 2
    // the following operation is a bitwise OR
    mydatapoint = mydatapoint OR next-most-significant-byte

now repeat steps 1 & 2 for each subsequent byte of PCM data, in order from most significant to least significant (for little endian, that means walking the file's bytes in reverse) ... this repeating is essential for any bit depth beyond 16, so for 24-bit or 32-bit audio you will need to combine 3 or 4 bytes of PCM data into your single integer output audio curve data point
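Concretely, for the 16-bit little-endian case, the combining step above might look like this in C# (a sketch; bytes and offset are hypothetical names for the PCM buffer and the position of the current sample in it):

    // little endian: bytes[offset + 1] holds the most significant byte
    int mydatapoint = bytes[offset + 1];        // step 0
    mydatapoint = mydatapoint << 8;             // step 1: make room for the next byte
    mydatapoint = mydatapoint | bytes[offset];  // step 2: OR in the least significant byte
    short sample = (short)mydatapoint;          // reinterpret as a signed 16-bit value

The final cast matters because 16-bit PCM samples are signed; without it, values above 0x7FFF would be read as large positive numbers instead of negatives.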
Why are we doing this bit shifting nonsense
The level of audio fidelity when converting from analog to digital is driven by how accurately you record the audio curve ... analog audio is a continuous curve, however to become digital it must be sampled into discrete points along the curve ... two factors determine the fidelity when sampling the analog curve to create its digital representation ... the left-to-right distance along the analog audio curve is determined by the sample rate, and the up-and-down distance along the audio curve is determined by the bit depth ... a higher sample rate gives you more samples per second, and a greater bit depth gives you more vertical points to approximate the instantaneous height of the analog audio curve
bit depth 8 == 2^8 == 256 distinct vertical values to record curve height
bit depth 16 == 2^16 == 65536 distinct vertical values to record curve height
so to more accurately record into digital the height of our analog audio curve, we want to become as granular as possible ... that way the resultant audio curve is as smooth as possible and not jagged, which would happen if we only allocated 2 bits, giving us 2^2 = 4 distinct values ... try to connect the dots when your audio curve only has 4 vertical values to choose from on your plot ... the bit shifting is simply building up a single integer value from many bytes of data ... numbers greater than 255 cannot fit into one byte, and so must be spread across multiple bytes of PCM data
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
This might be a broad question, but I would like to see answers and discuss this thread with SO users.
So far I gather that an audio file (WAV) has a sample rate, which could be 44000 or 48000 (I've seen these two the most), and from that we can determine that a single second of the file (second 00:00:01) has exactly 44000 integer values, meaning we have an int[]; so if an audio file's duration is 5 seconds, it has 5 * 44000 integers (5 seconds' worth of samples).
So my question is: how can we calculate the difference (or similarity) of content between two time spans, like Audio1.wav and Audio2.wav at 00:00:01, with the same sample rate?
There are a couple of assumptions in your reasoning:
1. The file is the raw uncompressed (PCM encoded) data.
2. There is only one channel (mono).
It's better to start by reading some format descriptions and sample implementations, then search for some audio comparison algorithms (1, 2, 3).
Linked Q: Compare two spectrograms to find the offset where they match algorithm
One way to do this would be to resample the signal from 44100 Hz to 48000 Hz, so that both signals have the same sample rate, and then perform a cross-correlation. The shape of the cross-correlation can be a measure of similarity: you could look at the height of the peak, or the ratio of energy in the peak to the total energy.
Note, however, that when the signal repeats itself, you will get multiple cross-correlation peaks.
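A brute-force sketch of this idea (PeakCorrelation is a hypothetical helper; it assumes both signals are already at the same sample rate as normalized floats, and it is O(n^2), so it is only practical for short clips):

    using System;

    // Returns the peak of the normalized cross-correlation of a and b;
    // values near 1.0 mean the signals match well at some time offset.
    static double PeakCorrelation(float[] a, float[] b)
    {
        double energyA = 0, energyB = 0;
        foreach (float s in a) energyA += s * s;
        foreach (float s in b) energyB += s * s;

        double peak = 0;
        for (int lag = -(b.Length - 1); lag < a.Length; lag++)
        {
            double sum = 0;
            for (int i = 0; i < b.Length; i++)
            {
                int j = lag + i;
                if (j >= 0 && j < a.Length)
                    sum += a[j] * b[i];
            }
            peak = Math.Max(peak, Math.Abs(sum));
        }
        return peak / Math.Sqrt(energyA * energyB);   // 1.0 = identical up to a shift
    }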
I am trying to decode the magnetic heading that is contained in a 10-bit field. I am not sure how the above instructions are to be interpreted. What I did was just take the 10 bits and convert them to decimal, like this:
int magneticheading = Convert.ToInt32(olotoMEbinary.Substring(14, 10), 2);
But then I noticed that 259 degrees only needs 9 bits to be expressed in binary (100000011). I am confused about what a most significant bit of 180 degrees means and an LSB of 360/1024.
For example, if I receive the following 10 bits, 0100001010, how are they converted to degrees according to the above instructions?
Using floating-point math, multiply by 360 and divide by 1024.
The instructions the question references are missing, but Stephen Cleary's method fits the two data points provided. It may help to think of it as a unit conversion from 1024 divisions of a circle to 360 degrees.
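Applied to the question's example bits (variable names are made up):

    // 0100001010 binary = 266 decimal
    int raw = Convert.ToInt32("0100001010", 2);
    double degrees = raw * 360.0 / 1024.0;   // 266 * 360 / 1024 = 93.515625 degrees

This also matches the two data points: the most significant bit alone (1000000000 = 512) gives 512 * 360 / 1024 = 180 degrees, and one count of the least significant bit is worth 360/1024 of a degree.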