Greetings,
I am currently developing a tuner application in Silverlight/C# for a class project. The problems I am having seem to have been asked by quite a few people but never really answered. I have read a lot of forums and googled for hours, but I still cannot really grasp the code and the math. What I have so far is this:
Mic => audio input => audio samples are written to a memory stream => bytes converted to doubles, then to complex numbers => FFT(), which returns an array of complex numbers...
I have read about FFT/DFT/autocorrelation/etc. It seems to me that the FFT is the way to go for speed. I am essentially turning a chromatic tuner of the sort used in band/orchestra/etc. into an online application, so everything needs to happen in real time. For now I am just focusing on trying to understand the entire process.
Questions:
What is the correct way to convert the bytes written to the memory stream into complex numbers? This is partially answered in "Convert audio samples from bytes to complex numbers?", but I do not know which method is correct, because each one produces different values.
I understand the basics of the FFT, but I am not exactly sure what the numbers represent at the different stages. For example, what exactly does the array of complex numbers represent going into the FFT algorithm, and what does it represent coming out?
What other processing is required to find the frequency of the note being played once the FFT has been calculated?
I appreciate all the help; this project has proved to be more complicated than what I originally researched! :/
Cheers and thanks!
Josh
1) Got nothin'
2) An FFT returns an array of complex values, one per frequency band (bin). The magnitude of each array member tells you the strength of the signal in that frequency band.
3) First, find the array member with the strongest value. To dial in the exact frequency, you'll probably have to do some interpolation between the array members around the strongest bucket; a sketch of both steps follows below.
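As a rough sketch of 2) and 3) (assuming the magnitudes array covers the full FFT output and you know the capture sample rate; the function name is made up):

    // Estimate the dominant frequency from FFT bin magnitudes.
    // 'magnitudes' has one entry per FFT bin (same length as the FFT input),
    // 'sampleRate' is the capture rate in Hz (e.g. 44100).
    static double EstimateFrequency(double[] magnitudes, double sampleRate)
    {
        int fftSize = magnitudes.Length;
        int half = fftSize / 2;                  // only the first half is unique for real input

        // 1. Find the strongest bin (skip bin 0, which is the DC offset).
        int peak = 1;
        for (int i = 2; i < half; i++)
            if (magnitudes[i] > magnitudes[peak])
                peak = i;

        // 2. Parabolic interpolation around the peak to get a fractional bin index.
        double a = magnitudes[peak - 1], b = magnitudes[peak], c = magnitudes[peak + 1];
        double denom = a - 2 * b + c;
        double offset = denom == 0 ? 0 : 0.5 * (a - c) / denom;

        // 3. Convert the bin index to Hz: each bin is sampleRate / fftSize wide.
        return (peak + offset) * sampleRate / fftSize;
    }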
EDIT: Found this article. Looks like it breaks it down for you.
I am developing an application in C# with spectrogram drawing functionality.
For my first try I used MathNet.Numerics, and now I am continuing development with alglib. When I changed from one to the other, I noticed that their outputs differ. MathNet applies some kind of correction by default, which alglib seems to omit. I am not really into signal processing and am also a newbie to programming, so I could not figure out exactly where the difference comes from.
MathNet's default output (raw magnitude) values range from ~0.1 to ~274 in my case.
With alglib I get values ranging from ~0.2 to ~6220.
I found that MathNet's Fourier.Forward uses a default scaling option. Here it says that FourierOptions.Default is "Universal; Symmetric scaling and common exponent (used in Maple)."
https://numerics.mathdotnet.com/api/MathNet.Numerics.IntegralTransforms/FourierOptions.htm
If I use FourierOptions.NoScaling, the output is the same as what alglib produces.
In MathNet, I used Fourier.Forward function: https://numerics.mathdotnet.com/api/MathNet.Numerics.IntegralTransforms/Fourier.htm#Forward
In case of alglib, I used fftr1d function: https://www.alglib.net/translator/man/manual.csharp.html#sub_fftr1d
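Roughly, the two calls I am comparing look like this (a simplified sketch, not my exact code; samples is the real-valued input buffer):

    using System;
    using System.Linq;
    using System.Numerics;
    using MathNet.Numerics.IntegralTransforms;

    // MathNet: pack the real samples into Complex values and transform in place.
    Complex[] mathnetBuffer = samples.Select(s => new Complex(s, 0)).ToArray();
    Fourier.Forward(mathnetBuffer, FourierOptions.Default);      // symmetric scaling by default
    // Fourier.Forward(mathnetBuffer, FourierOptions.NoScaling); // this matches alglib

    // alglib: real-to-complex FFT, no scaling applied.
    alglib.complex[] alglibSpectrum;
    alglib.fftr1d(samples, out alglibSpectrum);

    // Magnitudes used for the spectrogram colouring.
    double[] mathnetMag = mathnetBuffer.Select(c => c.Magnitude).ToArray();
    double[] alglibMag  = alglibSpectrum.Select(c => Math.Sqrt(c.x * c.x + c.y * c.y)).ToArray();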
What is the difference in their calculation?
What is the function that I could maybe use to convert alglib output magnitude to that of MathNet, or vice versa?
In what cases should I use these different "scalings"? What are they for exactly?
Please share your knowledge. Thanks in advance!
I worked it out by myself, after reading a bunch of posts mentioning different methods of FFT output scaling. I still find this aspect of FFT processing heavily under-documented everywhere. I have not yet found any reliable source that explains what these scalings are used for, or which fields of science or which processing methods use them.
So far I have found three different kinds of scaling applied to the raw FFT output (the complex values' magnitudes), plus the option of applying none at all. This means multiplying the magnitudes by:
1. 1/numSamples
2. 2/numSamples
3. 1/sqrt(numSamples)
4. (no scaling)
The MathNet.Numerics.IntegralTransforms.Fourier.Forward function (and, according to various posts on the net, possibly also Matlab and Maple) uses the third one by default. In my opinion that gives the most distinguishable graphical output when using logarithmic colouring.
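So converting one output to the other only needs a constant factor. A minimal sketch (numSamples being the FFT length, i.e. N):

    using System;
    using System.Linq;

    // alglib applies no scaling; MathNet's default is the symmetric 1/sqrt(N) scaling.
    // To make unscaled (alglib-style) magnitudes comparable to MathNet's default output:
    static double[] ToMathNetScale(double[] unscaledMagnitudes, int numSamples)
    {
        double factor = 1.0 / Math.Sqrt(numSamples);
        return unscaledMagnitudes.Select(m => m * factor).ToArray();
    }
    // Going the other way, multiply MathNet-default magnitudes by sqrt(numSamples).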
I would still be grateful if you know something more and share your ideas, or if you can reference a good paper explaining these.
I'm calculating the autocorrelation of audio samples. The direct calculation of autocorrelation can be sped up from O(n^2) to O(n log n) by using the FFT, exploiting the convolution theorem. Both the forward and the inverse FFT are needed.
I made a test script in python, just to make sure I knew what I was doing, and it works. But in my C# version, it doesn't.
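The computation itself is just the Wiener-Khinchin recipe; here is a sketch of it using MathNet.Numerics rather than Exocortex, only because its forward/inverse pair round-trips without extra scaling:

    using System.Numerics;
    using MathNet.Numerics.IntegralTransforms;

    // Autocorrelation via FFT:  R = IFFT( FFT(x) .* conj(FFT(x)) ).
    // Zero-pad to at least 2n-1 samples so the circular correlation does not wrap.
    static double[] Autocorrelate(double[] x)
    {
        var buf = new Complex[2 * x.Length];
        for (int i = 0; i < x.Length; i++) buf[i] = x[i];

        // AsymmetricScaling = unscaled forward, 1/N on the inverse (the textbook DFT pair).
        Fourier.Forward(buf, FourierOptions.AsymmetricScaling);
        for (int i = 0; i < buf.Length; i++)
            buf[i] *= Complex.Conjugate(buf[i]);          // power spectrum
        Fourier.Inverse(buf, FourierOptions.AsymmetricScaling);

        var r = new double[x.Length];
        for (int i = 0; i < r.Length; i++) r[i] = buf[i].Real;
        return r;
    }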
I know that many implementations of the FFT give answers that differ from the mathematically strict DFT. For instance, you may need to divide your results by N (the number of bins).
... tried that, still didn't work ...
I've striven mightily to find some documentation about the details of Exocortex's FFT, to no avail. (cue someone finding it in < 1 sec...)
Does anyone out there know the details of the Exocortex implementation of FFT, and how to get the mathematically strict values for the DFT, and inverse DFT of a signal?
I've got my code working!
As everybody has probably guessed, there was an additional bug in my code, which was confounding my efforts to understand Exocortex's fft.
In Exocortex's forward fft, you need to divide by the fft size to get the canonical values for the transform. For the inverse fft, you need to multiply by the fft size. I believe this is the same as Apple's Accelerate, whereas in numpy and matlab you get the actual DFT values.
Perhaps the practice of requiring division or multiplication by the fft size is extremely widespread – if you know the situation, I invite you to comment.
Of course, many times people are only interested in fft values relative to each other, in which case scaling by a constant doesn't matter.
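In code, the fix amounted to something like this (a sketch only - the exact Fourier.FFT overload, the FourierDirection values, and the ComplexF field names are written from memory, so check them against your copy of Exocortex DSP):

    using Exocortex.DSP;   // namespace as I remember it; adjust if yours differs

    // Forward: Exocortex leaves the result unscaled, so divide by N
    // to get the canonical DFT values.
    static void ForwardDft(ComplexF[] data)
    {
        Fourier.FFT(data, data.Length, FourierDirection.Forward);
        float n = data.Length;
        for (int i = 0; i < data.Length; i++) { data[i].Re /= n; data[i].Im /= n; }
    }

    // Inverse: multiply by N to get the canonical inverse DFT values.
    static void InverseDft(ComplexF[] data)
    {
        Fourier.FFT(data, data.Length, FourierDirection.Backward);
        float n = data.Length;
        for (int i = 0; i < data.Length; i++) { data[i].Re *= n; data[i].Im *= n; }
    }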
If anyone out there knows where there is decent documentation for Exocortex DSP, I invite you to comment on that as well.
I am working on a project that takes input from another system. The input consists of signals in the form of parameters (integer values). My application should play different sounds dynamically based on the input values. For example, when the program gets parameter value 2, it should play 2 files with sounds A and B in parallel. I can play sounds simultaneously with NAudio, but so far they are hard-coded, and I need to play sounds from a folder that can contain N sounds. Is there some way to dynamically select and play only the files containing sounds A and B when I get the value 2? If I get parameter value 3, the situation will be different; for instance, I would then like to play a sound C. Please suggest any piece of code or ideas that would help me write an algorithm that does something like this.
I really appreciate your valuable guidance
Thanks in advance
You need a mixer object to combine the wave data from the source files and produce an output waveform that can be fed to your WaveOut instance. There are a few mixers in NAudio: MixingWaveProvider16, WaveMixerStream32 and MixingSampleProvider, for instance. These will help you combine the wave data from the source files into a composite waveform that you can play.
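For instance, with MixingSampleProvider the basic pattern looks something like this (a sketch - file names are placeholders, and the inputs are assumed to already match the mixer's sample rate and channel count):

    using NAudio.Wave;
    using NAudio.Wave.SampleProviders;

    // Mix two files and play them in parallel.
    var mixer = new MixingSampleProvider(WaveFormat.CreateIeeeFloatWaveFormat(44100, 2));
    mixer.AddMixerInput(new AudioFileReader("A.wav"));    // resample first if formats differ
    mixer.AddMixerInput(new AudioFileReader("B.wav"));

    var output = new WaveOutEvent();
    output.Init(new SampleToWaveProvider(mixer));         // wrap the ISampleProvider for playback
    output.Play();

You can add or remove mixer inputs on the fly, which maps nicely onto "play whichever files the incoming parameter selects".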
The other option is to implement an IWaveProvider that will handle multiple inputs and mix down to a single output stream in the desired format. In this scenario you would have an output buffer and a list of input streams/buffers to be mixed. When more data is required in the output buffer it would mix the source data on-the-fly, possibly using one of the provided mixer classes.
Personally I'd suggest writing your own mixer. It's good exercise and will give you a better understanding of what's going on.
If you're mixing 16-bit streams you need to use a 32-bit buffer to mix into, as two 16-bit inputs added together can produce values greater than a 16-bit value can hold. The more inputs you have, the greater the likelihood of going outside the valid range. There are a few ways to deal with this. Just clamping the output between Int16.MinValue and Int16.MaxValue is a quick-and-nasty approach that may work for you. As tempting as it may seem, I would suggest you NOT try to average samples... it reduces the overall dynamic range of the output too much.
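A bare-bones version of that mixing loop might look like this (a sketch; it assumes the inputs are already decoded 16-bit samples with the same sample rate and channel count):

    using System.Collections.Generic;

    // Mix several 16-bit sample buffers, accumulating in 32 bits and
    // clamping the result back into the Int16 range (no averaging).
    static short[] MixAndClamp(IList<short[]> inputs, int sampleCount)
    {
        var mixed = new short[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            int sum = 0;                                  // 32-bit accumulator
            foreach (var input in inputs)
                if (i < input.Length)
                    sum += input[i];

            if (sum > short.MaxValue) sum = short.MaxValue;
            if (sum < short.MinValue) sum = short.MinValue;
            mixed[i] = (short)sum;
        }
        return mixed;
    }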
I'm writing a lexer generator as a spare-time project, and I'm wondering how to go about table compression. The tables in question are 2D arrays of shorts and are very sparse. They are always 256 characters wide in one dimension (one entry per input character); the other dimension varies in size according to the number of states in the lexer.
The basic requirements for the compression are:
1. The data should be accessible without decompressing the full data set, and accessible in constant O(1) time.
2. It should be reasonably fast to compute the compressed table.
I understand the row displacement method, which is what I currently have implemented. It might be my naive implementation, but what I have is horrendously slow to generate, although quite fast to access. I suppose I could make this go faster using some established algorithm for string searching such as one of the algorithms found here.
I suppose an option would be to use a Dictionary, but that feels like cheating, and I would like the fast access times that I would be able to get if I use straight arrays with some established algorithm. Perhaps I'm worrying needlessly about this.
From what I can gather, flex does not use this algorithm for its lexing tables. Instead it seems to use something called row/column equivalence, which I haven't really been able to find any explanation of.
I would really like to know how this row/column equivalence algorithm that flex uses works, or if there is any other good option that I should consider for this task.
Edit: To clarify what this data actually is: it is state information for state transitions in the lexer. The data needs to be stored in a compressed format in memory, since the state tables can potentially be huge. It's also from this memory that the actual values will be accessed directly, without decompressing the tables. I have a working solution using row displacement, but it's murderously slow to compute - in part due to my silly implementation.
Perhaps my implementation of the row displacement method will make it clearer how this data is accessed. It's a bit verbose and I hope it's OK that I've put it on pastebin instead of here.
The data is very sparse. It is usually a big bunch of zeroes followed by a few shorts for each state. It would be trivial to, for instance, run-length encode it, but that would spoil the constant-time access.
Flex apparently has two pairs of tables: base and default for the first pair, and next and check for the second pair. These tables seem to index one another in ways I don't understand. The dragon book attempts to explain this, but as is often the case with that tome of arcane knowledge, what it says is lost on lesser minds such as mine.
This paper, http://www.syst.cs.kumamoto-u.ac.jp/~masato/cgi-bin/rp/files/p606-tarjan.pdf, describes a method for compressing sparse tables, and might be of interest.
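For what it's worth, my understanding of the base/default/next/check scheme that flex and the Dragon Book describe is that the rows are overlapped into one big next array, and the lookup goes roughly like this (a sketch with illustrative names, so treat it as an outline rather than flex's actual code):

    // check[] records which state an entry in next[] really belongs to;
    // default[] points at a fallback state whose row supplies missing entries.
    static int NextState(short[] baseOf, short[] defaultOf,
                         short[] next, short[] check, int state, byte c)
    {
        while (check[baseOf[state] + c] != state)
            state = defaultOf[state];      // fall back until the entry is really ours
        return next[baseOf[state] + c];
    }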
Are your tables known beforehand, so that you just need an efficient way to store and access them?
I'm not really familiar with the problem domain, but if your table has a fixed size along one axis (256), would an array of size 256, where each element is a variable-length vector, work? Do you want to be able to pick out an element given an (x, y) pair?
Another cool solution that I've always wanted to use for something is a perfect hash table, http://burtleburtle.net/bob/hash/perfect.html, where you generate a hash function from your data, so you will get minimal space requirements, and O(1) lookups (ie no collisions).
None of these solutions employ any type of compression, though; they just minimize the amount of space wasted.
What's unclear is whether your table has the "sequence property" in one dimension or the other.
The sequence property naturally occurs in human speech, since a word is composed of many letters and the same sequences of letters are likely to appear again later on. It's also very common in binary programs, source code, etc.
On the other hand, sampled data, such as raw audio, seismic values, etc., does not exhibit the sequence property. Such data can still be compressed, but using another model (such as a simple "delta model" followed by entropy coding).
If your data has the sequence property in either of the 2 dimensions, then you can use a common compression algorithm, which will give you both speed and reliability. You just need to feed it input that is "sequence friendly" (i.e. select the right dimension).
If speed is a concern for you, you can have a look at this C# implementation of a fast compressor which is also a very fast decompressor : https://github.com/stangelandcl/LZ4Sharp
I am interested in writing a program that calculates the value of Pi to an arbitrary number of decimal places. It's something I'm just looking to do to pass the time and have something to work on. So, regarding that, I have a couple of questions:
What method do benchmarking programs that calculate Pi use? Do they use an infinite series or product that converges towards the value of pi with every iteration? If so, which is the best one to use?
What would be the best approach to storing the calculated value in a variable? Obviously, no existing data type will be able to hold that amount of information, so how do I tackle that?
If it's for the challenge, try building your own data structure using a list of ArrayLists, and maybe, just for fun / challenge / really long numbers, add virtualization (i.e. storing unused parts of the number to disk). A rough sketch of the chunking idea follows below.
Sounds like fun :)
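A very rough sketch of the chunking part (arithmetic and the paging-to-disk idea left out):

    using System.Collections.Generic;

    // Store a huge decimal expansion as a list of fixed-size digit chunks,
    // so the number can keep growing without one gigantic array.
    class ChunkedDigits
    {
        private const int ChunkSize = 1 << 16;                // digits per chunk
        private readonly List<byte[]> chunks = new List<byte[]>();

        public long Count { get; private set; }

        public void Append(byte digit)                        // digit in 0..9
        {
            if (Count % ChunkSize == 0) chunks.Add(new byte[ChunkSize]);
            chunks[(int)(Count / ChunkSize)][Count % ChunkSize] = digit;
            Count++;
        }

        public byte this[long index] =>
            chunks[(int)(index / ChunkSize)][index % ChunkSize];
    }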
See Viète's formula. Wikipedia also has a PHP example implementation near the bottom of this page.
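For a quick feel of how it converges, here is a plain double-precision C# sketch (so it only reaches ordinary floating-point accuracy, not the arbitrary precision you're after):

    using System;

    // Viète's formula: 2/pi = (sqrt(2)/2) * (sqrt(2+sqrt(2))/2) * ...
    // Each extra factor adds roughly 0.6 decimal digits of accuracy.
    static double VietePi(int terms)
    {
        double a = 0.0;            // nested radical; first factor becomes sqrt(2)/2
        double product = 1.0;
        for (int i = 0; i < terms; i++)
        {
            a = Math.Sqrt(2.0 + a);
            product *= a / 2.0;
        }
        return 2.0 / product;
    }

    // VietePi(30) already agrees with Math.PI to full double precision.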
As far as an arbitrary-precision .NET library goes, I've heard good things about W3b.sine.