Play audio files dynamically with NAudio - C#

I am working on a project that takes input from another system. The input arrives as signals in the form of parameters (integer values). My application should play different sounds dynamically based on the input values. For example, when the program gets parameter value 2, it should play two files, sounds A and B, in parallel. I can already play sounds simultaneously with NAudio, but so far they are hard-coded, and I need to play sounds from a folder that can contain N sounds. Is there some way to select and play only the files that contain sounds A and B when I get value 2? If I get parameter value 3 the situation is different; for instance, I would like to play sound C. Please suggest any piece of code or ideas that would help me write an algorithm that does something like this.
I really appreciate your valuable guidance
Thanks in advance

You need a mixer object to combine the wave data from the source files and produce an output waveform that can be fed to your WaveOut instance. There are a few mixers in NAudio, for instance MixingWaveProvider16, WaveMixerStream32 and MixingSampleProvider. These will help you combine the wave data from the source files into a composite waveform that you can play.
The other option is to implement an IWaveProvider that will handle multiple inputs and mix down to a single output stream in the desired format. In this scenario you would have an output buffer and a list of input streams/buffers to be mixed. When more data is required in the output buffer it would mix the source data on-the-fly, possibly using one of the provided mixer classes.
Personally I'd suggest writing your own mixer. It's good exercise and will give you a better understanding of what's going on.
If you're mixing 16-bit streams you need to use a 32-bit buffer to mix into, as two 16-bit inputs added together can produce values greater than a 16-bit value can hold. The more inputs you have, the greater the likelihood of going outside the valid range. There are a few ways to deal with this. Just clamping the output between Int16.MinValue and Int16.MaxValue is a quick-and-nasty approach that may work for you. As tempting as it may seem, I would suggest you NOT try to average samples... it reduces the overall dynamic range of the output too much.
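As a minimal sketch of the clamping approach described above (not NAudio's own mixer code), the following sums 16-bit samples into a 32-bit accumulator and clamps the result. The class name ClampedMixer and its Mix method are hypothetical names chosen for illustration, and the inputs are assumed to be already-decoded PCM buffers with the same sample rate and channel layout.

```csharp
using System;

// Hypothetical helper for illustration; not part of NAudio.
public static class ClampedMixer
{
    // Mixes any number of 16-bit PCM buffers sample by sample.
    // Sums are accumulated in a 32-bit int, then clamped back to the Int16 range.
    public static short[] Mix(params short[][] inputs)
    {
        int length = 0;
        foreach (short[] input in inputs)
            length = Math.Max(length, input.Length);

        short[] output = new short[length];
        for (int i = 0; i < length; i++)
        {
            int sum = 0; // 32-bit accumulator: two 16-bit samples cannot overflow it
            foreach (short[] input in inputs)
                if (i < input.Length) sum += input[i]; // shorter inputs count as silence

            // Quick-and-nasty clamp between Int16.MinValue and Int16.MaxValue.
            output[i] = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, sum));
        }
        return output;
    }
}
```

Calling ClampedMixer.Mix(a, b) with two buffers returns a buffer as long as the longer input. Loud passages will clip audibly, which is the trade-off the clamping approach accepts.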

Related

Unity3D - What file format should I store my replays in?

I am making a state-saving replay system in Unity3D and I want the replay to be written to a file. What file format is best to use when saving a replay? XML maybe? For now I'm storing the transform data, and I've implemented the option to add additional frame data.
It highly depends on what you are trying to do. There are tradeoffs for every solution in this case.
Based on the extra information you have given in the comments, the best solution I can think of in this case is marshalling your individual "recording sessions" onto a file. However, there is a little overhead to be done in order to achieve this.
Create a class called Frame, create another class called Record which has a List<Frame> frames. That way, you can place any information that you would like to be captured in each frame as attributes in the Frame class.
Since you can't marshal a generic type, you will have to marshal each frame individually. I suggest implementing a method in the Record class called MarshalRecording() that handles that for you.
However, in doing that, you will find it difficult to unmarshal your records because they may have different sizes in binary form and they wouldn't have a separator indicating where a frame ends and where the next frame begins. I suggest appending the size information at the beginning of each marshalled frame, that way you will be able to unmarshal all of the frames even if they have different sizes.
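The size-prefix idea can be sketched as follows. This is only an illustration under stated assumptions: each frame has already been turned into a byte array by whatever marshalling you choose, the stream is seekable, and FrameFile, WriteFrames and ReadFrames are hypothetical names rather than anything from Unity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration: stores frames as [4-byte length][payload].
public static class FrameFile
{
    public static void WriteFrames(Stream stream, IEnumerable<byte[]> frames)
    {
        // The writer is deliberately not disposed, so the caller keeps ownership of the stream.
        var writer = new BinaryWriter(stream);
        foreach (byte[] frame in frames)
        {
            writer.Write(frame.Length); // size prefix, written as a little-endian Int32
            writer.Write(frame);        // the marshalled frame itself
        }
        writer.Flush();
    }

    public static List<byte[]> ReadFrames(Stream stream)
    {
        var frames = new List<byte[]>();
        var reader = new BinaryReader(stream);
        // Assumes a seekable stream (file or memory), so Position and Length are available.
        while (stream.Position < stream.Length)
        {
            int size = reader.ReadInt32();      // how many bytes the next frame occupies
            frames.Add(reader.ReadBytes(size)); // read exactly that frame
        }
        return frames;
    }
}
```

Because every frame carries its own length, frames of different sizes can be appended one at a time and read back without any separator.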
As @PiotrK pointed out in his answer, you could use protobuf. However, I don't recommend it for your specific use. Personally, I think this is overkill (too much work for too little result; protobufs can be a PITA sometimes).
If you are worried about storage size, you could LZ4 the whole thing (if you are concatenating the binary information in memory), or LZ4 each frame (if you are processing each frame individually and then appending it to a file). I recommend the latter one because, depending on the amount of frames, you may run out of memory while you marshal your record.
Ps: Never use XML, it's cancerous!
The best way I know is to use some kind of format generator (like Google Protobuf). This has several advantages over using the default C# serializer:
Versioning of the format is as easy as possible; you can add new features to the format without breaking replays that already exist in the field
The result is stored as either text or binary, with binary favorable and having very small footprint (for example if some of your data has default values, it won't be present in output at all - loader will handle them gracefully)
It's Google technology! :-)

Calculating FFT with complex numbers in C#

I use this formula to get the frequency of a signal, but I don't understand how to implement it in code with complex numbers. There is an "i" in the formula, which corresponds to Math.Sqrt(-1). How can I apply this formula to a signal in C# with the NAudio library?
If you want to go back to a basic level then:
You'll want to use some form of probabilistic model, something like a hidden Markov model (HMM). This will allow you to test what the user says to a collection of models, one for each word they are allowed to say.
Additionally you want to transform the audio waveform into something that your program can more easily interpret. Something like a fast Fourier transform (FFT) or a wavelet transform (CWT).
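For the transform step, the "i" in the formula maps onto the System.Numerics.Complex type. The sketch below is the plain discrete Fourier transform, which is O(N²); it is not a fast Fourier transform, but it produces the same output an FFT would and shows where the complex arithmetic goes. NaiveDft and Transform are hypothetical names, and the input is assumed to be real-valued samples.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration: direct DFT, not an optimized FFT.
public static class NaiveDft
{
    // Computes X[k] = sum over t of x[t] * e^(-2*pi*i*k*t/N) for every bin k.
    public static Complex[] Transform(double[] samples)
    {
        int n = samples.Length;
        var result = new Complex[n];
        for (int k = 0; k < n; k++)
        {
            Complex sum = Complex.Zero;
            for (int t = 0; t < n; t++)
            {
                double angle = -2.0 * Math.PI * k * t / n;
                // Euler's formula: e^(i*angle) = cos(angle) + i*sin(angle)
                sum += samples[t] * new Complex(Math.Cos(angle), Math.Sin(angle));
            }
            result[k] = sum;
        }
        return result;
    }
}
```

The Magnitude property of each returned element gives the strength of that frequency bin; bin k corresponds to k * sampleRate / N Hz.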
The steps would be:
Get audio
Remove background noise
Transform via FFT or CWT
Detect peaks and other features of the audio
Compare these features with your HMMs
Pick the HMM with the best result above a threshold.
Of course this requires you to previously train the HMMs with the correct words.
A lot of languages actually provide built-in libraries for this. One example, in C#.NET, is at this link. This gives you a step-by-step guide to setting up a speech recognition program. It also abstracts you away from the low-level detail of parsing audio for certain phonemes etc. (which frankly is pointless given the number of libraries there are about, unless you wish to write a highly optimized version).
It is a difficult problem nonetheless and you will have to use an ASR framework to do it. I have done something slightly more complex (~100 words) using Sphinx4. You can also use HTK.
In general what you have to do is:
write down all the words that you want to recognize
determine the syntax of your commands like (direction) (amount)
Then choose a framework, get an acoustic model, generate a dictionary and a language model compatible with that framework. Then integrate the framework into your application.
I hope I have mentioned all important things you need to do. You can google them separately or go to your chosen framework's tutorial.
Your task is relatively simple in terms of speech recognition and you should get good results if you complete it.

Word game mechanics in XNA

I'm planning on making a casual word game for WP7 using XNA. The game mechanics are easy enough for me to implement; the problem is checking whether the word the player makes is actually a word or not.
I thought about having a text file and loading that into memory at the start, but surely it wouldn't be possible to keep this in memory on a phone? Also, how slow would it be to read from this to see if something is a word? How would the words be stored in memory? Would it be best to use a dictionary/hashmap where each key is a word and I just check whether that key exists? Or should I put them in an array?
Stuck on the best way to implement this, so any input is appreciated. Thanks
Depending on your phone's hardware, you could probably just load a text file into memory. The English language probably has only a couple hundred thousand words. Assuming your average word is around 5 characters or so, that's roughly a meg of data. You will have overhead managing that file in memory, but that's where the specifics of the hardware matter. BTW, it's not uncommon for the current generation of phones to have a gig of RAM.
Please see the following related SO questions which require a text file for a dictionary of words.
Dictionary text file
Putting a text file into memory, even of a whole dictionary, shouldn't be too bad as seth flowers has said. Choosing an appropriate data structure to hold the words will be important.
I would not recommend a dictionary using words as keys... that's kind of silly honestly. If you only have keys and no values, what good is a dictionary? However, you may be on a good track with the Dictionary idea. The first thing I would try would be a Dictionary<char, string[]>, where the key is the first letter, and the value is a list of all words beginning with that letter. Of course, that array will be very long, and search time on the array slow (though lookup on the key should be zippy, as char hashes are unique). The advantage is that, if you use the proper .txt dictionary file and load each word in order, you will know that list is ordered by alphabet. So, you can use efficient search techniques like binary search, or any number of searches formulated for pre-sorted lists. It may not be that slow in the end.
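A minimal sketch of that first-letter bucket idea, assuming the word list fits in memory: WordList is a hypothetical class name, and the buckets are sorted explicitly with an ordinal comparer so the binary search does not depend on the order of the source file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration: first letter -> sorted array of words.
public sealed class WordList
{
    private readonly Dictionary<char, string[]> buckets;

    public WordList(IEnumerable<string> words)
    {
        buckets = words
            .Where(w => w.Length > 0)
            .GroupBy(w => w[0])
            .ToDictionary(
                g => g.Key,
                // Sort each bucket with the same comparer the search uses.
                g => g.OrderBy(w => w, StringComparer.Ordinal).ToArray());
    }

    public bool Contains(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;

        string[] bucket;
        if (!buckets.TryGetValue(word[0], out bucket)) return false;

        // Array.BinarySearch returns a negative value when the word is absent.
        return Array.BinarySearch(bucket, word, StringComparer.Ordinal) >= 0;
    }
}
```

With this shape, new WordList(lines).Contains("car") costs one dictionary lookup plus a binary search over the words starting with 'c'. Case handling (for example lower-casing everything on load) is left out of the sketch.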
If you want to go further, though, you can use the structure which underlies predictive text. It's called a Patricia Trie, or Radix Trie (Wikipedia). Starting with the first letter, you work your way through all possible branches until you either:
assemble the word the user entered, so it is a valid word
reach the end of the branch; this word does not exist.
'Tries' were made to address this sort of problem. I've never represented one in code, so I'm afraid I can't give you any pointers (ba dum tsh!), but there's likely a wealth of information on how to do it available on the internet. Using a Trie will likely be the most efficient solution, but if you find that an alphabet Dictionary like I mentioned above is sufficiently fast using binary search, you might just want to stick with that for now while you develop the actual gameplay. Getting bogged down with finding the best solution when just starting your game tends to bleed off your passion for getting it done. If you run into performance issues, then you make improvements-- at least that's my philosophy when designing games.
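For reference, a plain (uncompressed) trie can be sketched in a few lines. This is not a Patricia/radix trie, which additionally merges chains of single-child nodes; the class name Trie and its members are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration: a plain trie, one node per character.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if the path from the root to this node spells a word
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            // Reached the end of the branch: this word does not exist.
            if (!node.Children.TryGetValue(c, out node)) return false;
        }
        // The letters were all found; it is a word only if one ends here.
        return node.IsWord;
    }
}
```

Lookup time depends only on the length of the word, not on the size of the dictionary, at the cost of one small object per character path.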
The nice thing is, since Windows Phone supports only essentially 2 different specs, once you test the app and see it runs smoothly on them, you really don't have to worry about optimizing for any worse conditions. So use what works!
P.S.: on Windows Phone, loading text files is tricky. Here is a post on the issue which should help you.

Sparse matrix compression with fast access time

I'm writing a lexer generator as a spare time project, and I'm wondering about how to go about table compression. The tables in question are 2D arrays of short and very sparse. They are always 256 characters in one dimension. The other dimension is varying in size according to the number of states in the lexer.
The basic requirements of the compression are:
The data should be accessible without decompressing the full data set. And accessible in constant O(1) time.
Reasonably fast to compute the compressed table.
I understand the row displacement method, which is what I currently have implemented. It might be my naive implementation, but what I have is horrendously slow to generate, although quite fast to access. I suppose I could make this go faster using some established algorithm for string searching such as one of the algorithms found here.
I suppose an option would be to use a Dictionary, but that feels like cheating, and I would like the fast access times that I would be able to get if I use straight arrays with some established algorithm. Perhaps I'm worrying needlessly about this.
From what I can gather, flex does not use this algorithm for its lexing tables. Instead it seems to use something called row/column equivalence, which I haven't really been able to find any explanation for.
I would really like to know how this row/column equivalence algorithm that flex uses works, or if there is any other good option that I should consider for this task.
Edit: To clarify more about what this data actually is. It is state information for state transitions in the lexer. The data needs to be stored in a compressed format in memory since the state tables can potentially be huge. It's also from this memory that the actual values will be accessed directly, without decompressing the tables. I have a working solution using row displacement, but it's murderously slow to compute, in part due to my silly implementation.
Perhaps my implementation of the row displacement method will make it clearer how this data is accessed. It's a bit verbose and I hope it's OK that I've put it on pastebin instead of here.
The data is very sparse. It is usually a big bunch of zeroes followed by a few shorts for each state. It would be trivial to, for instance, run-length encode it, but that would spoil the constant access time.
Flex apparently has two pairs of tables, base and default for the first pair and next and check for the second pair. These tables seem to index one another in ways I don't understand. The dragon book attempts to explain this, but as is often the case with that tome of arcane knowledge, what it says is lost on lesser minds such as mine.
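A rough sketch of how the four tables can fit together, following the lookup rule given in the dragon book: if check[base[s] + c] equals s, the answer is next[base[s] + c]; otherwise retry with default[s]. This is not flex's actual generator. It assumes a value of 0 means "no transition", uses a simple first-fit search to place rows, and never fills in default (every entry is -1). DisplacedTable and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration: row displacement with base/default/next/check.
public sealed class DisplacedTable
{
    private const int Columns = 256;
    private readonly int[] rowBase;     // "base": offset of each state's row in next/check
    private readonly int[] rowDefault;  // "default": fallback state, or -1 for none
    private readonly List<short> next = new List<short>();
    private readonly List<int> check = new List<int>(); // owning state per slot, -1 if free

    public DisplacedTable(short[][] dense)
    {
        rowBase = new int[dense.Length];
        rowDefault = new int[dense.Length];
        for (int state = 0; state < dense.Length; state++)
        {
            rowDefault[state] = -1; // this sketch never shares entries between states
            int offset = 0;
            while (!Fits(dense[state], offset)) offset++; // first-fit search
            Place(dense[state], offset, state);
            rowBase[state] = offset;
        }
    }

    public int SlotCount { get { return next.Count; } }

    // A row fits if none of its non-zero entries lands on an occupied slot.
    private bool Fits(short[] row, int offset)
    {
        for (int c = 0; c < Columns; c++)
            if (row[c] != 0 && offset + c < check.Count && check[offset + c] != -1)
                return false;
        return true;
    }

    private void Place(short[] row, int offset, int state)
    {
        while (next.Count < offset + Columns) { next.Add(0); check.Add(-1); }
        for (int c = 0; c < Columns; c++)
            if (row[c] != 0) { next[offset + c] = row[c]; check[offset + c] = state; }
    }

    // One array probe per state visited; O(1) when no default chain is followed.
    public short Lookup(int state, int symbol)
    {
        while (state != -1)
        {
            int slot = rowBase[state] + symbol;
            if (check[slot] == state) return next[slot];
            state = rowDefault[state]; // slot belongs to another row: fall back
        }
        return 0; // no transition
    }
}
```

The check array is what makes overlapping rows safe: a slot only counts for the state that owns it. A real generator would also pick default states so that similar rows store only their differences.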
This paper, http://www.syst.cs.kumamoto-u.ac.jp/~masato/cgi-bin/rp/files/p606-tarjan.pdf, describes a method for compressing sparse tables, and might be of interest.
Are your tables known beforehand, so that you just need an efficient way to store and access them?
I'm not really familiar with the problem domain, but if your table has a fixed size along one axis (256), then would an array of size 256, where each element is a vector of variable length, work? Do you want to be able to pick out an element given an (x,y) pair?
Another cool solution that I've always wanted to use for something is a perfect hash table, http://burtleburtle.net/bob/hash/perfect.html, where you generate a hash function from your data, so you will get minimal space requirements, and O(1) lookups (ie no collisions).
None of these solutions employ any type of compression, though; they just minimize the amount of space wasted.
What's unclear is if your table has "sequence property" in one dimension or another.
Sequence property naturally happens in human speech, since a word is composed of many letters, and the same sequence of letters is likely to appear again later on. It's also very common in binary programs, source code, etc.
On the other hand, sampled data, such as raw audio, seismic values, etc. do not advertise sequence property. Their data can still be compressed, but using another model (such as a simple "delta model" followed by "entropy").
If your data has "sequence property" in any of the 2 dimensions, then you can use common compression algorithm, which will give you both speed and reliability. You just need to provide it with an input which is "sequence friendly" (i.e. select your dimension).
If speed is a concern for you, you can have a look at this C# implementation of a fast compressor, which is also a very fast decompressor: https://github.com/stangelandcl/LZ4Sharp

Tuner application: How do I find the frequency?

Greetings,
I am currently developing a tuner application using Silverlight/C# for a class project. The problems I am having seem to be asked about by quite a few people but not really answered. I have read a lot of forums and googled for hours but still cannot really grasp the code and math. What I have so far is this:
Mic => audio input => audio samples are written to a memory stream => bytes converted to doubles, then to complex numbers => FFT(), which returns an array of complex numbers...
I have read about FFT/DFT/autocorrelation/etc. It seems to me that FFT is the way I want to go for speed. I am essentially turning a chromatic tuner used in band/orchestra/etc. into an online application, so everything needs to be done in real time. For now I'm just focusing on trying to understand the entire process.
Questions:
What is the correct method of converting the bytes written to the memory stream to complex numbers? This is partially answered here: Convert Audio samples from bytes to complex numbers?, but I do not know which method is correct, since each one results in different values.
I understand the basics of FFT, but I am not exactly sure what the numbers represent at the different stages. For example, what exactly does the array of complex numbers represent when going into the FFT algorithm, and what does it represent when leaving?
What other processing is required to find the frequency of the note being played after the FFT has been calculated?
I appreciate all the help; this project has proved to be more complicated than what I originally researched! :/
Cheers and thanks!
Josh
1) Got nothin'
2) An FFT returns an array of complex values, one per frequency band (bin). The magnitude of each array member is the strength of the signal in that band.
3) First, find the array member that has the strongest value. To dial in the exact frequency, you'll probably have to do some interpolation between the array members around the strongest bucket.
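One common way to do that interpolation is to fit a parabola through the strongest bin and its two neighbours. The sketch below assumes you already have the magnitude of each FFT bin; PeakFinder and EstimateFrequency are hypothetical names, and parabolic interpolation is only one of several possible refinements, not necessarily the one the linked article uses.

```csharp
using System;

// Hypothetical helper for illustration: peak picking plus parabolic interpolation.
public static class PeakFinder
{
    // magnitudes[k] is the magnitude of FFT bin k; returns an estimate in Hz.
    public static double EstimateFrequency(double[] magnitudes, int sampleRate, int fftSize)
    {
        // Step 1: find the strongest bin.
        int peak = 0;
        for (int k = 1; k < magnitudes.Length; k++)
            if (magnitudes[k] > magnitudes[peak]) peak = k;

        // Step 2: fit a parabola through the peak and its neighbours
        // to estimate how far the true peak lies from the bin centre.
        double offset = 0.0;
        if (peak > 0 && peak < magnitudes.Length - 1)
        {
            double a = magnitudes[peak - 1];
            double b = magnitudes[peak];
            double c = magnitudes[peak + 1];
            double denominator = a - 2.0 * b + c;
            if (denominator != 0.0) offset = 0.5 * (a - c) / denominator;
        }

        // Each bin is sampleRate / fftSize Hz wide.
        return (peak + offset) * sampleRate / fftSize;
    }
}
```

For a tuner, note that the strongest bin is sometimes a harmonic rather than the fundamental, so a real implementation may need extra checks on top of this.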
EDIT: Found this article. Looks like it breaks it down for you.
