I've Googled and read quite a bit about QR codes and the maximum data they can hold under the various settings, all of it in tabular format. I can't seem to find anything that gives a formula or a proper explanation of how these values are calculated.
What I would like to do is this:
Present the user with a form, allowing them to choose Format, EC & Version.
Then they can type in some data and generate a QR code.
Done deal. That part is easy.
The addition I would like to include is a "remaining character count" so that they (the user) can see how much more data they can type in, as well as what effect the properties have on the storage capacity of the QR code.
Does anyone know where I can find the formula(s)? Or do I need to purchase ISO/IEC 18004:2006?
A formula to calculate the amount of data you can put in a QR code would be quite complex to derive, not to mention that it would need some approximations to be workable at all. The formula would have to calculate the number of modules dedicated to data in your QR code based on its version, and then work out how many codewords (sets of 8 modules) will be used for error correction.
To calculate the number of modules available for data, you need to know how many modules are used by the function patterns. While this is not a problem for the three finder patterns, the timing patterns or the version/format information, it is a problem for the alignment patterns, whose number depends on the QR code's version, so you would end up needing a table at that point anyway.
For the second part, I have to say I don't know how to calculate the number of error correction codewords from the correction capacity. For some reason, more error correction codewords are used than would be needed to match the stated capacity; for example, a version 6-H QR code can correct up to 32.6% of the data, rather than the 30% implied by the H correction level.
In any case, as you can see, such a formula would be quite complex to implement. Using a table, as already suggested, is probably the best thing you can do.
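For illustration, here is a minimal sketch of that table-driven approach in C#. The enum, the class and method names, and the handful of capacity values (binary mode, versions 1-2 only) are mine for the example; the full table would be filled in from the published capacity tables in the standard.

using System.Collections.Generic;

enum EcLevel { L, M, Q, H }

static class QrCapacity
{
    // Binary-mode data capacity in bytes, keyed by (version, EC level).
    // Only a few well-known entries are shown; fill in the remaining
    // versions from the standard's capacity tables.
    static readonly Dictionary<(int version, EcLevel ec), int> ByteCapacity =
        new Dictionary<(int, EcLevel), int>
        {
            [(1, EcLevel.L)] = 17,
            [(1, EcLevel.M)] = 14,
            [(1, EcLevel.Q)] = 11,
            [(1, EcLevel.H)] = 7,
            [(2, EcLevel.L)] = 32,
        };

    // "Remaining character count" = table capacity minus what the user typed.
    public static int RemainingBytes(int version, EcLevel ec, int typedLength) =>
        ByteCapacity[(version, ec)] - typedLength;
}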
I wrote the original AIM specification for QR Code back in the '90s for Denso Corporation, and was also project editor for both editions of the ISO/IEC 18004 standard. It was felt to be much easier for people producing code-printing software to use a look-up table rather than calculate capacities from a formula. That is no easy job: several independent variables have to be taken into account iteratively when parsing the text to be encoded, in order to minimise its length in bits and achieve the smallest symbol. The most crucial factor is the mix of characters in the data: the sequence and lengths of sub-strings of numeric, alphanumeric and Kanji data, plus the overhead needed to signal each change of character set, then the required level of error correction. I did produce a guidance section on this, which is contained in the ISO standard.
The storage is determined by the QR mode and the version/type you are using. More specifically, the calculation is based on how 'compressible' the characters are and which algorithms the QR generator is allowed to apply to the content.
More information can be found at http://en.wikipedia.org/wiki/QR_code#Storage
I am VERY new to the world of DSP and filtering; I started about a week ago. I have been looking for ways to apply filters (low-pass, high-pass, notch, etc.) to some data I am getting. The data comes in as an array of doubles and can contain more than 1 million points. I am trying to filter the sound at a certain cutoff frequency but cannot get any algorithm to work. I have been up and down the internet and tried a bunch of different libraries and methods, but I can't get any results. I am partial to the NAudio library because it seems to have everything I need (FFT and filtering via the BiQuadFilter class). I am pretty sure my problem is my extreme lack of knowledge of the underlying math. Judging from what I have read, here is how I believe the process should go:
Insert data into FFT to put data into frequency domain
Pass resulting data into a filter (low, high, notch)
Do IFFT from results in step 2 to get back into time domain
Play sound
Is this the right way to filter audio? Can I shove the entire array into the FFT, or do I have to break it up into smaller chunks? What do I do with the complex numbers I get in the FFT result (i.e. just use the real part and throw away the imaginary part, or use the magnitude and phase)? I really have no idea what the "right way" is.
EDIT
I finally got it working! Here is what I did:
// Pack the doubles into raw bytes for an in-memory WAV
byte[] data = doubleArray.SelectMany(value => BitConverter.GetBytes(value)).ToArray();
// Wrap the bytes in a WAV-formatted stream NAudio can read
wms = new WaveMemoryStream(data, sampleRate, (ushort)audioBitsPerSample, (ushort)channels);
WaveFileReader wfr = new WaveFileReader(wms);
// Convert to floating-point samples
SampleChannel sample = new SampleChannel(wfr, false);
// Run the samples through the low-pass filter
LowPassSampleProvider sampleProvider = new LowPassSampleProvider(sample);
// Play the filtered audio
WaveOutEvent player = new WaveOutEvent();
player.Init(sampleProvider);
player.Play();
doubleArray is the array of my accelerometer data, which currently holds 1 million points with each one somewhere around 1.84...
WaveMemoryStream is a class I found on another post here
LowPassSampleProvider is a class I made that implements ISampleProvider and passes the samples to the BiQuadFilter.LowPassFilter function.
The BiQuadFilter in NAudio operates in the time domain; you don't need an FFT to use it. Pass each sample into the Transform method to get the output sample. Use two filters, one for the left channel and one for the right, if you have stereo audio.
I typically make an ISampleProvider implementation that in the Read method reads from a source ISampleProvider (such as an AudioFileReader) and passes the samples through the filter.
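To make that concrete, here is a minimal sketch of such a wrapper, using NAudio's ISampleProvider and BiQuadFilter. The class name, the default cutoff and the Q value are placeholders of mine, not anything prescribed:

using NAudio.Dsp;
using NAudio.Wave;

// An ISampleProvider that wraps a source and runs every sample through
// a BiQuadFilter low-pass filter, one filter instance per channel.
public class LowPassSampleProvider : ISampleProvider
{
    private readonly ISampleProvider source;
    private readonly BiQuadFilter[] filters;

    public LowPassSampleProvider(ISampleProvider source, float cutoffHz = 1000f)
    {
        this.source = source;
        int channels = source.WaveFormat.Channels;
        filters = new BiQuadFilter[channels];
        for (int ch = 0; ch < channels; ch++)
        {
            filters[ch] = BiQuadFilter.LowPassFilter(
                source.WaveFormat.SampleRate, cutoffHz, 1f);
        }
    }

    public WaveFormat WaveFormat => source.WaveFormat;

    public int Read(float[] buffer, int offset, int count)
    {
        int samplesRead = source.Read(buffer, offset, count);
        for (int i = 0; i < samplesRead; i++)
        {
            // Samples are interleaved: pick the filter for this channel.
            int channel = i % filters.Length;
            buffer[offset + i] = filters[channel].Transform(buffer[offset + i]);
        }
        return samplesRead;
    }
}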
Typically you would run time-domain data through a time-domain filter. Another, equivalent method is to take the FFT of the data and the FFT of the filter, multiply them in the frequency domain, then take the inverse FFT. For small filters the time-domain approach is generally faster. You would typically do this on frames of the data, say 8192 samples at a time, passed through the filter, then repeat for subsequent frames. Without looking at your code I'm unable to provide more help. Also, take a look at these examples using Intel's IPP. There are both time- and frequency-domain implementations that should help get you going.
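As a rough illustration of the frequency-domain route, here is a sketch using NAudio's FastFourierTransform. Note that the brick-wall bin zeroing below is a crude stand-in for multiplying by a real filter spectrum (it causes ringing), and proper frame-by-frame processing would also use overlap-add:

using System;
using NAudio.Dsp; // FastFourierTransform, Complex

static float[] LowPassFrame(float[] frame, float sampleRate, float cutoffHz)
{
    int n = frame.Length; // must be a power of two, e.g. 8192
    int m = (int)Math.Round(Math.Log(n, 2.0));

    var data = new Complex[n];
    for (int i = 0; i < n; i++) { data[i].X = frame[i]; data[i].Y = 0f; }

    FastFourierTransform.FFT(true, m, data); // forward FFT

    // Crudely zero all bins above the cutoff, including the mirrored half.
    int cutoffBin = (int)(cutoffHz / sampleRate * n);
    for (int i = cutoffBin; i < n - cutoffBin; i++)
    {
        data[i].X = 0f;
        data[i].Y = 0f;
    }

    FastFourierTransform.FFT(false, m, data); // inverse FFT

    // Depending on your FFT's scaling convention you may need to rescale here.
    var result = new float[n];
    for (int i = 0; i < n; i++) result[i] = data[i].X;
    return result;
}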
I've been trying to figure out the mystical realm of MIDI parsing, and I'm having no luck. All I'm trying to do is get the note value (60 = C4, 72 = C5, etc), in order of when they occur.
My code is as follows. All it does is very simply open a file as a byte array and read everything out as hex:
byte[] MIDI = File.ReadAllBytes("TestMIDI.mid");
foreach (var element in MIDI)
{
    string b = Convert.ToString(element, 16);
    Debug.WriteLine(b);
}
All TestMIDI.mid contains is one note on C5. Here's a hex dump of it. Using this info, I'm trying to find the simple hex value for Note On (0x9, or just 9 in the dump), but there aren't any. I can find a few 72's, but there are 3, which doesn't make any sense to me (note on, note off, then what?).
This is my first attempt at parsing MIDI as a file and at using hex dumps (are they even called that?), so I'm sorry if I'm heading in completely the wrong direction. All I need is to get the notes that play, and in what order. I don't need timing or anything fancy at all. The reason behind this, if it matters, is to then generate code in a different language to be played out of a speaker, very similar to the beep command on *nix. Because of this, I don't want to use any frameworks that 1) I didn't write myself, so I wouldn't really learn anything, and 2) do far more than I need, making the framework heavier than my actual code.
The accepted answer is not a solution to the problem; it will not work in the general case. I'll list several cases where that code either will not work or will fail outright, ordered by probability: the most likely cases come first.
False positives. MIDI files contain a lot of data structures in which you can find a byte with the value 144, and those structures are not Note On events. For real MIDI files you'll get a bunch of "notes" that are not notes at all, just random values within the file.
Channels other than 0. Most modern MIDI files contain several track chunks, each holding events for a specific MIDI channel (from 0 to 15). 144 (90 in hex) represents a Note On event for channel 0 only, so you are going to miss all the Note On events on the other channels.
Running status. MIDI files make heavy use of running status. This technique allows the status bytes of consecutive events of the same type to be omitted, which means the status byte 144 may be written only once, for the first Note On event, and you will not find it again further into the file.
144 is the last byte in the file. A MIDI file can end with this value, for example if a custom chunk is the last chunk in the file, or if a track chunk doesn't end with an End of Track event (which is corruption according to the MIDI file specification, but a possible scenario in the real world). In this case you'll get an IndexOutOfRangeException on MIDI[i+1].
Thus, you should never search for a specific byte value to find a semantic data structure in a MIDI file. You should use one of the .NET libraries available on the Internet. For example, with DryWetMIDI you can use this code:
using Melanchall.DryWetMidi.Core;        // MidiFile
using Melanchall.DryWetMidi.Interaction; // Note, GetNotes()

IEnumerable<Note> notes = MidiFile.Read(filePath).GetNotes();
To do this right, you'll need at least some semblance of a MIDI parser. Searching for 0x9 events is a good start, but 0x9 is effectively a Note Off event if its velocity field is 0. The byte can also occur inside other events (meta events, MPQN tempo events, delta-times, etc.), so you'll get false positives. You need something that actually understands the MIDI file format to do this accurately.
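For reference, the two checks just described look something like this in code. This fragment is not a parser: it assumes you already have the status and velocity bytes of a correctly delimited event, and it ignores running status, delta-times and everything else:

// Note On is status 0x9n (n = channel 0-15) with a non-zero velocity;
// velocity 0 is conventionally treated as Note Off.
static bool IsNoteOn(byte status, byte velocity) =>
    (status & 0xF0) == 0x90 && velocity > 0;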
Look for a library, write your own, or port an open-source one. Mine is in Java if you want to look.
I need a library which would help me save and query data in a condensed format (a mini DSL, in essence). Here's a sample of what I want:
Update 1 - Please note, the figures in the samples below are kept small just to make it easier to follow the logic; the real figures are limited only by the capacity of the C# long type, e.g.:
1,18,28,29,39,18456789,18456790,18456792,184567896.
Sample Raw Data set: 1,2,3,8,11,12,13,14
Condensed Sample Data set:
1..3,8,11..14
What would be really nice to have is the ability to present 1,2,4,5,6,7,8,9,10 as 1..10-3.
Querying Sample Data set:
Query 1 (get range):
1..5 -> 1..3
Query 2 (check if the value exists):
?2 -> true
Query 3 (get multiple ranges and scalar values):
1..5,11..12,14 -> 1..3,11..12,14
I don't want to develop it from scratch and would highly prefer to use something which already exists.
Here are some ideas I've had over the days since I read your question. I can't be sure any of them really apply to your use case but I hope you'll find something useful here.
Storing your data compressed
Steps you can take to reduce the amount of space your numbers take up on disk:
If your values are between 1 and ~10M, don't use a long, use a uint. (4 bytes per number.)
Actually, don't use a uint either. Store your numbers 7 bits to a byte, with the remaining bit used to mean "there are more bytes in this number". (Then 1-127 will fit in 1 byte, 128-~16k in 2 bytes, ~16k-~2M in 3 bytes, ~2M-~270M in 4 bytes; there's a sketch of this encoding after the next paragraph.)
This should reduce your storage from 8 bytes per number (if you were originally storing them as longs) to, say, on average 3 bytes. Also, if you end up needing bigger numbers, the variable-byte storage will be able to hold them.
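Here is a sketch of that 7-bits-per-byte variable-length encoding; the method names are mine:

using System.IO;

// Low 7 bits carry data; the high bit means "more bytes follow".
static void WriteVarUInt(Stream output, ulong value)
{
    while (value >= 0x80)
    {
        output.WriteByte((byte)(value | 0x80)); // set continuation bit
        value >>= 7;
    }
    output.WriteByte((byte)value); // final byte, continuation bit clear
}

static ulong ReadVarUInt(Stream input)
{
    ulong value = 0;
    int shift = 0;
    int b;
    while ((b = input.ReadByte()) >= 0)
    {
        value |= (ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return value;
        shift += 7;
    }
    throw new EndOfStreamException("Truncated variable-length number.");
}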
Then I can think of a couple of ways to reduce it further, given you know the numbers are always increasing and may contain lots of runs. Which works best for you only you can know by trying it on your actual data.
For each of your actual numbers, store two numbers: the number itself, followed by the count of contiguous numbers after it (e.g. 2,3,4,5,6 => 2,4). You'll have to store lone numbers as e.g. 8,0, which increases storage for those, but if your data has lots of runs (especially long ones) this should reduce storage on average; a sketch follows. You could further encode "single gaps" in runs, e.g. 1,2,3,5,6,7 => 1,6,4 (unambiguous since 4 is too small to be the start of the next run), but this makes processing more complex and won't save much space, so I wouldn't bother.
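A sketch of that (value, run-length) pairing; the method name is illustrative:

using System.Collections.Generic;

// Emits each run's first number followed by how many contiguous successors
// it has, so 2,3,4,5,6 => 2,4 and a lone 8 => 8,0.
static IEnumerable<ulong> ToRunPairs(IReadOnlyList<ulong> ascending)
{
    int i = 0;
    while (i < ascending.Count)
    {
        int runLength = 0;
        while (i + runLength + 1 < ascending.Count
               && ascending[i + runLength + 1] == ascending[i + runLength] + 1)
            runLength++;
        yield return ascending[i];     // first number of the run
        yield return (ulong)runLength; // how many contiguous numbers follow it
        i += runLength + 1;
    }
}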
Or, rather than storing the numbers themselves, store the deltas (so 3,4,5,7,8,9 => 3,1,1,2,1,1). This reduces the number of bytes used to store larger numbers (e.g. 15000,15005 (4 bytes) => 15000,5 (3 bytes)). Further, if the data contains a lot of runs (and hence lots of 1 bytes), it will then compress (e.g. zip) nicely.
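And a sketch of that delta step, which you could run before the variable-byte encoding above:

using System.Collections.Generic;

// Stores each number as its gap from the previous one, assuming the input
// is increasing; e.g. 3,4,5,7,8,9 => 3,1,1,2,1,1 as in the text.
static IEnumerable<ulong> ToDeltas(IEnumerable<ulong> ascending)
{
    ulong previous = 0;
    foreach (ulong value in ascending)
    {
        yield return value - previous;
        previous = value;
    }
}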
Handling in code
I'd simply advise you to write a couple of methods that stream a file from disk into an IEnumerable<uint> (or ulong if you end up with bigger numbers), and do the reverse, while handling whatever you've implemented from the above.
If you do this lazily, using yield return to produce the numbers as you read and decode them from disk, and streaming numbers to disk rather than holding them all in memory and writing them at once, you can keep your memory usage down regardless of the size of the stored data.
(I think, but I'm not sure, that even the GZipStream and other compression streams will let you stream your data without having it all in memory.)
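As a sketch of that lazy style (ReadVarUInt is the decoder from the earlier sketch, and the file is assumed to contain back-to-back variable-length numbers):

using System.Collections.Generic;
using System.IO;

// Decodes one number at a time as the caller enumerates, so the whole
// file never has to sit in memory.
static IEnumerable<ulong> ReadNumbers(string path)
{
    using (var stream = File.OpenRead(path))
    {
        while (stream.Position < stream.Length)
            yield return ReadVarUInt(stream);
    }
}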
Querying
If you're comparing two of your big data sets, I wouldn't advise using LINQ's Intersect method, as it requires reading one of the sources entirely into memory. However, since you know both sequences are increasing, you can write a similar method that needs to hold only one enumerator per sequence, like this:
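A sketch of such a merge-style intersection; it assumes both sequences are strictly increasing, and the method name is mine:

using System.Collections.Generic;

static IEnumerable<ulong> IntersectSorted(
    IEnumerable<ulong> first, IEnumerable<ulong> second)
{
    using (var a = first.GetEnumerator())
    using (var b = second.GetEnumerator())
    {
        if (!a.MoveNext() || !b.MoveNext()) yield break;
        while (true)
        {
            if (a.Current == b.Current)
            {
                yield return a.Current;
                // Advance both past the matched value.
                if (!a.MoveNext() || !b.MoveNext()) yield break;
            }
            else if (a.Current < b.Current)
            {
                if (!a.MoveNext()) yield break; // catch a up to b
            }
            else
            {
                if (!b.MoveNext()) yield break; // catch b up to a
            }
        }
    }
}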
If you're querying one of your data sets against a user-input, small list of numbers, you can happily use LINQ's Intersect method as it is currently implemented, as it only needs the second sequence to be entirely in memory.
I'm not aware of any off-the-shelf library that does quite what you want, but I'm not sure you need one.
I suggest you consider using the existing BitArray class. If, as your example suggests, you're interested in compressing sets of small integers, then a single BitArray with, say, 256 bits could represent any set of integers in the range [0..255]. Of course, if your typical set has only 5 integers in it then this approach would actually expand your storage requirements; you'll have to figure out the right size for such arrays from your own knowledge of your sets.
I'd suggest also looking at your data as sets of integers, so your example 1,2,3,8,11,12,13,14 would be represented by setting on the corresponding bits in a BitArray. Your query operations then reduce to intersection between a test BitArray and your data BitArray.
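For example, with the sizes and values taken from your samples:

using System.Collections;

var data = new BitArray(256);
foreach (int n in new[] { 1, 2, 3, 8, 11, 12, 13, 14 }) data[n] = true;

var query = new BitArray(256);
foreach (int n in new[] { 1, 2, 3, 4, 5 }) query[n] = true;

// And() mutates and returns the instance it is called on.
BitArray result = query.And(data);
// Bits 1, 2 and 3 are now set, matching "1..5 -> 1..3" from the question.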
Incidentally, I think your Query 2, which transforms 2 -> true, would be better staying in the domain of functions that map sets of integers to sets of integers, i.e. it should transform 2 -> 2. If you want a boolean, write a separate method that returns one.
I guess you'd need to write code to pack integers into BitArrays and to unpack BitArrays into integers, but that's part of the cost of compression.
Let's say I'm trying to generate a monster for use in a roleplaying game from an arbitrary piece of input data. Think Barcode Battler, or a more recent iPod game whose name escapes me.
It seems to me like the most straightforward way to generate a monster would be to use a hash function on the input data (say, an MP3 file) and use that hash value to pick from some predetermined set of monsters, or use pieces of the hash value to generate statistics for a custom monster.
The question is, are there obvious methods for taking an arbitrary piece of input data and hashing it to one of a fixed set of values? The primary goal of hashing algorithms is, after all, to avoid collisions. Instead, I'm suggesting that we want to guarantee them - that, given a predetermined set of 100 monsters, we want any given MP3 file to map to one of them.
This question isn't bound to a particular language, but I'm working in C#, so that would be my preference for discussion. Thanks!
Hash the file using any hash function of your choice, convert the result into an integer, and take the result modulo 100.
monsterId = hashResult % 100;
Note that if you later decide to add a new monster and change the code to % 101, nearly all hashes will suddenly map to different monsters.
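Put together, a sketch using MD5 from System.Security.Cryptography; the method name and the choice of MD5 are arbitrary, and any hash would do:

using System;
using System.IO;
using System.Security.Cryptography;

static int MonsterIdFor(string path, int monsterCount = 100)
{
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(path))
    {
        byte[] digest = md5.ComputeHash(stream);
        // Fold the first 4 bytes of the digest into a non-negative integer.
        int value = BitConverter.ToInt32(digest, 0) & int.MaxValue;
        return value % monsterCount;
    }
}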
Okay, that's a very nice question. I would say: don't use a hash, because it won't give the player a nice way to predict patterns. From cognitive theory we know that one thing that makes games interesting is that the player can learn by trial and error. So if the player supplies an image of a red dragon, and then another image of a red dragon with slightly different pixels, they would expect the same monster to appear, right? If you use hashes, that would not be the case.
Instead, I would recommend something much simpler. Your raw input is just a byte[], which is itself already a list of numbers. Unfortunately it's only a list of numbers from 0 to 255, so if you, for example, take an average, you get one number from 0 to 255. You could map that to a number of monsters already; if you need more, you can read pairs of bytes and compose an Int16, which lets you go up to 65536 possible monsters :)
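A sketch of that averaging idea (the method name is mine). Because similar inputs produce similar averages, slightly different files tend to map to the same or a nearby monster, which is the point of this approach:

static int MonsterIdFromAverage(byte[] input, int monsterCount)
{
    if (input.Length == 0) return 0; // guard against empty input

    long sum = 0;
    foreach (byte b in input) sum += b;

    double average = (double)sum / input.Length;  // in [0, 255]
    return (int)(average / 256.0 * monsterCount); // scaled to [0, monsterCount)
}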
You can use the MD5, SHA-1, or SHA-2 hash of a file as a unique fingerprint for the file. Each successive hash function gives you a larger, less collision-prone fingerprint, and each can be computed with library functions already in the base class libraries.
In truth you could probably hash a much smaller portion of the file, for instance the first 1-3 MB, and still get a fairly unique fingerprint without the expense of processing a larger file (like an AVI).
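A sketch of hashing just such a prefix; the 1 MB figure and the method name are arbitrary choices of mine:

using System;
using System.IO;
using System.Security.Cryptography;

static byte[] HashPrefix(string path, int prefixBytes = 1 << 20) // 1 MB
{
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(path))
    {
        // Read at most prefixBytes (less if the file is smaller).
        var buffer = new byte[(int)Math.Min((long)prefixBytes, stream.Length)];
        int read = stream.Read(buffer, 0, buffer.Length);
        return md5.ComputeHash(buffer, 0, read);
    }
}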
Look in the System.Security.Cryptography namespace at the MD5CryptoServiceProvider class for an example of how to generate an MD5 hash from a byte sequence.
Edit: If you want to ensure that the hash collides relatively often, you can use CRC-2, -4, -6, -8, -16 or -32, which will collide fairly frequently (especially CRC-2 :)) but will always be the same for the same file. They are easy to generate.