Algorithm to shorten a 10+ digit numeric string - C#

I'm using C# for a project. I need to provide the user with a large (9+ digit) number, which they will have to re-enter into another system (for later data correlation). Having a user enter a number that large by hand with no errors will be almost impossible.
I have been trying to come up with a way to shorten that number using base64, but all the code I have found produces a combination of letters and digits. Is there a simple math algorithm I can use to make a large number shorter? The result should be numeric, not alphanumeric.

You are approaching the problem the wrong way. Instead of changing the size of the number, just give the user a convenient way to copy and paste it: a simple key event that copies the number to the clipboard, so the user never has to write the number down.

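Shortening a number while keeping it purely numeric will never work: fewer digits simply cannot represent as many distinct values.
What you really need is some form of error checking.
One that works very well is the Verhoeff algorithm, which detects almost every typo, including all single-digit errors and all transpositions of adjacent digits. There are many examples to be found online, for example: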
https://www.codeproject.com/articles/15939/verhoeff-check-digit-in-c
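A minimal C# sketch of the Verhoeff scheme, written from the standard published tables rather than taken from the linked article. It assumes the input contains only the characters 0-9 and does no validation; the class and method names are illustrative.

```csharp
using System;

public static class Verhoeff
{
    // Multiplication table of the dihedral group D5.
    private static readonly int[,] D =
    {
        {0,1,2,3,4,5,6,7,8,9},
        {1,2,3,4,0,6,7,8,9,5},
        {2,3,4,0,1,7,8,9,5,6},
        {3,4,0,1,2,8,9,5,6,7},
        {4,0,1,2,3,9,5,6,7,8},
        {5,9,8,7,6,0,4,3,2,1},
        {6,5,9,8,7,1,0,4,3,2},
        {7,6,5,9,8,2,1,0,4,3},
        {8,7,6,5,9,3,2,1,0,4},
        {9,8,7,6,5,4,3,2,1,0}
    };

    // Position-dependent permutation table (repeats every 8 positions).
    private static readonly int[,] P =
    {
        {0,1,2,3,4,5,6,7,8,9},
        {1,5,7,6,2,8,3,0,9,4},
        {5,8,0,3,7,9,6,1,4,2},
        {8,9,1,6,0,4,3,5,2,7},
        {9,4,5,3,1,2,6,8,7,0},
        {4,2,8,6,5,7,3,9,0,1},
        {2,7,9,3,8,0,6,4,1,5},
        {7,0,4,6,9,1,3,2,5,8}
    };

    // Inverse of each element of D5.
    private static readonly int[] Inv = {0,4,3,2,1,5,6,7,8,9};

    // Returns the check digit to append to a string of decimal digits.
    public static char GenerateCheckDigit(string digits)
    {
        int c = 0;
        for (int i = 0; i < digits.Length; i++)
        {
            // Walk right to left; offset by 1 because the check digit will take position 0.
            int digit = digits[digits.Length - 1 - i] - '0';
            c = D[c, P[(i + 1) % 8, digit]];
        }
        return (char)('0' + Inv[c]);
    }

    // True if the last digit is a valid Verhoeff check digit for the digits before it.
    public static bool Validate(string digitsWithCheck)
    {
        int c = 0;
        for (int i = 0; i < digitsWithCheck.Length; i++)
        {
            int digit = digitsWithCheck[digitsWithCheck.Length - 1 - i] - '0';
            c = D[c, P[i % 8, digit]];
        }
        return c == 0;
    }
}
```

For example, GenerateCheckDigit("236") returns '3', so the user would be given 2363; Validate rejects the entry if any single digit is mistyped or two adjacent digits are swapped.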

You can use a hash algorithm to hash your large number, but you need to deal with hash collisions.
One that is very easy to implement is the sum16 checksum:
https://en.wikipedia.org/wiki/List_of_hash_functions
Note that sum16 can only produce values from 0 to 65535. Consider a wider sum, such as 18 bits, to reduce collisions.
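A minimal additive 16-bit checksum in the spirit of the sum16 entry, assuming the number is handled as an ASCII string; the names are hypothetical. It also shows why collisions have to be dealt with: a plain sum ignores digit order.

```csharp
using System;
using System.Text;

public static class Checksum
{
    // Sum of all bytes, truncated to 16 bits. The result always fits in 0..65535,
    // so different inputs can (and will) produce the same value.
    public static ushort Sum16(string text)
    {
        uint sum = 0;
        foreach (byte b in Encoding.ASCII.GetBytes(text))
        {
            sum += b;
        }
        return (ushort)(sum & 0xFFFF);
    }
}
```

Sum16("123") and Sum16("321") both return 150, so a pair of swapped digits goes unnoticed, which is exactly the kind of typo the Verhoeff algorithm mentioned earlier catches.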


C#/ClosedXML: problem with reading long numbers as text

I have a table with account numbers in one column and I need to read it. Some of them are read correctly, but some are treated as numbers and are either converted to scientific notation or, if I change the format to "0", read incorrectly.
For example, this dummy account:
is read as
after changing the format to "#".
If I don't change the format, it's wrong:
which, obviously, is then a completely different account number.
I've searched for and tried out different options, but none is working.
Any idea? Thank you
Account numbers aren't really numbers: you are never going to perform arithmetic on them, so you're better off treating them as text. Use cell.SetValue(value.ToString()).
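Why numeric storage breaks long identifiers can be shown without ClosedXML at all: a double represents integers exactly only up to 2^53 (9,007,199,254,740,992), and spreadsheet numeric cells hold doubles. This sketch (names are illustrative) parses account numbers as doubles and checks whether two distinct ones collapse into the same value.

```csharp
using System;
using System.Globalization;

public static class AccountNumberDemo
{
    // Parses an account number the way a numeric cell would hold it: as an IEEE 754 double.
    public static double AsNumber(string accountNumber) =>
        double.Parse(accountNumber, CultureInfo.InvariantCulture);

    // True if two different account numbers become the same double.
    public static bool Collide(string a, string b) => a != b && AsNumber(a) == AsNumber(b);
}
```

Collide("9007199254740993", "9007199254740992") is true, while the strings themselves stay distinct, which is why keeping the column as text is the safe choice.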

What is the formula to calculate a QR Code's maximum data?

I've Googled and read quite a bit about QR codes and the maximum data they can hold based on the various settings, but all of it is in tabular format. I can't find anything giving a formula or a proper explanation of how these values are calculated.
What I would like to do is this:
Present the user with a form, allowing them to choose Format, EC & Version.
Then they can type in some data and generate a QR code.
Done deal. That part is easy.
The addition I would like to include is a "remaining character count" so that they (the user) can see how much more data they can type in, as well as what effect the properties have on the storage capacity of the QR code.
Does anyone know where I can find the formula(s)? Or do I need to purchase ISO 18004:2006?
A formula to calculate the amount of data you could put in a QR code would be quite complex to write, not to mention that it would need some approximations for the calculation to be possible. The formula would have to calculate the number of modules dedicated to data in your QR code based on its version, and then calculate how many codewords (which are sets of 8 modules) will be used for error correction.
To calculate the number of modules used for data, you need to know how many modules are used for the function patterns. While this is not a problem for the three finder patterns, the timing patterns or the version/format information, it is a problem for the alignment patterns, because their number depends on the QR code's version, meaning you would have to use a table at that point anyway.
For the second part, I have to say I don't know how to calculate the number of error correction codewords from the correction capacity. For some reason, more error correction codewords are used than would be needed to match the error correction capacity; for example, a 6-H QR code can correct up to 32.6% of the data, instead of the 30% set by the H correction level.
In any case, as you can see, a formula would be quite complex to implement. Using a table, as already suggested, is probably the best thing you could do.
I wrote the original AIM specification for QR Code back in the '90s for Denso Corporation, and was also project editor for both editions of the ISO/IEC 18004 standard. It was felt to be much easier for people producing code printing software to use a look-up table rather than calculate capacities from a formula - no easy job as there are several independent variables that have to be taken into account iteratively when parsing the text to be encoded to minimise its length in bits, in order to achieve the smallest symbol. The most crucial factor is the mix of characters in the data, the sequence and lengths of sub-strings of numeric, alphanumeric, Kanji data, with the overhead needed to signal each change of character set, then the required level of error correction. I did produce a guidance section for this which is contained in the ISO standard.
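Following the look-up-table advice, a "remaining character count" can combine a small capacity table with the per-mode bit costs. The sketch below covers only version 1 and purely numeric data: a 4-bit mode indicator, a 10-bit character count, then 10 bits per group of three digits (7 bits for a trailing pair, 4 for a trailing single digit). The data-codeword counts (19/16/13/9 for L/M/Q/H) are from the published capacity tables; a real tool would need all 40 versions, the other modes and the mode-switching overhead described above. Names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class QrCapacity
{
    // Data codewords for version 1 only, per error-correction level.
    private static readonly Dictionary<char, int> Version1DataCodewords = new Dictionary<char, int>
    {
        ['L'] = 19, ['M'] = 16, ['Q'] = 13, ['H'] = 9
    };

    // Bits for one numeric-mode segment in versions 1-9.
    public static int NumericSegmentBits(int digits)
    {
        int remainder = digits % 3;
        int tail = remainder == 2 ? 7 : remainder == 1 ? 4 : 0;
        return 4 + 10 + (digits / 3) * 10 + tail;
    }

    // Largest digit count that still fits in a version-1 symbol at the given EC level.
    public static int MaxNumericDigitsV1(char ecLevel)
    {
        int availableBits = Version1DataCodewords[ecLevel] * 8;
        int n = 0;
        while (NumericSegmentBits(n + 1) <= availableBits) n++;
        return n;
    }

    // Remaining-character counter for the form (assumes the typed text is all digits).
    public static int RemainingDigitsV1(string typed, char ecLevel) =>
        MaxNumericDigitsV1(ecLevel) - typed.Length;
}
```

MaxNumericDigitsV1('L') gives 41 and MaxNumericDigitsV1('H') gives 17, matching the version-1 numeric capacities in the published tables.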
The storage capacity depends on the QR encoding mode and the version/type you are using. More specifically, it depends on how 'compressible' the characters are and which encoding modes the QR generator is allowed to use for the content.
More information can be found at http://en.wikipedia.org/wiki/QR_code#Storage

Searching for partial substring within string in C#

Okay, so I'm trying to make a basic malware scanner in C#. Say I have the hex signature for a particular bit of code.
For example:
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\test.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c746573742e74787422293b
gets changed to:
{
System.IO.File.Delete(@"C:\Users\Public\DeleteTest\notatest.txt");
}
//Which will have a hex of 53797374656d2e494f2e46696c652e44656c657465284022433a5c55736572735c5075626c69635c44656c657465546573745c6e6f7461746573742e74787422293b
Keep in mind these bits will be somewhere within the hex of the entire program. How could I go about taking my base signature and looking for partial matches, so that, say, anything with a 90% match gets flagged?
I would use a wildcard, but that wouldn't work for slightly more complex cases where the code is written slightly differently but most of it stays the same. So is there a way to compute a percentage match for a substring? I was looking into the Levenshtein distance, but I don't see how I'd apply it to this scenario.
Thanks in advance for any input
Using an edit distance would be fine. You take two strings and calculate the edit distance, an integer denoting how many operations are needed to turn one string into the other. You then set your own threshold based on that number.
For example, you may statically decide that if the distance is less than five edits, the change is relevant.
You could also take the length of the string you are comparing and use a percentage of it: allowing (int)(input.Length * 0.12m) edits, for instance, corresponds to roughly an 88% match.
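A minimal C# Levenshtein implementation (two-row dynamic programming), plus a similarity ratio relative to the longer string. The Similarity helper and the 0.9 threshold are illustrative choices, not part of any library.

```csharp
using System;

public static class EditDistance
{
    // Minimum number of single-character insertions, deletions and substitutions
    // needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // reuse the two rows
        }
        return prev[b.Length];
    }

    // 1.0 means identical; 0.0 means every character differs.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
    }
}
```

For the two Delete calls in the question the distance is 4 (the inserted "nota"), giving a similarity of about 0.94, so a 90% threshold would flag it. Finding a signature somewhere inside a whole binary would additionally require comparing it against windows of similar length, which is far slower than an exact search.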
First, your program bits should match EXACTLY; otherwise the program has been modified or is corrupt. Generally, you store an MD5 hash of the original binary and compare it against the MD5 of new versions to see whether they are identical (a matching MD5 cannot guarantee a 100% match, because collisions are possible).
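Storing and comparing an MD5 digest takes only a few lines with System.Security.Cryptography. A minimal sketch with hypothetical names; MD5 is fine for spotting accidental changes, but SHA256.Create() is a drop-in replacement if deliberate tampering matters.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class IntegrityCheck
{
    // 16-byte MD5 digest of the given bytes (e.g. File.ReadAllBytes of the original binary).
    public static byte[] Md5Of(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(data);
        }
    }

    // Compares a stored digest of the original against the digest of a candidate file.
    public static bool SameAsOriginal(byte[] originalDigest, byte[] candidate) =>
        originalDigest.SequenceEqual(Md5Of(candidate));
}
```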
Beyond this, to detect malware in an arbitrary binary you must know what sort of patterns to look for. For example, if I know a piece of malware injects code with some binary XYZ, I will look for XYZ in the bits of the executable. Patterns get much more complex than that, of course, as the malware bits can be spread out in chunks. More interestingly, some viruses are self-modifying: each time one runs, it changes itself, so the scanner does not know an exact pattern to look for. In these cases, the scanner must know what kinds of variants can be produced and look for all of them.
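The exact-match case ("look for XYZ in the bits of the executable") is a plain byte-array search. A naive sketch with a hypothetical IndexOf helper; scanners that check many signatures at once typically use a multi-pattern algorithm such as Aho-Corasick instead.

```csharp
public static class SignatureSearch
{
    // Returns the offset of the first occurrence of signature in haystack, or -1.
    // Runs in O(n * m) time for a buffer of length n and a signature of length m.
    public static int IndexOf(byte[] haystack, byte[] signature)
    {
        if (signature.Length == 0) return 0;
        for (int i = 0; i <= haystack.Length - signature.Length; i++)
        {
            int j = 0;
            while (j < signature.Length && haystack[i + j] == signature[j]) j++;
            if (j == signature.Length) return i;
        }
        return -1;
    }
}
```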
In terms of finding a % match, this operation is very time consuming unless you have constraints. By comparing 2 strings, you cannot tell which pieces were removed, added, or replaced. For instance, if I have a starting string 'ABCD', is 'AABCDD' a 100% match or less since content has been added? What about 'ABCDABCD'; here it matches twice. How about 'AXBXCXD'? What about 'CDAB'?
There are many diff tools that can tell you which parts of a file have changed (from which you can derive a percentage). Unfortunately, none of them are perfect, because of the issues I described above. You will get false negatives, false positives, etc. This may be 'good enough' for you.
Before you can identify a specific algorithm that will work for you, you will have to decide what the restrictions of your search will be. Otherwise, the search can become intractable, which leads to unreasonable running times (your scanner may run all day just to check one file).
I suggest you look into the Levenshtein distance and the Damerau-Levenshtein distance.
The former tells you how many insertions, deletions and substitutions are needed to turn one string into another; the latter additionally counts a transposition of two adjacent characters as a single operation.
I use these a lot when writing programs where users search for things but may not know the exact spelling.
Both articles include code examples.
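A sketch of the restricted Damerau-Levenshtein distance (usually called optimal string alignment), which adds adjacent transpositions to the Levenshtein operations. This is the simpler variant; the unrestricted Damerau-Levenshtein distance needs extra bookkeeping and can give smaller values on some inputs. Names are illustrative.

```csharp
using System;

public static class DamerauLevenshtein
{
    // Optimal string alignment distance: insert, delete, substitute,
    // or swap two adjacent characters, each costing one edit.
    public static int OsaDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
                // Adjacent transposition, e.g. "ca" -> "ac".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
                }
            }
        }
        return d[a.Length, b.Length];
    }
}
```

OsaDistance("ca", "ac") is 1, where plain Levenshtein would report 2.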

Generate serial number using letters and digits

I'm developing an application for taking orders in C# and DevExpress, and I need a function that generates a unique order number. The order number must contain letters and digits and have a length of 20.
I've seen things like Guid.NewGuid() but I don't want it to be totally random, nor to be just an auto increment number ..
Can anyone help? even if it's a script in a different language, I need ideas desperately :)
You can create a format of your own.
Let's say yyyyMMddWWW-YYY-XXXXXXX, where WWW is the store number, YYY the cashier ID and XXXXXXX a hexadecimal number (perhaps an actual auto-increment number that you convert to hex). This is just an idea; I'm afraid you have to decide, based on the elements of your system, what it will look like.
Edit: applying a check-digit algorithm to it would also help avoid mistakes.
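A sketch of that layout, assuming the store number, cashier ID and sequence value are already available as integers (for example, the sequence from an auto-increment column). The method name is hypothetical.

```csharp
using System;
using System.Globalization;

public static class OrderNumbers
{
    // yyyyMMddWWW-YYY-XXXXXXX: date, 3-digit store number, 3-digit cashier ID,
    // and the sequence number as 7 upper-case hex digits.
    public static string Build(DateTime date, int store, int cashier, long sequence) =>
        string.Format(CultureInfo.InvariantCulture,
            "{0:yyyyMMdd}{1:D3}-{2:D3}-{3:X7}", date, store, cashier, sequence);
}
```

OrderNumbers.Build(new DateTime(2024, 1, 31), 12, 7, 255) returns "20240131012-007-00000FF". This layout is 23 characters including the dashes, so a field would have to shrink (or the dashes go) to hit exactly 20.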
Two different methods:
Create an MD5 or SHA1 hash of the current time
Hash an incrementing number
One thought comes to mind.
Take DateTime.Now.Ticks and convert it to a hexadecimal string.
Voilà: String.Format("{0:X}", value);
If it isn't long enough (you said you need 20 characters), you can always pad it with zeros.
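The same idea in code, using the "X20" format specifier so the hex conversion and the zero-padding to 20 characters happen in one step (the method name is illustrative).

```csharp
public static class TickCodes
{
    // Upper-case hex, left-padded with zeros to 20 characters.
    public static string ToCode(long ticks) => ticks.ToString("X20");
}
```

Call it as TickCodes.ToCode(DateTime.Now.Ticks). Be aware that two orders created in quick succession can get the same value, because the system clock typically advances in steps much coarser than one tick.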
Get the motherboard ID
Get the HDD ID
Merge them in any way
Add your secret code
Apply MD5
Apply Base64
Result: a serial code that is linked to the current client PC :)
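The recipe above, sketched in C#. Reading the actual motherboard and disk IDs (for example through WMI on Windows) is left out: the IDs and the secret are passed in as plain strings, and the "|" separator is an arbitrary choice. Names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class MachineSerial
{
    // Merge the IDs with the secret, hash with MD5, then Base64-encode the 16-byte digest.
    public static string Create(string motherboardId, string diskId, string secret)
    {
        string merged = motherboardId + "|" + diskId + "|" + secret;
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(merged));
            return Convert.ToBase64String(digest);
        }
    }
}
```

A 16-byte MD5 digest always Base64-encodes to 24 characters, so the result needs trimming or re-encoding if the 20-character limit applies.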
My two cents.
If you need ideas then take a look at the Luhn and Luhn mod N algorithms.
While these algorithms are not unique code generators, they may give you some ideas on how to generate codes that can be validated (such that you could validate the code for correctness before sending it off to the database).
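To make the idea concrete, here is a minimal sketch of the plain Luhn (mod 10) check digit for digit strings; Luhn mod N generalizes the same steps to other alphabets, such as the letters and digits of an order number. Names are illustrative and the input is assumed to contain only 0-9.

```csharp
public static class Luhn
{
    // Check digit for a payload of decimal digits.
    public static int CheckDigit(string payload)
    {
        int sum = 0;
        bool doubleIt = true; // the rightmost payload digit is doubled
        for (int i = payload.Length - 1; i >= 0; i--)
        {
            int d = payload[i] - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    // True if the last digit is the correct check digit for the rest.
    public static bool IsValid(string number) =>
        number.Length > 1 &&
        CheckDigit(number.Substring(0, number.Length - 1)) == number[number.Length - 1] - '0';
}
```

CheckDigit("7992739871") returns 3, so 79927398713 validates, and changing any single digit makes IsValid return false.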
Like Oded suggested, a GUID is not necessarily random: version 1 GUIDs are based on a timestamp and the network card's MAC address. See Raymond Chen's blog post for a detailed explanation. (Guid.NewGuid() in .NET, however, generates random version 4 GUIDs.)
You are best off using an auto-incremented int for order IDs. I don't understand why you wouldn't want to use that or, failing that, a Guid.
I can't think of any way other than an auto ID to maintain uniqueness and represent the sequence of orders in your system.

Are there common methods for hashing an input file to a fixed set of values?

Let's say I'm trying to generate a monster for use in a roleplaying game from an arbitrary piece of input data. Think Barcode Battler or a more-recent iPod game whose name escapes me.
It seems to me like the most straightforward way to generate a monster would be to use a hash function on the input data (say, an MP3 file) and use that hash value to pick from some predetermined set of monsters, or use pieces of the hash value to generate statistics for a custom monster.
The question is, are there obvious methods for taking an arbitrary piece of input data and hashing it to one of a fixed set of values? The primary goal of hashing algorithms is, after all, to avoid collisions. Instead, I'm suggesting that we want to guarantee them - that, given a predetermined set of 100 monsters, we want any given MP3 file to map to one of them.
This question isn't bound to a particular language, but I'm working in C#, so that would be my preference for discussion. Thanks!
Hash the file using any hash function of your choice, convert the result into an integer, and take the result modulo 100.
monsterId = hashResult % 100;
Note that if you later decide to add a new monster and change the code to % 101, nearly all hashes will suddenly map to different monsters.
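The modulo approach in C#. This sketch uses SHA-256 as "any hash function of your choice" and reads the first 8 bytes of the digest as an unsigned integer, so the result can never be negative (which can happen with a signed hashResult % 100). Names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

public static class MonsterPicker
{
    // Maps arbitrary input bytes (e.g. an MP3 file) to an index in [0, monsterCount).
    // monsterCount must be positive.
    public static int PickMonster(byte[] input, int monsterCount)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(input);
            ulong value = BitConverter.ToUInt64(digest, 0);
            return (int)(value % (ulong)monsterCount);
        }
    }
}
```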
Okay, that's a very nice question. I would say: don't use a hash, because it won't let the player predict patterns. From cognitive theory we know that one thing that makes games interesting is that the player can learn by trial and error. So if the player inputs an image of a red dragon and then another image of a red dragon with slightly different pixels, they would expect the same monster to appear, right? With a hash, that would not be the case.
Instead, I would recommend doing something much simpler. Imagine that your raw input is just a byte[]; it is already a list of numbers. Unfortunately, each of them is only a number from 0 to 255, so if you, for example, take the average, you get a single number from 0 to 255. You could already map that to a set of monsters; if you need more, you can read pairs of bytes and compose a 16-bit unsigned value (UInt16), which gives you up to 65536 possible monsters :)
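A sketch of the two simple mappings described above: the average byte value (0..255) and a 16-bit value composed from two bytes (0..65535). Names are hypothetical, and which bytes to pair up is left to the game design.

```csharp
using System;

public static class ByteMonsters
{
    // Average of all bytes; changes only slightly when only a few bytes change.
    public static int AverageByte(byte[] data)
    {
        if (data.Length == 0) return 0;
        long sum = 0;
        foreach (byte b in data) sum += b;
        return (int)(sum / data.Length);
    }

    // Composes two bytes into an unsigned 16-bit value.
    public static ushort FromPair(byte high, byte low) => (ushort)((high << 8) | low);
}
```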
You can use the MD5, SHA1, or SHA2 hash of a file as a unique fingerprint for it. Each of these gives a progressively larger fingerprint with fewer collisions, and all of them are available in the base class libraries.
In truth you could probably hash a much smaller portion of the file, for instance the first 1-3MB of the file and still get a fairly unique fingerprint, without the expense of processing a larger file (like an AVI).
Look in the System.Security.Cryptography namespace for the MD5 class (MD5.Create() or MD5CryptoServiceProvider) for an example of how to generate an MD5 hash from a byte sequence.
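Hashing only the beginning of a file, as suggested, means reading at most a fixed number of bytes from the stream before hashing. A minimal sketch; the method name and the prefix size are assumptions.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class Fingerprint
{
    // MD5 over at most the first maxBytes of the stream, so large files stay cheap.
    public static byte[] Md5Prefix(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0, read;
        // Read may return fewer bytes than requested, so loop until full or end of stream.
        while (total < maxBytes && (read = stream.Read(buffer, total, maxBytes - total)) > 0)
        {
            total += read;
        }
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(buffer, 0, total);
        }
    }
}
```

For a file this would be used as using (var fs = File.OpenRead(path)) { byte[] id = Fingerprint.Md5Prefix(fs, 3 * 1024 * 1024); }, where path is the file to fingerprint. Two files that differ only after the prefix get the same fingerprint, which is the trade-off for speed.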
Edit: If you want hashes that collide relatively often, you can use a short CRC (for example CRC-8, CRC-16 or CRC-32; the shorter it is, the more often it collides), which still always gives the same value for the same file. CRCs are easy to generate.
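An 8-bit CRC illustrates the "collides often but is deterministic" end of the range. This sketch implements the common CRC-8 variant with polynomial 0x07, initial value 0 and no reflection, written out by hand because the base class library has no CRC-8 class. The class name is illustrative.

```csharp
public static class Crc8
{
    // CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), init 0, no reflection, no final XOR.
    // Only 256 possible values, so collisions are frequent.
    public static byte Compute(byte[] data)
    {
        byte crc = 0;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 0x80) != 0 ? (byte)((crc << 1) ^ 0x07) : (byte)(crc << 1);
            }
        }
        return crc;
    }
}
```

Over the ASCII bytes of "123456789" this returns 0xF4, the standard check value for this CRC-8 variant.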
