Encrypt a larger string to a smaller string using C#, like a reversible MD5

We are trying to convert the text "HELLOWORLDTHISISALARGESTRINGCONTENT" into a smaller text. Using an MD5 hash we get 16 bytes, but since MD5 is one-way, we are not able to decrypt it. Is there any other way to convert this large string into a smaller one and get back the same data? If so, please let us know how to do it.
Thanks in advance.

Most compression algorithms won't be able to do much with a sequence that short (or may actually make it bigger), so no: there isn't much you can do to magically shrink it. Your best bet would probably be to generate a GUID, store the full value keyed against the GUID (in a database or wherever), and then use the short value as a one-time key to look up the long value (and then erase the record).
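A minimal sketch of that GUID approach, assuming an in-memory Dictionary stands in for the database table (OneTimeValueStore and its methods are hypothetical names, and the class is not thread-safe). The key is always 16 bytes (32 hex characters via ToString("N")), whatever the length of the stored value:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the database table described in the answer.
public sealed class OneTimeValueStore
{
    private readonly Dictionary<Guid, string> _values = new Dictionary<Guid, string>();

    // Stores the long value and returns the short key to hand out.
    public Guid Put(string longValue)
    {
        Guid key = Guid.NewGuid();
        _values[key] = longValue;
        return key;
    }

    // Returns the value once, then erases the record so the key cannot be reused.
    public bool TryTake(Guid key, out string longValue)
    {
        if (_values.TryGetValue(key, out longValue))
        {
            _values.Remove(key);
            return true;
        }
        return false;
    }
}
```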

It heavily depends on the input data. In general - the worst case - you can't lessen the size of a string through compression if the input data is not long enough and has a high entropy.
Hashing is the wrong approach as a hashing function tries to map a large input data to a short one, but it does not guarantee (by itself) that you can't find a second set of data to map to the same string.
What you can try is to implement a compression algorithm or a lookup table.
Compression can be done with ziplib or any other compression library (just google for it). The lookup approach requires a second place to store the lookup information. For example, when you get the first input string, you map it to the number 1 and save the information "1 maps to {input data}" somewhere else. For every subsequent data set you add another mapping entry. If the input data set is finite, this approach may save you space.
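To try the compression route with what ships in .NET, here is a sketch using DeflateStream from System.IO.Compression (StringCompressor is a hypothetical helper name). It round-trips losslessly, but for a 35-character input like the one in the question the output is unlikely to be meaningfully shorter and may be longer, so compare the lengths before relying on it:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StringCompressor
{
    // Compresses the UTF-8 bytes of a string with DeflateStream.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress))
            {
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream flushes the final compressed block
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    // Reverses Compress and decodes the bytes back to the original string.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            deflate.CopyTo(result);
            return Encoding.UTF8.GetString(result.ToArray());
        }
    }
}
```

Long, repetitive input, by contrast, shrinks dramatically, which is the case compression is built for.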

Related

Hash a string for duplicate detection

I'm writing a C# API which stores SWIFT message types. I need to write a class that takes the entire string message, creates a hash of it and stores this hash in the database, so that when a new message is processed, it creates another hash and checks it against the ones in the database.
I have the following:
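// Requires: using System.Security.Cryptography; using System.Text;
public static byte[] GetHash(string inputString)
{
    // MD5.Create() can be swapped for SHA1.Create() or SHA256.Create().
    using (HashAlgorithm algorithm = MD5.Create()) // disposed after use
    {
        return algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
    }
}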
and I need to know: will this do?
Global Comment*
So, I receive the files in a secure network, so we have full control over their validity. What I need to control is duplicate payments being made. I could split the record down into its respective tag elements (SWIFT terminology) and then check them individually, but each would then need to be compared against records in the database, and that cost isn't acceptable.
I need to check whether the entire message is a duplicate of a message already processed, which is why I used this approach.
It depends on what you want to do. If you are expecting messages to never be intentionally tampered with, even CRC64 will do just fine.
If you want a .NET provided solution that is fast and provides no cryptographic security, MD5 is just fine and will work for what you need.
If you need to determine if a message is different from another, and you expect someone to tamper with the data in transit and it may potentially be modified with bit twiddling techniques to force a hash collision, you should use SHA-256 or SHA-512.
Collisions shouldn't be a problem unless you are hashing billions of messages or someone is tampering with the data in transit. If someone is tampering with the data in transit, you have bigger problems.
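A sketch of the SHA-256 option with a duplicate check, assuming an in-memory HashSet stands in for a unique-indexed database column (MessageFingerprint and DuplicateDetector are hypothetical names). It hashes the exact UTF-8 text, so messages that differ only in whitespace or line endings get different fingerprints; normalize first if that matters for your SWIFT files:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class MessageFingerprint
{
    // SHA-256 of the UTF-8 message as 64 uppercase hex characters,
    // suitable for a unique-indexed database column.
    public static string Compute(string message)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(message));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

// Hypothetical in-memory stand-in for the "processed messages" table.
public sealed class DuplicateDetector
{
    private readonly HashSet<string> _seen = new HashSet<string>(StringComparer.Ordinal);

    // Returns true the first time a message is seen, false for a duplicate.
    public bool TryRegister(string message)
    {
        return _seen.Add(MessageFingerprint.Compute(message));
    }
}
```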
You could implement it the way Dictionary implements it: the bucket system.
Have a Hash value in the database, and store the raw data.
----------------
| Hash | Value |
----------------
Searching through the hashes first makes the query faster, and if there are multiple hits (as there eventually may be with MD5), you can iterate through them and compare the raw data to see whether they really are the same.
But as Michael J. Gray says, the probability of a collision is very small on smaller datasets.
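The Hash/Value table can be sketched in memory like this (HashBucketStore is a hypothetical name; in SQL you would index the Hash column and compare Value in the same query). A matching hash only selects candidates, and the raw data decides whether it is really a duplicate:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory version of the (Hash, Value) table.
public sealed class HashBucketStore
{
    private readonly Dictionary<string, List<string>> _buckets =
        new Dictionary<string, List<string>>(StringComparer.Ordinal);

    private static string Md5Hex(string value)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Adds the value unless an identical one is already stored; returns false for a true duplicate.
    public bool AddIfNew(string value)
    {
        string hash = Md5Hex(value);
        List<string> bucket;
        if (!_buckets.TryGetValue(hash, out bucket))
        {
            bucket = new List<string>();
            _buckets.Add(hash, bucket);
        }
        // Same hash is only a candidate; compare the raw data to confirm.
        foreach (string existing in bucket)
        {
            if (string.Equals(existing, value, StringComparison.Ordinal))
                return false;
        }
        bucket.Add(value);
        return true;
    }
}
```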

Storing a 16-byte string in 4 bytes of memory (compression) in RFID tags

I hope this question isn't too vague. I am working on an RFID project using passive tags. These tags store only 4 bytes (32 bits) of data. I am trying to store more information as a string in the tag's data bank. I searched the internet for string compression algorithms but didn't find any suitable ones. Can someone guide me through this issue? How can I save more data in this 4-byte data bank? Should I use some other storage strategy, and if so, what? I am using C# on a handheld Windows CE device.
I'll appreciate if someone could help me...
It depends on your tag. For example, the Alien tag (http://www.alientechnology.com/docs/products/Alien-Technology-Higgs-3-ALN-9662-Short.pdf) has EPC memory; I think you are using the EPC memory, but you can also use the tag's user memory. You don't have to compress anything, just use the user memory. Furthermore, I would rather not save much data on the tag: I use my own 32-bit coding, map it to the fuller data in my software, and save that data on my hard disk. It is safer, too.
There is obviously no compression that can reduce arbitrary 16-byte values to 4-byte values. That's mathematically impossible; see the pigeonhole principle for details.
Store the actual data in some kind of database. Have the 4 bytes encode an integer that acts as a key for the row you want to refer to, for example an auto-increment primary key or an index into an array. This works for up to about 4 billion (2^32) rows.
If you have fewer than 2^32 strings, simply enumerate them and save the string's index (in your "dictionary") in your 4-byte "data bank".
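A sketch of the enumeration idea, assuming the dictionary lives on the handheld or a server and only the index is written to the tag (TagDictionary and its members are hypothetical). The index is written big-endian with shifts so the byte order doesn't depend on the device; note that List<T> caps at int.MaxValue entries, so using the full 2^32 range would need a database instead:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table: each distinct string gets a sequential uint index.
public sealed class TagDictionary
{
    private readonly List<string> _byIndex = new List<string>();
    private readonly Dictionary<string, uint> _byValue = new Dictionary<string, uint>();

    // Returns the existing index for the value, or assigns the next one.
    public uint GetOrAdd(string value)
    {
        uint index;
        if (!_byValue.TryGetValue(value, out index))
        {
            index = (uint)_byIndex.Count;
            _byIndex.Add(value);
            _byValue.Add(value, index);
        }
        return index;
    }

    public string Lookup(uint index)
    {
        return _byIndex[(int)index];
    }

    // Packs the index into the tag's 4-byte data bank, most significant byte first.
    public static byte[] ToTagBytes(uint index)
    {
        return new byte[] { (byte)(index >> 24), (byte)(index >> 16), (byte)(index >> 8), (byte)index };
    }

    public static uint FromTagBytes(byte[] data)
    {
        if (data == null || data.Length != 4)
            throw new ArgumentException("Expected exactly 4 bytes.", "data");
        return ((uint)data[0] << 24) | ((uint)data[1] << 16) | ((uint)data[2] << 8) | data[3];
    }
}
```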
A compression scheme can't guarantee such high compression ratios.
The only way I can think of with 32-bits is to store an int in the 32-bits, and construct a local/remote URL out of it, which points to the actual data.
You could also make the stored value point to entries in a local look-up table on the device.
Unless you know a lot about the format of your string, it is impossible to do this. This is evident from the pigeonhole principle: you have a theoretical 2^128 different 16-byte strings, but only 2^32 different values to choose from.
In other words, no compression algorithm will guarantee that an arbitrary string in your possible input set will map to a 4-byte value in the output set.
It may be possible to devise an algorithm which will work in your particular case, but unless your data set is sufficiently restricted (at most 1 in 79,228,162,514,264,337,593,543,950,336 possible strings may be valid) and has a meaningful structure, then your only option is to store some mapping externally.
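The number in that answer is the ratio of available 4-byte values to possible 16-byte strings:

```latex
\frac{2^{32}\ \text{(4-byte values)}}{2^{128}\ \text{(16-byte strings)}} = 2^{-96} = \frac{1}{79{,}228{,}162{,}514{,}264{,}337{,}593{,}543{,}950{,}336}
```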

Unique id for a file in C#

I need to generate a unique ID for files of up to 200-300 MB. The algorithm must be quick; it should not take much time. I am selecting the files from the desktop and calculating a hash value like this:
HMACSHA256 myhmacsha256 = new HMACSHA256(key);
byte[] hashValue = myhmacsha256.ComputeHash(fileStream);
fileStream is a handle to the file to read its content. This method is going to take a lot of time, for obvious reasons.
Does Windows generate a key for a file for its own bookkeeping that I could use directly?
Is there any other way to identify whether a file is the same, other than matching the file name, which is not very foolproof?
MD5.Create().ComputeHash(fileStream);
Alternatively, I'd suggest looking at this rather similar question.
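For completeness, a sketch of the MD5 suggestion applied to a file path, returning hex (FileHasher.Md5Hex is a hypothetical name, and the 1 MB FileStream buffer is an arbitrary assumption). ComputeHash(Stream) reads the stream in chunks rather than loading it into memory, but it still has to read every byte, so for 200-300 MB files the disk read often dominates whichever hash you choose:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Hashes the file's contents with MD5 and returns the digest as uppercase hex.
    public static string Md5Hex(string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 20))
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```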
How about generating a hash from the info that's readily available from the file itself, i.e. concatenating:
File Name
File Size
Created Date
Last Modified Date
and create your own?
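A sketch of the metadata-only idea using FileInfo (FileMetadataId is a hypothetical name). It never reads the file content, so it is fast, but it is only as unique as the metadata: two files with the same name, size and timestamps collide, and a copy of a file may get a new creation time and therefore a different ID:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class FileMetadataId
{
    // Builds an ID from name, size, and creation/last-write times (UTC ticks), hashed with SHA-256.
    public static string Compute(string path)
    {
        FileInfo info = new FileInfo(path);
        string key = string.Join("|",
            info.Name,
            info.Length.ToString(CultureInfo.InvariantCulture),
            info.CreationTimeUtc.Ticks.ToString(CultureInfo.InvariantCulture),
            info.LastWriteTimeUtc.Ticks.ToString(CultureInfo.InvariantCulture));

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```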
Computing and comparing hashes requires reading both files completely. My suggestion is to check the file sizes first and, only if they are identical, go through the files byte by byte.
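A sketch of the size check followed by a chunked byte-by-byte comparison (FileComparer.SameContent is a hypothetical name; the 64 KB buffer size is arbitrary). ReadFull is needed because Stream.Read may return fewer bytes than requested:

```csharp
using System;
using System.IO;

public static class FileComparer
{
    // Different lengths mean different content, so only equal-sized files are read.
    public static bool SameContent(string pathA, string pathB)
    {
        FileInfo a = new FileInfo(pathA), b = new FileInfo(pathB);
        if (a.Length != b.Length)
            return false;

        const int BufferSize = 64 * 1024;
        byte[] bufA = new byte[BufferSize], bufB = new byte[BufferSize];

        using (FileStream sa = a.OpenRead())
        using (FileStream sb = b.OpenRead())
        {
            while (true)
            {
                int readA = ReadFull(sa, bufA);
                int readB = ReadFull(sb, bufB);
                if (readA != readB)
                    return false;
                if (readA == 0)
                    return true; // both files ended with no difference found
                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i])
                        return false;
                }
            }
        }
    }

    // Keeps reading until the buffer is full or the stream ends.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```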
If you want a "quick and dirty" check, I would suggest looking at CRC-32. It is extremely fast (the algorithm simply involves doing XOR with table lookups), and if you aren't too concerned about collision resistance, a combination of the file size and the CRC-32 checksum over the file data should be adequate. 28.5 bits are required to represent the file size (that gets you to 379M bytes), which means you get a checksum value of effectively just over 60 bits. I would use a 64-bit quantity to store the file size, for future proofing, but 32 bits would work too in your scenario.
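.NET Framework has no public built-in CRC-32 class (newer .NET offers one in the System.IO.Hashing package), so here is a sketch of the standard table-driven CRC-32 (reflected polynomial 0xEDB88320, as used by zip and PNG) combined with a 64-bit file size, as the answer recommends. Crc32 and QuickId are hypothetical names:

```csharp
using System;
using System.IO;

public static class Crc32
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        uint[] table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // One XOR and one table lookup per byte, which is why CRC-32 is fast.
    public static uint Compute(Stream stream)
    {
        uint crc = 0xFFFFFFFFu;
        byte[] buffer = new byte[64 * 1024];
        int n;
        while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }

    // "Quick and dirty" ID: 64-bit size (16 hex digits) followed by the CRC-32 (8 hex digits).
    public static string QuickId(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return fs.Length.ToString("X16") + Compute(fs).ToString("X8");
        }
    }
}
```

The well-known check value for CRC-32 of the ASCII string "123456789" is 0xCBF43926, which is a quick way to verify an implementation.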
If collision resistance is a consideration, then you pretty much have to use one of the tried-and-true-yet-unbroken cryptographic hash algorithms. I would still concur with what Devils child wrote and also include the file size as a separate (readily accessible) part of the hash, however; if the sizes don't match, there is no chance that the file content can be the same, so in that case the computationally intensive hash calculation can be skipped.
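A sketch of keeping the size as a separate, readily accessible part of a cryptographic fingerprint, so the SHA-256 pass is skipped whenever sizes differ (SizedHash and the "size:hex" format are hypothetical choices):

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;

public static class SizedHash
{
    // Fingerprint = file size, a colon, then the SHA-256 digest in hex.
    public static string Fingerprint(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            long size = fs.Length;
            string digest = BitConverter.ToString(sha.ComputeHash(fs)).Replace("-", "");
            return size.ToString(CultureInfo.InvariantCulture) + ":" + digest;
        }
    }

    // Compares against a stored fingerprint; the expensive hash runs only when sizes agree.
    public static bool Matches(string path, string storedFingerprint)
    {
        int colon = storedFingerprint.IndexOf(':');
        long storedSize = long.Parse(storedFingerprint.Substring(0, colon), CultureInfo.InvariantCulture);
        if (new FileInfo(path).Length != storedSize)
            return false;
        return Fingerprint(path) == storedFingerprint;
    }
}
```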

Shorten String from Byte Array

I have a structure that I am converting to a byte array of length 37, then to a string from that.
I am writing a very basic activation type library, and this string will be passed between people. So I want to shorten it from length 37 to something more manageable to type.
Right now:
Convert the structure to a byte array,
Convert the byte array to a base 64 string (which is still too long).
What is a good way to shorten this string, yet still maintain the data stored in it?
Thanks.
In the general case, going from an arbitrary byte[] to a string requires more data, since we assume we want to avoid non-printable characters. The only way to reduce it is to compress before the base-whatever (you can get a little higher than base-64, but not much - and it certainly isn't any more "friendly") - but compression won't really kick in for such a short size. Basically, you can't do that. You are trying to fit a quart in a pint pot, and that doesn't work.
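To put numbers on the base-64 overhead: base-64 emits 4 characters for every 3 bytes, rounding up to a whole group, so the 37-byte structure becomes 52 characters. A tiny helper (hypothetical name) that computes this:

```csharp
using System;

public static class Base64Size
{
    // Each started group of 3 input bytes becomes 4 output characters (padding included).
    public static int EncodedLength(int byteCount)
    {
        return (byteCount + 2) / 3 * 4;
    }
}
```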
You may have to rethink your requirements. Perhaps save the BLOB internally, and issue a shorter token (maybe 10 chars, maybe a guid) that is a key to the actual BLOB.
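A sketch of issuing such a short token, assuming the BLOB is stored server-side keyed by it (ShortToken and its 31-character alphabet, which omits look-alikes such as 0/O and 1/I/L, are hypothetical choices). Ten characters from 31 symbols give 31^10, roughly 8.2 × 10^14, possible tokens; uniqueness still has to be checked against the store:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ShortToken
{
    private const string Alphabet = "23456789ABCDEFGHJKMNPQRSTUVWXYZ";

    // Random token used only as a key to the stored BLOB.
    public static string Create(int length)
    {
        var sb = new StringBuilder(length);
        byte[] one = new byte[1];
        // Rejection sampling: bytes >= limit are discarded so the modulo is unbiased.
        int limit = 256 - (256 % Alphabet.Length);
        using (var rng = RandomNumberGenerator.Create())
        {
            while (sb.Length < length)
            {
                rng.GetBytes(one);
                if (one[0] < limit)
                    sb.Append(Alphabet[one[0] % Alphabet.Length]);
            }
        }
        return sb.ToString();
    }
}
```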
Data compression may be a possibility to check out, but you can't just compress a 40-byte message to 6 bytes (for example).
If the space of possible strings/types is limited, map them to a list (information coding).
I don't know of anything better than base-64 if you actually have to pass the value around and if users have to type it in.
If you have a central data store they can all access, you could just give them the ID of the row where you saved it. This of course depends on how "secret" this data needs to be.
But I suspect that if you're trying to use this for activation, you need them to have an actual value.
How will the string be passed? Can you expect users to perhaps just copy/paste? Maybe some time spent on clearing up superfluous line breaks that come from an email reader or even your "Copy from here" and "Copy to here" lines might bear more fruit!
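Can your string contain non-printable characters? If so, you don't need to base64-encode the bytes; you can build the string from them directly, avoiding base64's roughly 33% size overhead:
string str = new string(byteArray.Select(b => (char)b).ToArray()); // needs System.Linq; Cast<char>() would throw InvalidCastException on boxed bytes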
Also, are the values in the byte array restricted somehow? If they fall into a certain range (i.e., not all of the 256 possible values), you can consider stuffing two of each in each character of the string.
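A sketch of that packing idea, assuming every value is in the range 0-15 so that two values fit in one byte (NibblePacker is a hypothetical name). This halves the byte count before base-64 encoding; the original count must travel with the data, or be fixed, to know whether the last half-byte is padding:

```csharp
using System;

public static class NibblePacker
{
    // Packs pairs of 4-bit values into single bytes: first value high nibble, second low nibble.
    public static byte[] Pack(byte[] values)
    {
        byte[] packed = new byte[(values.Length + 1) / 2];
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] > 15)
                throw new ArgumentOutOfRangeException("values", "Each value must be 0-15.");
            if (i % 2 == 0)
                packed[i / 2] = (byte)(values[i] << 4);
            else
                packed[i / 2] |= values[i];
        }
        return packed;
    }

    // Reverses Pack; count tells whether the final low nibble is real data or padding.
    public static byte[] Unpack(byte[] packed, int count)
    {
        byte[] values = new byte[count];
        for (int i = 0; i < count; i++)
        {
            byte b = packed[i / 2];
            values[i] = (byte)(i % 2 == 0 ? b >> 4 : b & 0x0F);
        }
        return values;
    }
}
```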
If you really have 37 bytes of non-redundant information, then you are out of luck. Compression may help in some cases, but if this is an activation key, I would recommend having keys of same length (and compression will not enforce this).
If this code is going to be passed over e-mail, then I see no problem in having an even larger key. Another option might be to insert hyphens every 5-or-so characters, to break it into smaller chunks (e.g. XXXXX-XXXXX-XXXXX-XXXXX-XXXXX).
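A sketch of the hyphen grouping for display and its inverse (KeyFormatter is a hypothetical name). Stripping hyphens is safe for standard base-64, whose alphabet is A-Z, a-z, 0-9, '+', '/' and '=' padding; Ungroup also drops whitespace picked up by copy/paste:

```csharp
using System;
using System.Text;

public static class KeyFormatter
{
    // Inserts a hyphen before every groupSize-th character; purely cosmetic.
    public static string Group(string key, int groupSize)
    {
        var sb = new StringBuilder(key.Length + key.Length / groupSize);
        for (int i = 0; i < key.Length; i++)
        {
            if (i > 0 && i % groupSize == 0)
                sb.Append('-');
            sb.Append(key[i]);
        }
        return sb.ToString();
    }

    // Removes hyphens and whitespace before the key is decoded.
    public static string Ungroup(string formatted)
    {
        var sb = new StringBuilder(formatted.Length);
        foreach (char c in formatted)
        {
            if (c != '-' && !char.IsWhiteSpace(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```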
Use a 160-bit hash and hope for no collisions? It would be much shorter. If you can use a lookup table, just use a 128- or even 64-bit incremental value. Much, much shorter than your 37 chars.

Dissolve string bytes into a fixed length formula based pattern by using keys, and even extract those bytes

Suppose there is a string containing 255 characters, and a fixed-length byte pattern of, say, 64-128 bytes. I want to "dissolve" that 255-character string, byte by byte, into the fixed-length byte pattern. The byte pattern is like a formula-based "hash" or something similar, into which a formula-based algorithm dissolves the bytes. Later, when I need to extract the dissolved bytes from that fixed-length pattern, I would use the same algorithm's reverse, or extract function. The algorithm uses special keys or passwords to dissolve the bytes into the pattern, and the same keys are used to extract the bytes in their original values from the pattern. I ask for help from the coders here. Please also guide me through the steps so that I can understand what needs to be done. I only know VB .NET and C#.
For instance:
I have this three characters: "A", "B", "C"
The formula based fixed length super pattern (works like a whirlpool) is:
AJE83HDL389SB4VS9L3
Now I wish to "dissolve", "submerge" the characters "A", "B", "C", one by one into the above pattern to change it completely. After dissolving the characters, the super pattern changes drastically, just like the hash:
EJS83HDLG89DB2G9L47
I would be able to extract the characters, from the last dissolved character to the first, by using an extraction algorithm and the original keys which were used to dissolve the characters into this super pattern. After the extraction of all the characters, the super pattern resets to its original initial state. Each character insertion and removal has a unique pattern state.
After extraction of all characters, the super pattern goes back to the original state. This happens upon the removal of the character by the extraction algo:
AJE83HDL389SB4VS9L3
This looks a lot like your previous question(s). The problem with them is that you seem to start asking from a half-baked solution.
So, what do you really want? Input , Output, Constraints?
To encrypt a string, use encryption (Rijndael). To transform the resulting byte[] data into a string (for transport), use base64.
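A sketch of that advice using the Aes class (AES is the standardized form of Rijndael with a 128-bit block; Aes.Create() is used here rather than the older RijndaelManaged). It assumes a 32-byte key obtained elsewhere, uses the default CBC mode with a random IV prepended to the ciphertext, and adds no authentication. AesStringCipher is a hypothetical name. The output is longer than the input: the 35-character sample from the first question becomes 88 base64 characters (16-byte IV plus 48 bytes of padded ciphertext):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AesStringCipher
{
    // Encrypts UTF-8 text; output is base64(IV || ciphertext).
    public static string Encrypt(string plainText, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            byte[] plain = Encoding.UTF8.GetBytes(plainText);
            using (ICryptoTransform enc = aes.CreateEncryptor())
            {
                byte[] cipher = enc.TransformFinalBlock(plain, 0, plain.Length);
                byte[] iv = aes.IV;
                byte[] output = new byte[iv.Length + cipher.Length];
                Buffer.BlockCopy(iv, 0, output, 0, iv.Length);
                Buffer.BlockCopy(cipher, 0, output, iv.Length, cipher.Length);
                return Convert.ToBase64String(output);
            }
        }
    }

    // Splits off the IV, then decrypts the remainder with the same key.
    public static string Decrypt(string base64, byte[] key)
    {
        byte[] input = Convert.FromBase64String(base64);
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            byte[] iv = new byte[aes.BlockSize / 8];
            Buffer.BlockCopy(input, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (ICryptoTransform dec = aes.CreateDecryptor())
            {
                byte[] plain = dec.TransformFinalBlock(input, iv.Length, input.Length - iv.Length);
                return Encoding.UTF8.GetString(plain);
            }
        }
    }
}
```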
If you're happy having the 'keys' for the individual bits of data being determined for you, this can be done similarly to a one-time-pad (though it's not one-time!) - generate a random string as your 'base', then xor your data strings with it. Each output is the 'key' to get the original data back, and the 'base' doesn't change. This doesn't result in output data that's any smaller than the input, however (and this is impossible in the general case anyway), if that's what you're going for.
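A sketch of the XOR-with-a-fixed-base idea (XorBase is a hypothetical name). Because XOR is its own inverse, one function both hides and restores the data, and the output is exactly as long as the input. Reusing the same base for many strings is weak: XOR-ing two outputs cancels the base and reveals the XOR of the two originals:

```csharp
using System;
using System.Security.Cryptography;

public static class XorBase
{
    // Creates the random 'base' that stays fixed for all data strings.
    public static byte[] CreateBase(int length)
    {
        byte[] pad = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(pad);
        return pad;
    }

    // Applying this once hides the data; applying it again with the same base restores it.
    public static byte[] Xor(byte[] data, byte[] pad)
    {
        if (data.Length > pad.Length)
            throw new ArgumentException("Base must be at least as long as the data.", "pad");
        byte[] result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ pad[i]);
        return result;
    }
}
```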
Like your previous question, you're not really being clear about what you want. Why not just ask a question about how to achieve your end goals, and let people provide answers describing how, or tell you why it's not possible.
Here are two cases:
Lossless compression (the exact bytes are decoded from the compressed info)
In this case, Shannon entropy clearly states that there can't be any algorithm that compresses data to rates greater than the information entropy predicts.
Lossy compression (some original bytes are lost forever in the compression scheme, as in JPG image files; remember the 'image quality' setting?)
In this type of compression, however, you can make better and better compression schemes, with the penalty that you lose more and more of the original bytes.
(Taken to the extreme, you compress to zero bytes and restore zero bytes afterwards, but that compression has already been invented: the magical DELETE button, which moves information into a black hole. Sorry for the sarcasm ;))
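A sketch of the quantity the lossless case refers to: the empirical (order-0) Shannon entropy of a byte array, H = -Σ p_i log2 p_i over byte frequencies (Entropy.BitsPerByte is a hypothetical name). It treats bytes as independent, so real compressors can beat it on structured data; it illustrates the idea rather than giving a hard bound for a particular input:

```csharp
using System;

public static class Entropy
{
    // Bits of information per byte, from the observed byte frequencies.
    public static double BitsPerByte(byte[] data)
    {
        if (data.Length == 0)
            return 0.0;
        int[] counts = new int[256];
        foreach (byte b in data)
            counts[b]++;
        double h = 0.0;
        foreach (int count in counts)
        {
            if (count == 0) continue;
            double p = (double)count / data.Length;
            h -= p * Math.Log(p, 2);
        }
        return h;
    }
}
```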
