I'm working on an encryptor application based on the RSA asymmetric algorithm.
It generates a key-pair and the user has to keep it.
As key-pairs are long random strings, I want to create a function that lets me compress the generated long random strings (key-pairs) based on a pattern.
(For example, the function gets a string of 100 characters and returns a string of 30 characters.)
Then, when the user enters the compressed string, I can regenerate the key-pair from the pattern I compressed it with.
But a person told me that it is impossible to compress random data, because it is random!
What is your idea ?
Is there any way to do this ?
Thanks
It's impossible to compress (nearly any) random data. Learning a bit about information theory, entropy, how compression works, and the pigeonhole principle will make this abundantly clear.
One exception to this rule is if by "random string" you mean "random data represented in a compressible form, like hexadecimal". In that scenario, you could compress the string or (the better option) simply encode the bytes as base 64 instead to make it shorter. E.g.
// base 16, 50 random bytes (length 100)
be01a140ac0e6f560b1f0e4a9e5ab00ef73397a1fe25c7ea0026b47c213c863f88256a0c2b545463116276583401598a0c36
// base 64, same 50 random bytes (length 68)
vgGhQKwOb1YLHw5KnlqwDvczl6H+JcfqACa0fCE8hj+IJWoMK1RUYxFidlg0AVmKDDY=
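If you go the base-64 route, here is a minimal sketch of re-encoding an existing hex key string (the helper name is made up for illustration):

// assumes using System;
static string HexToBase64(string hex)
{
    // parse each pair of hex digits back into a byte, then re-encode the bytes as base64
    byte[] bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    return Convert.ToBase64String(bytes);
}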
You might instead give the user a shorter hash or fingerprint of the value (e.g. the last x bytes). Then, by storing the full key and hash somewhere, you could give them the key when they give you the hash. You'd have to make this hash long enough that security is not compromised. Depending on your application, this might defeat the purpose because the hash would have to be as long as the key, or it might not be a problem.
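If you go that route, it is essentially a lookup table keyed by the fingerprint. A rough sketch, assuming a server-side store and an arbitrary 8-byte fingerprint (how long the fingerprint must be for your security needs is the open question):

// assumes using System; using System.Collections.Generic;
// using System.Security.Cryptography; using System.Text;
class KeyStore
{
    private readonly Dictionary<string, string> keysByFingerprint = new Dictionary<string, string>();

    // Store the full key and hand back a short fingerprint (here: the last 8 bytes of a SHA-256 hash).
    public string Add(string fullKey)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(fullKey));
            string fingerprint = BitConverter.ToString(hash, hash.Length - 8).Replace("-", "");
            keysByFingerprint[fingerprint] = fullKey;
            return fingerprint;
        }
    }

    // Later the user presents the fingerprint and gets the full key back.
    public string Lookup(string fingerprint)
    {
        return keysByFingerprint[fingerprint];
    }
}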
// needs System, System.IO, System.IO.Compression and System.Text
public static string ZipStr(String str)
{
    using (MemoryStream output = new MemoryStream())
    {
        using (DeflateStream gzip =
            new DeflateStream(output, CompressionMode.Compress))
        {
            using (StreamWriter writer =
                new StreamWriter(gzip, System.Text.Encoding.UTF8))
            {
                writer.Write(str);
            }
        }
        return Convert.ToBase64String(output.ToArray());
    }
}

public static string UnZipStr(string base64)
{
    byte[] input = Convert.FromBase64String(base64);
    using (MemoryStream inputStream = new MemoryStream(input))
    {
        using (DeflateStream gzip =
            new DeflateStream(inputStream, CompressionMode.Decompress))
        {
            using (StreamReader reader =
                new StreamReader(gzip, System.Text.Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
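For completeness, a quick round-trip usage sketch (assumes the two methods above are in scope; the sample string is arbitrary):

string original = "be01a140ac0e6f560b1f0e4a9e5ab00e";   // e.g. part of a generated key string
string zipped = ZipStr(original);
string restored = UnZipStr(zipped);
Console.WriteLine(zipped);
Console.WriteLine(restored == original);   // True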
Take into account that the result doesn't have to be shorter at all... it depends on the contents of the string.
Try to use gzip compression and see if it helps you
Related
There is a post here, Compress and decompress string in C#, about compressing strings in C#.
I've implemented the same code for myself, but the returned text is almost twice as long as my input :O
I've tried it on a JSON string of length 87, like this:
{"G":"82f88ff5-4143-46ef-86cc-a19910f4a6b5","U":"df39e3c7-ffd3-4829-a9cd-27bfcbd4403a"}
The result has length 168:
H4sIAAAAAAAEAC2NUQ6DIBQE5yx8l0QFqfQCnqAHqKCXaHr3jsaQ3TyYfcuXwKpeamHi0Bf9YCaSGVW6psLua5QWmifykVbPyCDJ3gube4GHet+tXZZM7Xrj6d7Z3u/W8896dVVpd5rMbCaa3k1k25M88OMPcjDew64AAAA=
I've changed Unicode to ASCII, but the result is still too big (length 128):
H4sIAAAAAAAEAA3KyxGAMAgFwF44y0w+JAEbsAILICSvCcfedc/70EUnaYEq0FiyVJa+wdoj2LNZThDvs9FB918Xqu0ag4H1Vy3GbrG4jImYSyRVp/cDp8EZE1cAAAA=
public static string Compress(this string s)
{
    var bytes = Encoding.ASCII.GetBytes(s);
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream())
    {
        using (var gs = new GZipStream(mso, CompressionMode.Compress))
        {
            msi.CopyTo(gs);
        }
        return Convert.ToBase64String(mso.ToArray());
    }
}
Gzip is not only a compression algorithm but a complete file format; this means it adds additional structures whose size can usually be neglected.
However, when compressing small strings, they can blow up the overall gzip stream.
The standard gzip header, for example, is 10 bytes and its footer is 8 bytes long.
If you now take your gzip-compressed result in raw form (not the bloated base64-encoded one), you will see that it has 95 bytes.
So the 18 bytes for header and footer already make up nearly 20% of the output!
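If those 18 bytes matter, one option (a sketch, not tested against your exact data) is to use a raw DeflateStream instead of GZipStream, which drops the gzip header and footer:

public static string CompressDeflate(this string s)
{
    byte[] bytes = Encoding.ASCII.GetBytes(s);
    using (var output = new MemoryStream())
    {
        // DeflateStream emits only the raw compressed data: no 10-byte header, no 8-byte footer
        using (var ds = new DeflateStream(output, CompressionMode.Compress))
        {
            ds.Write(bytes, 0, bytes.Length);
        }
        return Convert.ToBase64String(output.ToArray());
    }
}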
I have an input string from a WebService: a roughly 70 MB base64-encoded string.
I want to decode this into a file, and tried the obvious: using Convert.FromBase64String().
This, however, yields an OutOfMemoryException. After some reading, I discovered that the Convert methods concerned with Base64
leak memory (no doubt due to the immutable nature of strings and some poor design inside the framework methods)
source
and there is a handy "streamed" replacement in the System.Security.Cryptography namespace: FromBase64Transform.
So, I decided to give that a try, but I need to feed the method an array of bytes, which I don't have - I have a string.
How can I convert the string I have into bytes without running into another OutOfMemoryException on that transformation again?
Although you probably could turn your string into a byte array in memory without worrying about memory usage, here's how you can stream the transformation:
var input = "abcdefghijklmnop";
byte[] output;
using (var ms = new MemoryStream())
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
tr.Write(input);
tr.Flush();
output = ms.ToArray();
}
If you replace the MemoryStream with a suitable FileStream you can stream directly to file rather than an array:
var input = new string('a', 400000000);
using (var ms = new FileStream(Guid.NewGuid().ToString() + ".bin", FileMode.Create))
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
    tr.Write(input);
    tr.Flush();
}
You should use Encoding.ASCII.GetBytes() or similar to convert your string back to the original ASCII which was used to transmit the base64-encoded data.
I am curious about how you received the string from the WebService in the first place. Is it possible that you can skip the conversion to a .NET string and just pass the bytes received directly to the transform? That would be more efficient.
I used the following code to compress a string, but the string is not shorter. Can you explain why?
private string Compress(string str)
{
    try
    {
        String returnValue;
        byte[] buffer = Encoding.ASCII.GetBytes(str);
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
                using (StreamReader sReader = new StreamReader(ms, Encoding.ASCII))
                {
                    returnValue = sReader.ReadToEnd();
                }
            }
        }
        return returnValue;
    }
    catch
    {
        return str;
    }
}
Ignoring issues in the code - there are multiple possible scenarios when this can happen.
Simplified explanation of the compression algorithm: compression is based on the fact that the data you are trying to compress contains redundant values - patterns which can be recognized by the compression algorithm and "shortened" by expressing the redundant values more concisely.
Some scenarios where the compressed result can be larger than the input (a small demonstration follows the list):
1) Input is too short - compression algorithms have some data overhead, and with such a short input they cannot compress it effectively. So you get the overhead from the compression mechanism plus the original data.
2) Input is already compressed - again, compression algorithms have some data overhead, and when the input is already compressed they cannot compress it further.
3) Input is too random - if the input comes from a random generator, the compression algorithm cannot compress it effectively because there are no patterns to recognize.
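A quick way to see scenarios 1 and 3 for yourself, using any gzip-based helper such as the Compress extension method shown earlier (exact output lengths will vary):

string shortRandom = Guid.NewGuid().ToString("N");   // 32 hex characters, little redundancy
string repetitive = new string('a', 1000);            // extremely redundant

Console.WriteLine(shortRandom.Length + " -> " + shortRandom.Compress().Length);   // grows: overhead dominates
Console.WriteLine(repetitive.Length + " -> " + repetitive.Compress().Length);     // shrinks dramatically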
I have the following C# code which is supposed to serialize arbitrary objects to a string, and then of course deserialize it.
public static string Pack(Message _message)
{
    BinaryFormatter formatter = new BinaryFormatter();
    MemoryStream original = new MemoryStream();
    MemoryStream outputStream = new MemoryStream();

    formatter.Serialize(original, _message);
    original.Seek(0, SeekOrigin.Begin);

    DeflateStream deflateStream = new DeflateStream(outputStream, CompressionMode.Compress);
    original.CopyTo(deflateStream);

    byte[] bytearray = outputStream.ToArray();
    UTF8Encoding encoder = new UTF8Encoding();
    string packed = encoder.GetString(bytearray);
    return packed;
}

public static Message Unpack(string _packed_message)
{
    UTF8Encoding encoder = new UTF8Encoding();
    byte[] bytearray = encoder.GetBytes(_packed_message);

    BinaryFormatter formatter = new BinaryFormatter();
    MemoryStream input = new MemoryStream(bytearray);
    MemoryStream decompressed = new MemoryStream();

    DeflateStream deflateStream = new DeflateStream(input, CompressionMode.Decompress);
    deflateStream.CopyTo(decompressed); // EXCEPTION
    decompressed.Seek(0, SeekOrigin.Begin);

    var message = (Message)formatter.Deserialize(decompressed); // EXCEPTION 2
    return message;
}
But the problem is that every time the code is run, I get an exception. Using the above code and invoking it as shown below, I receive InvalidDataException: Unknown block type. Stream might be corrupted. at the marked // EXCEPTION line.
After searching for this issue I tried ditching the deflation. This was only a small change: in Pack, bytearray gets created from original.ToArray(), and in Unpack, I Seek() input instead of decompressed and pass input to Deserialize() instead of decompressed. The only thing that changed was the position and body of the exception: I now receive SerializationException: No map for object '201326592'. at // EXCEPTION 2.
I don't see what the problem is. Maybe it is the whole serialization idea... but packing the Message instances somehow is necessary, because these objects hold the information that travels between the server and the client application. (The serialization logic is in a .Shared DLL project referenced on both ends; right now I'm only developing the server side.) It also has to be said that I'm only using string output because, right now, the TCP connection between the servers and clients is based on reading and writing strings at both ends. So somehow it has to be brought down to the level of strings.
This is what the Message object looks like:
[Serializable]
public class Message
{
    public MessageType type;
    public Client from;
    public Client to;
    public string content;
}
(Client right now is an empty class only having the Serializable attribute, no properties or methods.)
This is how the pack-unpack gets invoked (from Main()...):
Shared.Message msg = Shared.MessageFactory.Build(Shared.MessageType.DEFAULT, new Shared.Client(), new Shared.Client(), "foobar");
string message1 = Shared.MessageFactory.Pack(msg);
Console.WriteLine(message1);
Shared.Message mess2 = Shared.MessageFactory.Unpack(message1); // Step into... here be exceptions
Console.Write(mess2.content);
Here is an image showing what happens in the IDE. The output in the console window is the value of message1.
Some investigation unfortunately also revealed that the problem could lie around the bytearray variable. When running Pack(), after the encoder creates the string, the array contains 152 values; however, after it gets decoded in Unpack(), the array has 160 values instead.
I'd appreciate any help, as I'm really out of ideas and this problem has crippled my progress. Thank you.
(Update) The final solution:
I would like to thank everyone answering and commenting, as I have reached the solution. Thank you.
Marc Gravell was right: I missed closing the deflateStream, and because of this the result was either empty or corrupted. I have taken my time, rethought and rewrote the methods, and now they work flawlessly. Even the purpose of sending these bytes over the networked stream is working too.
Also, as Eric J. suggested, I have switched to using ASCIIEncoding for the conversion between string and byte[] when the data flows through the Stream.
The fixed code lies below:
public static string Pack(Message _message)
{
    using (MemoryStream input = new MemoryStream())
    {
        BinaryFormatter bformatter = new BinaryFormatter();
        bformatter.Serialize(input, _message);
        input.Seek(0, SeekOrigin.Begin);

        using (MemoryStream output = new MemoryStream())
        using (DeflateStream deflateStream = new DeflateStream(output, CompressionMode.Compress))
        {
            input.CopyTo(deflateStream);
            deflateStream.Close();
            return Convert.ToBase64String(output.ToArray());
        }
    }
}

public static Message Unpack(string _packed)
{
    using (MemoryStream input = new MemoryStream(Convert.FromBase64String(_packed)))
    using (DeflateStream deflateStream = new DeflateStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        deflateStream.CopyTo(output);
        deflateStream.Close();
        output.Seek(0, SeekOrigin.Begin);

        BinaryFormatter bformatter = new BinaryFormatter();
        Message message = (Message)bformatter.Deserialize(output);
        return message;
    }
}
Now everything happens just right, as the screenshot below shows. This was the expected output in the first place. The Server and Client executables communicate with each other and the message travels... and it gets serialized and deserialized properly.
In addition to the existing observations about Encoding vs base-64, note you haven't closed the deflate stream. This is important because compression-streams buffer: if you don't close, it may not write the end. For a short stream, that may mean it writes nothing at all.
using (DeflateStream deflateStream = new DeflateStream(
    outputStream, CompressionMode.Compress))
{
    original.CopyTo(deflateStream);
}
return Convert.ToBase64String(outputStream.GetBuffer(), 0,
    (int)outputStream.Length);
Your problem is most probably in the UTF8 encoding. Your bytes are not really a character string, and UTF-8 is an encoding with different byte lengths for different characters.
This means the byte array may not correspond to a correctly encoded UTF-8 string (there may be some bytes missing at the end, for instance).
Try using UTF-16 or ASCII, which are constant-length encodings (the resulting string will likely contain control characters, so it won't be printable or transmittable through something like HTTP or email).
But if you want to encode as a string, it is customary to use UUEncoding to convert the byte array into a real printable string; then you can use any encoding you want.
When I run the following Main() code against your Pack() and Unpack():
static void Main(string[] args)
{
    Message msg = new Message() { content = "The quick brown fox" };
    string message1 = Pack(msg);
    Console.WriteLine(message1);
    Message mess2 = Unpack(message1); // Step into... here be exceptions
    Console.Write(mess2.content);
}
I see that the bytearray
byte[] bytearray = outputStream.ToArray();
is empty.
I did modify your serialized class slightly, since you did not post code for the included classes:
public enum MessageType
{
    DEFAULT = 0
}

[Serializable]
public class Message
{
    public MessageType type;
    public string from;
    public string to;
    public string content;
}
I suggest the following steps to resolve this:
1) Check the intermediate results along the way. Do you also see 0 bytes in the array? What is the string value returned by Pack()?
2) Dispose of your streams once you are done with them. The easiest way to do that is with the using keyword.
Edit
As Eli and Marc correctly pointed out, you cannot store arbitrary bytes in a UTF8 string. The mapping is not bijective (you can't go back and forth without loss/distortion of information). You will need a mapping that is bijective, such as the Convert.ToBase64String() approach Marc suggests.
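A tiny demonstration of the difference (a sketch; the byte values are arbitrary):

// assumes using System; using System.Linq; using System.Text;
byte[] original = { 0x00, 0xC3, 0x28, 0xFF };   // arbitrary bytes, not a valid UTF-8 sequence

// Round trip through a UTF-8 string: invalid sequences get replaced, so data is lost.
byte[] viaUtf8 = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(original));
Console.WriteLine(original.SequenceEqual(viaUtf8));     // False

// Round trip through base64: every byte comes back unchanged.
byte[] viaBase64 = Convert.FromBase64String(Convert.ToBase64String(original));
Console.WriteLine(original.SequenceEqual(viaBase64));   // True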
I am completely new to cryptography and I need to sign a byte array of 128 bytes with an RSA key I have generated in C#. The key must be 1024 bits.
I have found a few examples of how to use RSA in C#, and the code I'm currently trying to use is:
public static void AssignParameter()
{
    const int PROVIDER_RSA_FULL = 1;
    const string CONTAINER_NAME = "SpiderContainer";

    CspParameters cspParams;
    cspParams = new CspParameters(PROVIDER_RSA_FULL);
    cspParams.KeyContainerName = CONTAINER_NAME;
    cspParams.Flags = CspProviderFlags.UseMachineKeyStore;
    cspParams.ProviderName = "Microsoft Strong Cryptographic Provider";

    rsa = new RSACryptoServiceProvider(cspParams);
    rsa.KeySize = 1024;
}

public static string EncryptData(string data2Encrypt)
{
    AssignParameter();

    StreamReader reader = new StreamReader(path + "publickey.xml");
    string publicOnlyKeyXML = reader.ReadToEnd();
    rsa.FromXmlString(publicOnlyKeyXML);
    reader.Close();

    //read plaintext, encrypt it to ciphertext
    byte[] plainbytes = System.Text.Encoding.UTF8.GetBytes(data2Encrypt);
    byte[] cipherbytes = rsa.Encrypt(plainbytes, false);
    return Convert.ToBase64String(cipherbytes);
}
This code works fine with small strings (and thus short byte arrays) but when I try this with a string of 128 characters I get an error saying:
CryptographicException was unhandled: Wrong length
(OK, it might not precisely say 'Wrong length'; I get the error in Danish, and that is 'Forkert længde', which translates directly to 'Wrong length'.)
Can anyone tell me how I can encrypt a byte array of 128 bytes with an RSA key of 1024 bits in C#?
Thanks in advance,
LordJesus
EDIT:
Ok, just to clarify things a bit: I have a message, from which I make a hash using SHA-256. This gives a 32-byte array. This array is padded using a custom padding, so it ends up being a 128-byte array. This padded hash should then be signed with my private key, so the receiver can use my public key to verify that the message received is the same as the message sent. Can this be done with a key of 1024 bits?
If you want to sign you do not want to encrypt. Signatures and encryption are distinct algorithms. It does not help that there is a well-known signature algorithm called RSA, and a well-known asymmetric encryption algorithm also called RSA, and that the signature algorithm was first presented (and still is in many places) as "you encrypt with the private key". This is just plain confusing.
In RSA encryption, the data to encrypt (with the public key) must be padded with what PKCS#1 (the RSA standard) describes as "Type 2 padding", and the result (which has the same length as the modulus) is then processed through the modular exponentiation which is at the core of RSA (at the core, but RSA is not only a modular exponentiation; the padding is very important for security).
When signing, the data to sign must be hashed, then the hash value is embedded in a structure which describes the hash function that was used, and the encoded structure is itself padded with "Type 1 padding" -- not the same padding as the one used for encryption, and that's important, too.
Either way, a normal RSA engine will perform the Type 1 or Type 2 padding itself, and most RSA signature engines will also handle the structure which identifies the used hash function. An RSA signature engine such as RSACryptoServiceProvider can work either with SignHash(), which expects the hash value (the 32 bytes obtained from SHA-256, without any kind of encapsulating structure or Type 1 padding -- RSACryptoServiceProvider handles that itself), or SignData(), which expects the data to be signed (the engine then does the hash computation too).
To sum up, if you do any kind of padding yourself, then you are doing it wrong. If you used Encrypt() to compute a signature, then you are doing it wrong, too.
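To illustrate, a minimal sketch of letting RSACryptoServiceProvider do the hashing and padding for you (key handling is simplified; the underlying CSP must support SHA-2, which the default provider on current .NET Framework does):

// assumes using System; using System.Security.Cryptography; using System.Text;
byte[] message = Encoding.UTF8.GetBytes("the message to sign");

using (var rsa = new RSACryptoServiceProvider(1024))
{
    // SignData hashes the message with SHA-256 and applies the PKCS#1 Type 1 padding itself.
    byte[] signature = rsa.SignData(message, new SHA256CryptoServiceProvider());   // 128 bytes for a 1024-bit key

    // The receiver verifies with the corresponding public key.
    bool ok = rsa.VerifyData(message, new SHA256CryptoServiceProvider(), signature);
    Console.WriteLine(ok);   // True
}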
The minimum key size for encrypting 128 bytes would be 1112 bits when you are calling Encrypt with OAEP off. Note that setting the key size like rsa.KeySize = 1024 won't help; you need to actually generate a key of the right size and use it.
This is what worked for me:
using System;
using System.IO;
using System.Security.Cryptography;

namespace SO6299460
{
    class Program
    {
        static void Main()
        {
            GenerateKey();
            string data2Encrypt = string.Empty.PadLeft(128, '$');
            string encrypted = EncryptData(data2Encrypt);
            string decrypted = DecryptData(encrypted);
            Console.WriteLine(data2Encrypt);
            Console.WriteLine(encrypted);
            Console.WriteLine(decrypted);
        }

        private const string path = @"c:\";

        public static void GenerateKey()
        {
            RSACryptoServiceProvider rsa = new RSACryptoServiceProvider(1112);
            string publicKey = rsa.ToXmlString(false);
            string privateKey = rsa.ToXmlString(true);
            WriteStringToFile(publicKey, path + "publickey.xml");
            WriteStringToFile(privateKey, path + "privatekey.xml");
        }

        public static void WriteStringToFile(string value, string filename)
        {
            using (FileStream stream = File.Open(filename, FileMode.Create, FileAccess.Write, FileShare.Read))
            using (StreamWriter writer = new StreamWriter(stream))
            {
                writer.Write(value);
                writer.Flush();
                stream.Flush();
            }
        }

        public static string EncryptData(string data2Encrypt)
        {
            RSACryptoServiceProvider rsa = new RSACryptoServiceProvider();
            StreamReader reader = new StreamReader(path + "publickey.xml");
            string publicOnlyKeyXML = reader.ReadToEnd();
            rsa.FromXmlString(publicOnlyKeyXML);
            reader.Close();

            //read plaintext, encrypt it to ciphertext
            byte[] plainbytes = System.Text.Encoding.UTF8.GetBytes(data2Encrypt);
            byte[] cipherbytes = rsa.Encrypt(plainbytes, false);
            return Convert.ToBase64String(cipherbytes);
        }

        public static string DecryptData(string data2Decrypt)
        {
            RSACryptoServiceProvider rsa = new RSACryptoServiceProvider();
            StreamReader reader = new StreamReader(path + "privatekey.xml");
            string key = reader.ReadToEnd();
            rsa.FromXmlString(key);
            reader.Close();

            byte[] plainbytes = rsa.Decrypt(Convert.FromBase64String(data2Decrypt), false);
            return System.Text.Encoding.UTF8.GetString(plainbytes);
        }
    }
}
Note, however, that I'm not using a crypto container, and thus I don't need your AssignParameter; but if you need to use it, modifying the code should be easy enough.
If you ever need to encrypt large quantities of data (much larger than 128 bytes) this article has sample code on how to do this.
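The usual pattern for large data is hybrid encryption: encrypt the payload with a random symmetric key (e.g. AES) and use RSA only to encrypt that small key. A rough sketch of the idea (not the article's code; key sizes are just examples):

// assumes using System; using System.Security.Cryptography;
byte[] payload = new byte[1000000];   // some large data

byte[] encryptedPayload, encryptedKey, iv;
using (var aes = Aes.Create())
using (var rsa = new RSACryptoServiceProvider(2048))
{
    iv = aes.IV;   // send the IV along with the ciphertext; it is not secret

    using (var encryptor = aes.CreateEncryptor())
    {
        encryptedPayload = encryptor.TransformFinalBlock(payload, 0, payload.Length);
    }

    // Only the 32-byte AES key goes through RSA, so the RSA size limit never matters.
    encryptedKey = rsa.Encrypt(aes.Key, false);
}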
Apparently, according to this question — how to use RSA to encrypt files (huge data) in C# — RSA can only encrypt data shorter than its key length.
Bizarre. The MSDN docs for RSACryptoServiceProvider.Encrypt() say that a CryptographicException may be thrown if the length of the rgb parameter is greater than the maximum allowed length.
Well. That seems odd, especially since there doesn't seem to be much in the way of documentation regarding said maximum.
A little further digging under Remarks turns up this:
The following table describes the padding supported by different versions
of Microsoft Windows and the maximum length of rgb allowed by the different
combinations of operating systems and padding.
If you are running XP or later and you're using OAEP padding, then the limit is stated to be
Modulus size - 2 - 2*hLen, where hLen is the size of the hash
No idea what the "size of the hash" might be, since the docs, AFAICS, don't mention "hash" anywhere except in regards to digital signatures.
If you are running Windows 2000 or later with the "high encryption pack" installed (again, no idea how you find that out), then the limit is stated to be
Modulus size - 11. (11 bytes is the minimum padding possible.)
Otherwise (Windows 98, Millennium, or Windows 2000 or later without the aforementioned "high encryption pack"), you get "Direct Encryption and OAEP padding not supported", where the limitation is
The maximum size allowed for a symmetric key.
Say...wait a second... RSA is an asymmetric algorithm, right?
Worthless documentation. Sheesh.
See http://msdn.microsoft.com/en-us/library/system.security.cryptography.rsacryptoserviceprovider.encrypt.aspx. The exception thrown is probably "The length of the rgb parameter is greater than the maximum allowed length."
Usually RSA encryption uses padding, and since your data is already as large as the key, there is no space left for the padding. Try using a longer key or encrypting less data.
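For PKCS#1 v1.5 padding (what Encrypt(data, false) uses), the arithmetic works out roughly like this:

int keyBits = 1024;
int maxPlaintextBytes = keyBits / 8 - 11;   // 128 - 11 = 117 bytes, so 128 bytes of data will not fit
Console.WriteLine(maxPlaintextBytes);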
Do you really need the custom padding? If not, you could just use the RSACryptoServiceProvider.SignData method.