binary file to string - c#

i'm trying to read a binary file (for example an executable) into a string, then write it back
FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open);
BinaryReader br = new BinaryReader(fs);
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
System.Text.Encoding enc = System.Text.Encoding.ASCII;
string myString = enc.GetString(bin);
fs.Close();
br.Close();
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] rebin = encoding.GetBytes(myString);
FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create);
BinaryWriter bw = new BinaryWriter(fs2);
bw.Write(rebin);
fs2.Close();
bw.Close();
this does not work (the result has exactly the same size in bytes but can't run)
if i do bw.Write(bin) the result is ok, but i must save it to a string

When you decode the bytes into a string, and re-encodes them back into bytes, you're losing information. ASCII in particular is a very bad choice for this since ASCII will throw out a lot of information on the way, but you risk losing information when encoding and decoding regardless of the type of Encoding you pick, so you're not on the right path.
What you need is one of the BaseXX routines, that encodes binary data to printable characters, typically for storage or transmission over a medium that only allows text (email and usenet comes to mind.)
Ascii85 is one such algorithm, and the page contains links to different implementations. It has a ratio of 4:5 meaning that 4 bytes will be encoded as 5 characters (a 25% increase in size.)
If nothing else, there's already a Base64 encoding routine built into .NET. It has a ratio of 3:4 (a 33% increase in size), here:
Convert.ToBase64String Method
Convert.FromBase64String Method
Here's what your code can look like with these methods:
string myString;
using (FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open))
using (BinaryReader br = new BinaryReader(fs))
{
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
myString = Convert.ToBase64String(bin);
}
byte[] rebin = Convert.FromBase64String(myString);
using (FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create))
using (BinaryWriter bw = new BinaryWriter(fs2))
bw.Write(rebin);

I don't think you can represent all bytes with ASCII in that way. Base64 is an alternative, but with a ratio between bytes and text of 3:4.

Related

read encoding identifier with StreamReader

I am reading a C# book and in the chapter about streams it says:
If you explicitly specify an encoding, StreamWriter will, by default,
write a prefix to the start of the stream to identify the encoding.
This is usually undesirable and you can prevent it by constructing the
encoding as follows:
var encoding = new UTF8Encoding (encoderShouldEmitUTF8Identifier:false, throwOnInvalidBytes:true);
I'd like to actually see how the identifier looks so I came up with this code:
using (FileStream fs = File.Create ("test.txt"))
using (TextWriter writer = new StreamWriter (fs,new UTF8Encoding(true,false)))
{
writer.WriteLine ("Line1");
}
using (FileStream fs = File.OpenRead ("test.txt"))
using (TextReader reader = new StreamReader (fs))
{
for (int b; (b = reader.Read()) > -1;)
Console.WriteLine (b + " " + (char)b); // identifier not printed
}
To my dissatisfaction, no identifier was printed. How do I read the identifier? Am I missing something?
By default, .NET will try very hard to insulate you from encoding errors. If you want to see the byte-order-mark, aka "preamble" or "BOM", you need to be very explicit with the objects to disable the automatic behavior. This means that you need to use an encoding that does not include the preamble, and you need to tell StreamReader to not try to detect the encoding.
Here is a variation of your original code that will display the BOM:
using (MemoryStream stream = new MemoryStream())
{
Encoding encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
using (TextWriter writer = new StreamWriter(stream, encoding, bufferSize: 8192, leaveOpen: true))
{
writer.WriteLine("Line1");
}
stream.Position = 0;
encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (TextReader reader = new StreamReader(stream, encoding, detectEncodingFromByteOrderMarks: false))
{
for (int b; (b = reader.Read()) > -1;)
Console.WriteLine(b + " " + (char)b); // identifier not printed
}
}
Here, encoderShouldEmitUTF8Identifier: true is passed to the encoder used to create the stream, so that the BOM is written when the stream is created, but encoderShouldEmitUTF8Identifier: false is passed to the encoder used to read the stream, so that the BOM will be treated as a normal character when the stream is being read back. The detectEncodingFromByteOrderMarks: false parameter is passed to the StreamReader constructor as well, so that it won't consume the BOM itself.
This produces this output, just like you wanted:
65279 ?
76 L
105 i
110 n
101 e
49 1
13
10
It is worth mentioning that use of the BOM as a form of identifying UTF8 encoding is generally discouraged. The BOM mainly exists so that the two variations of UTF16 can be distinguished (i.e. UTF16LE and UTF16BE, "little endian" and "big endian", respectively). It's been co-opted as a means of identifying UTF8 as well, but really it's better to just know what the encoding is (which is why things like XML and HTML explicitly state the encoding as ASCII in the first part of the file, and MIME's charset property exists). A single character isn't nearly as reliable as other more explicit means.

Decompress file with wrong size

I have a method that decompresses *.gz file:
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
{
using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
{
using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
{
decompressionStream.CopyTo(decompressedFileStream);
}
}
}
It worked perfectly, but recently I received pack of files with wrong size:
When I open them with 7-zip they have Packed Size ~ 1,600,000 and Size = 7 (it should be ~20,000,000).
So when I extract them using this code I get only a part of the file. But when I extract this file using 7-zip I get full file.
How can I handle this situation in my code?
My guess is that that the other end does a mistake when GZipping the files. It looks like it does not set the ISIZE bytes correctly.
The ISIZE bytes are the last four bytes of a valid GZip file and come after a 32-bit CRC value which in turn comes directly after the compressed data bytes.
7-Zip seems to be robust against such mistakes whereas the GZipStream is not. It is odd however that 7-Zip is not showing you any errors. It should show you (tested with 7-Zip 16.02 x64/Win7)...
CRC error in case the size is simply wrong,
"Unexpected end of data" in case some or all of the ISIZE bytes are cut off,
"There are some data after end of the payload data" in case there is more data following the ISIZE bytes.
7-Zip always uses the last four bytes of the packed file to determine the size of the original unpacked file without checking if the file is valid and whether the bytes read for that are actually the ISIZE bytes.
You can verify this by checking those last four bytes of the GZipped file with a hex viewer. For your example they should be exactly 07 00 00 00.
If you know the exact size of the unpacked original file you could replace those bytes so that they specify the correct size. For instance, if the unpacked file's size is 20,000,078, which is 01312D4E in hex (0-padded to eight digits), those bytes should be 4E 2D 31 01.
In case you don't know the exact size you can try replacing them with the maximum value, i.e. FF FF FF FF.
After that try your unpack code again.
This is obviously only a hacky solution to your problem. Better try fixing the code that GZips the files you receive or try to find a library that is more robust than GZipStream.
I've used ICSharpCode.SharpZipLib.GZip.GZipInputStream from this library instead of System.IO.Compression.GZipStream and it helped.
Did you try this for check the size? ie:
byte[] bArray;
using (FileStream f = new FileStream(tempFile, FileMode.Open))
{
bArray= new byte[f.Length];
f.Read(b, 0, f.Length);
}
Regards
try:
GZipStream uncompressed = new GZipStream(streamIn, CompressionMode.Decompress, true);
FileStream streamOut = new FileStream(tempDoc[0], FileMode.Create, FileAccess.Write, FileShare.None);
Looks like this is some sort of bug in GZipStream (it does not write original file length into gz end of file).
You need to change the way you compress your files using GZipStream.
The way it will work:
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
gZipStream.Write(inputBytes, 0, inputBytes.Length);
System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
And this way will cause the error you have (no matter will you use Flush() or not):
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
{
gZipStream.Write(inputBytes, 0, inputBytes.Length);
System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
}
You might need to call decompressedStream.Seek() after closing the gZip stream.
As shown here.

Using SharpAESCrypt to encrypt strings

I decided to use the SharpAESCrypt implementation of AES encryption in C#. According to their documentation (https://www.aescrypt.com/sharp_aes_crypt.html) you should be able to use a static method, providing a password string, plain-text input stream and a output stream. The data I get out of my output stream appears to be all zero's.
I suspect I am doing something wrong converting strings to a stream and back. Can anyone see anything that is obviously wrong with the code below? (It compiles and runs, but the newByteArray is filled with 0's at the end)
private void encryptMemStream()
{
byte[] byteArray = Encoding.UTF8.GetBytes(DecryptedTB.Text);
using (MemoryStream plainText = new MemoryStream(byteArray))
{
using (MemoryStream encryptedData = new MemoryStream())
{
//plainText.Write(byteArray, 0, byteArray.Length);
SharpAESCrypt.Encrypt(PasswordTB.Text, plainText, encryptedData);
encryptedData.Position = 0;
byte[] newByteArray = new byte[encryptedData.Length];
newByteArray = encryptedData.ToArray();
EncryptedTB.Text = Encoding.UTF8.GetString(newByteArray);
}
}
}
Edit: The native implementation of this code uses fileStreams, but it should work with MemoryStreams too. I will test filestreams and add my results here.
More Edits:
So when you use a file stream you call this code
//Code from the AES Crypt Library
public static void Encrypt(string password, string inputfile, string outputfile)
{
using (FileStream infs = File.OpenRead(inputfile))
using (FileStream outfs = File.Create(outputfile))
Encrypt(password, infs, outfs);
}
Which calls a function which I have been calling directly
//Code from the AES Crypt Library
public static void Encrypt(string password, Stream input, Stream output)
{
int a;
byte[] buffer = new byte[1024 * 4];
SharpAESCrypt c = new SharpAESCrypt(password, output, OperationMode.Encrypt);
while ((a = input.Read(buffer, 0, buffer.Length)) != 0)
c.Write(buffer, 0, a);
c.FlushFinalBlock();
}
There is obviously some subtle difference in using a MemoryStream and a FileStream which I don't understand. FileStream works fine, where as MemoryStream returns a blank array...
A quick code review reveals a few things including what, from discussion, seems to be the main problem you are facing. The problematic line is
EncryptedTB.Text = Encoding.UTF8.GetString(newByteArray);
This line is taking the binary data in newByteArray and telling your code that it is a valid UTF8 byte array. Chances are that it is almost certainly not actually going to be valid UTF8. Parts of it may be (such as the header) but the actually encrypted data is not going to be and so you shouldn't use this method to get a string.
If you need a printable string to put in a textbox of some sort then the correct way to do this for binary data is through base64 encoding.
EncryptedTB.Text = Convert.ToBase64String(newByteArray);
Base64 encoding effectively takes groups of three bytes (24 bits) and converts them to groups of four characters.
A further note while I'm here is that in these two lines:
byte[] newByteArray = new byte[encryptedData.Length];
newByteArray = encryptedData.ToArray();
You declare a byte array and allocate a new array to it. You then immediately discard this in favour of the array from encryptedData.ToArray(). This does not put data into the array allocated by new byte[encryptedData.Length] but creates a new array and puts it into the newByteArray variable. Better would be to just do:
byte[] newByteArray = encryptedData.ToArray();
Full working code below:
byte[] byteArray = Encoding.UTF8.GetBytes(sourceText);
Byte[] newByteArray;
using (MemoryStream plainText = new MemoryStream(byteArray))
{
using (MemoryStream encryptedData = new MemoryStream())
{
SharpAESCrypt.Encrypt(password, plainText, encryptedData);
newByteArray = encryptedData.ToArray();
}
}
EncryptedTB.Text = Convert.ToBase64String(newByteArray);

Decode large Base64 strings

I have an input string from a WebService in the form of a roughly 70 MB large base64-encoded string.
I want to decode this into a file, and tried the obvious: using Convert.FromBase64String().
This, however, yields a OutOfMemoryException. After some reading, I discovered that the Convert methods concerned with Base64
leak memory (no doubt due to the immutable nature of strings and some poor design inside the framework methods)
source
and there is a handy "streamed" replacement in the System.Security.Cryptography namespace: FromBase64Transform.
So, I decided to give that a try, but I need to input the method an array of bytes, which I don't have - I have a string.
How can I convert the string I have into bytes without running into another OutOfMemoryException on that transformation again?
Although you probably could turn your string into a byte array in memory without worrying about memory usage, here's how you can stream the transformation:
var input = "abcdefghijklmnop";
byte[] output;
using (var ms = new MemoryStream())
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
tr.Write(input);
tr.Flush();
output = ms.ToArray();
}
If you replace the MemoryStream with a suitable FileStream you can stream directly to file rather than an array:
var input = new string('a', 400000000);
byte[] output;
using (var ms = new FileStream(Guid.NewGuid().ToString() + ".bin", FileMode.Create))
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
tr.Write(input);
tr.Flush();
}
You should use Encoding.ASCII.GetBytes() or similar to convert your string back to the original ASCII which was used to transmit the base64-encoded data.
I am curious about how you received the string from the WebService in the first place. Is it possible that you can skip the conversion to a .NET string and just pass the bytes received directly to the transform? That would be more efficient.

Zip file with utf-8 file names

In my website i have option to download all images uploaded by users. The problem is in images with hebrew names (i need original name of file). I tried to decode file names but this is not helping. Here is a code :
using ICSharpCode.SharpZipLib.Zip;
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(file.Name);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string name = iso.GetString(isoBytes);
var entry = new ZipEntry(name + ".jpg");
zipStream.PutNextEntry(entry);
using (var reader = new System.IO.FileStream(file.Name, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
byte[] buffer = new byte[ChunkSize];
int bytesRead;
while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
{
byte[] actual = new byte[bytesRead];
Buffer.BlockCopy(buffer, 0, actual, 0, bytesRead);
zipStream.Write(actual, 0, actual.Length);
}
}
After utf-8 encoding i get hebrew file names like this : ??????.jpg
Where is my mistake?
Unicode (UTF-8 is one of the binary encoding) can represent more characters than the other 8-bit encoding. Moreover, you are not doing a proper conversion but a re-interpretation, which means that you get garbage for your filenames. You should really read the article from Joel on Unicode.
...
Now that you've read the article, you should know that in C# string can store unicode data, so you probably don't need to do any conversion of file.Name and can pass this directly to ZipEntry constructor if the library does not contains encoding handling bugs (this is always possible).
Try using
ZipStrings.UseUnicode = true;
It should be a part of the ICSharpCode.SharpZipLib.Zip namespace.
After that you can use something like
var newZipEntry = new ZipEntry($"My ünicödë string.pdf");
and add the entry as normal to the stream. You shouldn't need to do any conversion of the string before that in C#.
You are doing wrong conversion, since strings in C# are already unicode.
What tools do you use to check file names in archive?
By default Windows ZIP implementations use system DOS encoding for file names, while other implementations can use other encoding.

Categories