Decode large Base64 strings - c#

I have an input string from a WebService in the form of a roughly 70 MB large base64-encoded string.
I want to decode this into a file, and tried the obvious: using Convert.FromBase64String().
This, however, yields a OutOfMemoryException. After some reading, I discovered that the Convert methods concerned with Base64
leak memory (no doubt due to the immutable nature of strings and some poor design inside the framework methods)
source
and there is a handy "streamed" replacement in the System.Security.Cryptography namespace: FromBase64Transform.
So, I decided to give that a try, but I need to input the method an array of bytes, which I don't have - I have a string.
How can I convert the string I have into bytes without running into another OutOfMemoryException on that transformation again?

Although you probably could turn your string into a byte array in memory without worrying about memory usage, here's how you can stream the transformation:
var input = "abcdefghijklmnop";
byte[] output;
using (var ms = new MemoryStream())
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
tr.Write(input);
tr.Flush();
output = ms.ToArray();
}
If you replace the MemoryStream with a suitable FileStream you can stream directly to file rather than an array:
var input = new string('a', 400000000);
byte[] output;
using (var ms = new FileStream(Guid.NewGuid().ToString() + ".bin", FileMode.Create))
using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Write))
using (var tr = new StreamWriter(cs))
{
tr.Write(input);
tr.Flush();
}

You should use Encoding.ASCII.GetBytes() or similar to convert your string back to the original ASCII which was used to transmit the base64-encoded data.
I am curious about how you received the string from the WebService in the first place. Is it possible that you can skip the conversion to a .NET string and just pass the bytes received directly to the transform? That would be more efficient.

Related

Retrieving only half the Image when converting to Base64

I have an API which reads an uploaded Image and changes it to a byte[], however in the database the field for where I have to save the image is a string instead of varbinary(MAX), and I cannot change the field type of the database.
I thought about converting the image to base64 and then storing it but this might cause unnecessary strain on the database.
I have found online the following way but this method can be inconsistent based on the server as the encoding might change:
var str = System.Text.Encoding.Default.GetString(result);
And if I were to use the above method I would need to know what type of encoding does ReadBytes use.
Below is my code:
byte[] fileData = null;
using (var binaryReader = new BinaryReader(image.InputStream))
{
binaryReader.BaseStream.Position = 0;
fileData = binaryReader.ReadBytes(image.ContentLength);
}
Furthermore, when I converted the image to a base64 and viewed it, only half the image was visible:
var base64String = Convert.ToBase64String(fileData);
use Convert.toBase64String() method or just using MemoryStream class->
// 1.toBase64String()
string str = Convert.ToBase64String(bytes);
// 2.MemoryStream class
using (MemoryStream stream = new MemoryStream(bytes))
using (StreamReader streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
Initialize the BinaryReader with this overload that specifies the encoding.
var binaryReader = new BinaryReader(System.IO.Stream input, System.Text.Encoding encoding);

Using SharpAESCrypt to encrypt strings

I decided to use the SharpAESCrypt implementation of AES encryption in C#. According to their documentation (https://www.aescrypt.com/sharp_aes_crypt.html) you should be able to use a static method, providing a password string, plain-text input stream and a output stream. The data I get out of my output stream appears to be all zero's.
I suspect I am doing something wrong converting strings to a stream and back. Can anyone see anything that is obviously wrong with the code below? (It compiles and runs, but the newByteArray is filled with 0's at the end)
private void encryptMemStream()
{
byte[] byteArray = Encoding.UTF8.GetBytes(DecryptedTB.Text);
using (MemoryStream plainText = new MemoryStream(byteArray))
{
using (MemoryStream encryptedData = new MemoryStream())
{
//plainText.Write(byteArray, 0, byteArray.Length);
SharpAESCrypt.Encrypt(PasswordTB.Text, plainText, encryptedData);
encryptedData.Position = 0;
byte[] newByteArray = new byte[encryptedData.Length];
newByteArray = encryptedData.ToArray();
EncryptedTB.Text = Encoding.UTF8.GetString(newByteArray);
}
}
}
Edit: The native implementation of this code uses fileStreams, but it should work with MemoryStreams too. I will test filestreams and add my results here.
More Edits:
So when you use a file stream you call this code
//Code from the AES Crypt Library
public static void Encrypt(string password, string inputfile, string outputfile)
{
using (FileStream infs = File.OpenRead(inputfile))
using (FileStream outfs = File.Create(outputfile))
Encrypt(password, infs, outfs);
}
Which calls a function which I have been calling directly
//Code from the AES Crypt Library
public static void Encrypt(string password, Stream input, Stream output)
{
int a;
byte[] buffer = new byte[1024 * 4];
SharpAESCrypt c = new SharpAESCrypt(password, output, OperationMode.Encrypt);
while ((a = input.Read(buffer, 0, buffer.Length)) != 0)
c.Write(buffer, 0, a);
c.FlushFinalBlock();
}
There is obviously some subtle difference in using a MemoryStream and a FileStream which I don't understand. FileStream works fine, where as MemoryStream returns a blank array...
A quick code review reveals a few things including what, from discussion, seems to be the main problem you are facing. The problematic line is
EncryptedTB.Text = Encoding.UTF8.GetString(newByteArray);
This line is taking the binary data in newByteArray and telling your code that it is a valid UTF8 byte array. Chances are that it is almost certainly not actually going to be valid UTF8. Parts of it may be (such as the header) but the actually encrypted data is not going to be and so you shouldn't use this method to get a string.
If you need a printable string to put in a textbox of some sort then the correct way to do this for binary data is through base64 encoding.
EncryptedTB.Text = Convert.ToBase64String(newByteArray);
Base64 encoding effectively takes groups of three bytes (24 bits) and converts them to groups of four characters.
A further note while I'm here is that in these two lines:
byte[] newByteArray = new byte[encryptedData.Length];
newByteArray = encryptedData.ToArray();
You declare a byte array and allocate a new array to it. You then immediately discard this in favour of the array from encryptedData.ToArray(). This does not put data into the array allocated by new byte[encryptedData.Length] but creates a new array and puts it into the newByteArray variable. Better would be to just do:
byte[] newByteArray = encryptedData.ToArray();
Full working code below:
byte[] byteArray = Encoding.UTF8.GetBytes(sourceText);
Byte[] newByteArray;
using (MemoryStream plainText = new MemoryStream(byteArray))
{
using (MemoryStream encryptedData = new MemoryStream())
{
SharpAESCrypt.Encrypt(password, plainText, encryptedData);
newByteArray = encryptedData.ToArray();
}
}
EncryptedTB.Text = Convert.ToBase64String(newByteArray);

ByteArray class for C#

I'm trying to translate a function from ActionScript 3 into C# .NET.
What I have trouble is how to properly use ByteArrays in C#. In As3 there is a specific Class for it that already has most of the functionality i need, but in C# nothing of that sort seems to exist and I can't wrap my head around it.
This is the As3 function:
private function createBlock(type:uint, tag:uint,data:ByteArray):ByteArray
{
var ba:ByteArray = new ByteArray();
ba.endian = Endian.LITTLE_ENDIAN;
ba.writeUnsignedInt(data.length+16);
ba.writeUnsignedInt(0x00);
ba.writeUnsignedInt(type);
ba.writeUnsignedInt(tag);
data.position = 0;
ba.writeBytes(data);
ba.position = 0;
return ba;
}
But from what I gather, in C# I have to use a normal Array with the byte type, like this
byte[] ba = new byte[length];
Now, I looked into the Encoding Class, the BinaryWriter and BinaryFormatter class and researched if somebody made a Class for ByteArrays, but with no luck.
Can somebody nudge me in the right direction please?
You should be able to do this using a combination of MemoryStream and BinaryWriter:
public static byte[] CreateBlock(uint type, uint tag, byte[] data)
{
using (var memory = new MemoryStream())
{
// We want 'BinaryWriter' to leave 'memory' open, so we need to specify false for the third
// constructor parameter. That means we need to also specify the second parameter, the encoding.
// The default encoding is UTF8, so we specify that here.
var defaultEncoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier:false, throwOnInvalidBytes:true);
using (var writer = new BinaryWriter(memory, defaultEncoding, leaveOpen:true))
{
// There is no Endian - things are always little-endian.
writer.Write((uint)data.Length+16);
writer.Write((uint)0x00);
writer.Write(type);
writer.Write(data);
}
// Note that we must close or flush 'writer' before accessing 'memory', otherwise the bytes written
// to it may not have been transferred to 'memory'.
return memory.ToArray();
}
}
However, note that BinaryWriter always uses little-endian format. If you need to control this, you can use Jon Skeet's EndianBinaryWriter instead.
As an alternative to this approach, you could pass streams around instead of byte arrays (probably using a MemoryStream for implementation), but then you will need to be careful about lifetime management, i.e. who will close/dispose the stream when it's done with? (You might be able to get away with not bothering to close/dispose a memory stream since it uses no unmanaged resources, but that's not entirely satisfactory IMO.)
You want to have a byte stream and then extract the array from it:
using(MemoryStream memory = new MemoryStream())
using(BinaryWriter writer = new BinaryWriter(memory))
{
// write into stream
writer.Write((byte)0); // a byte
writer.Write(0f); // a float
writer.Write("hello"); // a string
return memory.ToArray(); // returns the underlying array
}

binary file to string

i'm trying to read a binary file (for example an executable) into a string, then write it back
FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open);
BinaryReader br = new BinaryReader(fs);
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
System.Text.Encoding enc = System.Text.Encoding.ASCII;
string myString = enc.GetString(bin);
fs.Close();
br.Close();
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] rebin = encoding.GetBytes(myString);
FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create);
BinaryWriter bw = new BinaryWriter(fs2);
bw.Write(rebin);
fs2.Close();
bw.Close();
this does not work (the result has exactly the same size in bytes but can't run)
if i do bw.Write(bin) the result is ok, but i must save it to a string
When you decode the bytes into a string, and re-encodes them back into bytes, you're losing information. ASCII in particular is a very bad choice for this since ASCII will throw out a lot of information on the way, but you risk losing information when encoding and decoding regardless of the type of Encoding you pick, so you're not on the right path.
What you need is one of the BaseXX routines, that encodes binary data to printable characters, typically for storage or transmission over a medium that only allows text (email and usenet comes to mind.)
Ascii85 is one such algorithm, and the page contains links to different implementations. It has a ratio of 4:5 meaning that 4 bytes will be encoded as 5 characters (a 25% increase in size.)
If nothing else, there's already a Base64 encoding routine built into .NET. It has a ratio of 3:4 (a 33% increase in size), here:
Convert.ToBase64String Method
Convert.FromBase64String Method
Here's what your code can look like with these methods:
string myString;
using (FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open))
using (BinaryReader br = new BinaryReader(fs))
{
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
myString = Convert.ToBase64String(bin);
}
byte[] rebin = Convert.FromBase64String(myString);
using (FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create))
using (BinaryWriter bw = new BinaryWriter(fs2))
bw.Write(rebin);
I don't think you can represent all bytes with ASCII in that way. Base64 is an alternative, but with a ratio between bytes and text of 3:4.

Modifying XMP data with C#

I'm using C# in ASP.NET version 2. I'm trying to open an image file, read (and change) the XMP header, and close it back up again. I can't upgrade ASP, so WIC is out, and I just can't figure out how to get this working.
Here's what I have so far:
Bitmap bmp = new Bitmap(Server.MapPath(imageFile));
MemoryStream ms = new MemoryStream();
StreamReader sr = new StreamReader(Server.MapPath(imageFile));
*[stuff with find and replace here]*
byte[] data = ToByteArray(sr.ReadToEnd());
ms = new MemoryStream(data);
originalImage = System.Drawing.Image.FromStream(ms);
Any suggestions?
How about this kinda thing?
byte[] data = File.ReadAllBytes(path);
... find & replace bit here ...
File.WriteAllBytes(path, data);
Also, i really recommend against using System.Bitmap in an asp.net process, as it leaks memory and will crash/randomly fail every now and again (even MS admit this)
Here's the bit from MS about why System.Drawing.Bitmap isn't stable:
http://msdn.microsoft.com/en-us/library/system.drawing.aspx
"Caution:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions."
Part 1 of the XMP spec 2012, page 10 specifically talks about how to edit a file in place without needing to understand the surrounding format (although they do suggest this as a last resort). The embedded XMP packet looks like this:
<?xpacket begin="■" id="W5M0MpCehiHzreSzNTczkc9d"?>
... the serialized XMP as described above: ...
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf= ...>
...
</rdf:RDF>
</x:xmpmeta>
... XML whitespace as padding ...
<?xpacket end="w"?>
In this example, ‘■’ represents the
Unicode “zero width non-breaking space
character” (U+FEFF) used as a
byte-order marker.
The (XMP Spec 2010, Part 3, Page 12) also gives specific byte patterns (UTF-8, UTF16, big/little endian) to look for when scanning the bytes. This would complement Chris' answer about reading the file in as a giant byte stream.
You can use the following functions to read/write the binary data:
public byte[] GetBinaryData(string path, int bufferSize)
{
MemoryStream ms = new MemoryStream();
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
{
int bytesRead;
byte[] buffer = new byte[bufferSize];
while((bytesRead = fs.Read(buffer,0,bufferSize))>0)
{
ms.Write(buffer,0,bytesRead);
}
}
return(ms.ToArray());
}
public void SaveBinaryData(string path, byte[] data, int bufferSize)
{
using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
{
int totalBytesSaved = 0;
while (totalBytesSaved<data.Length)
{
int remainingBytes = Math.Min(bufferSize, data.Length - totalBytesSaved);
fs.Write(data, totalBytesSaved, remainingBytes);
totalBytesSaved += remainingBytes;
}
}
}
However, loading entire images to memory would use quite a bit of RAM. I don't know much about XMP headers, but if possible you should:
Load only the headers in memory
Manipulate the headers in memory
Write the headers to a new file
Copy the remaining data from the original file

Categories