Forgive the lengthy setup here but I thought it may help to have the context...
I am implementing a custom digital signature validation method as part of a WCF service. We're using a custom method because of differing interpretations of some industry standards, but the details there aren't all that relevant.
In this particular scenario, I am receiving an MTOM/XOP encoded request where the root MIME part contains a digital signature and the signature DigestValue and SignatureValue pieces are split up into separate MIME parts.
The MIME parts that contain the signature DigestValue and SignatureValue data are binary encoded, so the web request literally contains a run of raw bytes like this:
Content-Id: <c18605af-18ec-4fcb-bec7-e3767ef6fe53#example.jaxws.sun.com>
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
[non-printable-binary-data-goes-here]
--uuid:eda4d7f2-4647-4632-8ecb-5ba44f1a076d
I am reading the contents of the message in as a string (using the default UTF8 encoding) like this (see the requestAsString parameter below):
MessageBuffer buffer = request.CreateBufferedCopy(int.MaxValue);
try
{
    using (MemoryStream mstream = new MemoryStream())
    {
        buffer.WriteMessage(mstream);
        mstream.Position = 0;

        using (StreamReader sr = new StreamReader(mstream))
        {
            requestAsString = sr.ReadToEnd();
        }

        request = buffer.CreateMessage();
    }
}
finally
{
    buffer.Close();
}
After I read the MTOM/XOP message in, I am attempting to reorganize the multiple MIME parts into one SOAP message where the signature DigestValue and SignatureValue elements are restored to the original SOAP envelope (rather than left as attachments). So basically I am decoding the MTOM/XOP request.
Unfortunately, I am having trouble reading the DigestValue and SignatureValue pieces correctly. I need to read the bytes out of the message and get the base64 string representation of that data.
Despite all the context above, it seems the core problem is reading the binary data in as a string (UTF8 encoded) and then converting it to a proper base64 representation.
Here is what I am seeing in my test code:
This is my example base64 string:
string base64String = "mowXMw68eLSv9J1W7f43MvNgCrc=";
I can then get the byte representation of that string. This yields an array of 20 bytes:
byte[] base64Bytes = Convert.FromBase64String(base64String);
I then get the UTF8 encoded version of those bytes:
string decodedString = UTF8Encoding.UTF8.GetString(base64Bytes);
Now the strange part... if I convert the string back to bytes as follows, I get an array of bytes that is 39 bytes long:
byte[] base64BytesBack = UTF8Encoding.UTF8.GetBytes(decodedString);
So obviously at this point, when I convert back into a base64 string, it doesn't match the original value:
string base64StringBack = Convert.ToBase64String(base64BytesBack);
base64StringBack is set to "77+977+9FzMO77+9eO+/ve+/ve+/vVbvv73vv703Mu+/vWAK77+9"
What am I doing wrong here? If I switch to using UTF8Encoding.Unicode.GetString() and UTF8Encoding.Unicode.GetBytes(), it works as expected:
string base64String = "mowXMw68eLSv9J1W7f43MvNgCrc=";
// First get an array of bytes from the base64 string
byte[] base64Bytes = Convert.FromBase64String(base64String);
// Get the Unicode representation of the base64 bytes.
string decodedString = UTF8Encoding.Unicode.GetString(base64Bytes);
byte[] base64BytesBack = UTF8Encoding.Unicode.GetBytes(decodedString);
string base64StringBack = Convert.ToBase64String(base64BytesBack);
Now base64StringBack is set to "mowXMw68eLSv9J1W7f43MvNgCrc=" so it seems I am mis-using the UTF8 encoding somehow or it is behaving differently than I would expect.
Arbitrary binary data cannot be decoded into a UTF-8 encoded string and then encoded back to the same binary data. The section "Invalid byte sequences" in http://en.wikipedia.org/wiki/UTF-8 points that out.
I am a bit confused as to why you want the data encoded/decoded as UTF-8.
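To illustrate with the value from the question, here is a minimal sketch (assuming System and System.Text are in scope) showing that the 20 signature bytes survive a Base64 round trip but not a UTF-8 one:
byte[] original = Convert.FromBase64String("mowXMw68eLSv9J1W7f43MvNgCrc=");

// Lossy: invalid UTF-8 sequences are replaced with U+FFFD (EF BF BD),
// which is why the 20 bytes above come back as 39 bytes.
byte[] lossy = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(original));
Console.WriteLine(lossy.Length); // 39, not 20

// Lossless: convert the raw bytes straight to Base64 text instead.
Console.WriteLine(Convert.ToBase64String(original)); // "mowXMw68eLSv9J1W7f43MvNgCrc="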
Ok, I took a different approach to reading the MTOM/XOP message:
Instead of relying on my own code to parse the MIME parts by hand, I just used XmlDictionaryReader.CreateMtomReader() to get an XmlDictionaryReader and read the message into an XmlDocument (being careful to preserve whitespace on the XmlDocument so digital signatures aren't broken):
MessageBuffer buffer = request.CreateBufferedCopy(int.MaxValue);
messageContentType = WebOperationContext.Current.IncomingRequest.ContentType;
try
{
    using (MemoryStream mstream = new MemoryStream())
    {
        buffer.WriteMessage(mstream);
        mstream.Position = 0;

        if (messageContentType.Contains("multipart/related;"))
        {
            Encoding[] encodings = new Encoding[1];
            encodings[0] = Encoding.UTF8;

            // MTOM
            using (XmlDictionaryReader reader = XmlDictionaryReader.CreateMtomReader(mstream, encodings, messageContentType, XmlDictionaryReaderQuotas.Max))
            {
                XmlDocument msgDoc = new XmlDocument();
                msgDoc.PreserveWhitespace = true;
                msgDoc.Load(reader);
                requestAsString = msgDoc.OuterXml;
                reader.Close();
            }
        }
        else
        {
            // Text
            using (StreamReader sr = new StreamReader(mstream))
            {
                requestAsString = sr.ReadToEnd();
            }
        }

        request = buffer.CreateMessage();
    }
}
finally
{
    buffer.Close();
}
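For completeness, here is a rough sketch of how the now-inlined signature values could be pulled back out of msgDoc; it would sit right after msgDoc.Load(reader), while msgDoc is still in scope. The ds prefix bound to the XML Signature namespace is an assumption about the message layout, not something shown in the request above.
XmlNamespaceManager ns = new XmlNamespaceManager(msgDoc.NameTable);
ns.AddNamespace("ds", "http://www.w3.org/2000/09/xmldsig#");

// CreateMtomReader resolves the xop:Include references, so the element
// content is now plain Base64 text rather than a pointer to a MIME part.
XmlNode digestNode = msgDoc.SelectSingleNode("//ds:DigestValue", ns);
if (digestNode != null)
{
    string digestBase64 = digestNode.InnerText;                  // the Base64 string
    byte[] digestBytes = Convert.FromBase64String(digestBase64); // the raw digest bytes
}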
Related
I have a file encoded in Base64 using openssl base64 -in en -out en1 on the command line on macOS, and I am reading this file using the following code:
string fileContent = File.ReadAllText(Path.Combine(AppContext.BaseDirectory, MConst.BASE_DIR, "en1"));
var b1 = Convert.FromBase64String(fileContent);
var str1 = System.Text.Encoding.UTF8.GetString(b1);
The string I am getting has a ? before the actual file content. I am not sure what's causing this, any help will be appreciated.
Example Input:
import pandas
import json
Encoded file example:
77u/DQppbXBvcnQgY29ubmVjdG9yX2FwaQ0KaW1wb3J0IGpzb24NCg0K
Output based on the C# code:
?import pandas
import json
Normally, when you read UTF text (with a BOM) from a text file, the decoding is handled for you behind the scenes. For example, both of the following lines will read UTF text correctly regardless of whether or not the text file has a BOM:
File.ReadAllText(path, Encoding.UTF8);
File.ReadAllText(path); // UTF8 is the default.
The problem is that you're dealing with UTF text that has been encoded to a Base64 string. So, ReadAllText() can no longer handle the BOM for you. You can either do it yourself by (checking and) removing the first 3 bytes from the byte array or delegate that job to a StreamReader, which is exactly what ReadAllText() does:
var bytes = Convert.FromBase64String(fileContent);
string finalString = null;
using (var ms = new MemoryStream(bytes))
using (var reader = new StreamReader(ms)) // Or:
// using (var reader = new StreamReader(ms, Encoding.UTF8))
{
finalString = reader.ReadToEnd();
}
// Proceed to using finalString.
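For reference, the manual alternative mentioned above, checking for and stripping the 3-byte UTF-8 BOM yourself, would look roughly like this sketch:
byte[] bytes = Convert.FromBase64String(fileContent);
byte[] bom = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }

// Skip the BOM if it is present, then decode the rest as UTF-8.
int offset = bytes.Length >= bom.Length
             && bytes[0] == bom[0] && bytes[1] == bom[1] && bytes[2] == bom[2]
             ? bom.Length : 0;

string finalString = Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);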
I have one scenario with a class like this:
class Document
{
    public string Name { get; set; }
    public byte[] Contents { get; set; }
}
Now I am trying to implement import/export functionality where I keep the document in binary, so the document will be in a JSON file with the other fields, and the document contents will be something in this format:
UEsDBBQABgAIAAAAIQCitGbRsgEAALEHAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAA==
Now when I upload this file back, I get the same data as a string, but when I try to convert it back into binary byte[], the file becomes corrupt.
How can I achieve this?
I use something like this to convert
var ss = sr.ReadToEnd();
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(ss);
writer.Flush();
stream.Position = 0;
var bytes = default(byte[]);
bytes = stream.ToArray();
This looks like base 64. Use:
System.Convert.ToBase64String(b)
https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx
And
System.Convert.FromBase64String(s)
https://msdn.microsoft.com/en-us/library/system.convert.frombase64string%28v=vs.110%29.aspx
You need to decode it from Base64, like this:
Assuming you've read the file into ss as a string.
var bytes = Convert.FromBase64String(ss);
There are several things going on here. You need to know the encoding used by the default StreamWriter; if it is not specified, it defaults to UTF-8. However, .NET strings are always UTF-16 internally.
MemoryStream from string - confusion about Encoding to use
I would suggest using System.Convert.ToBase64String(someByteArray) and its counterpart System.Convert.FromBase64String(someString) to handle this for you.
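A minimal sketch of that round trip for the Contents field (the file name here is just a placeholder):
// Export: bytes -> Base64 string, safe to embed in the JSON file.
Document doc = new Document { Name = "report.docx", Contents = File.ReadAllBytes("report.docx") };
string exported = Convert.ToBase64String(doc.Contents);

// Import: Base64 string -> the original bytes, with no corruption.
byte[] restored = Convert.FromBase64String(exported);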
I've got this string returned via HTTP POST from a URL in a C# application, and it contains some Chinese characters, e.g.:
Gelatos® Colors Gift Setä¸æ–‡
Problem is I want to convert it to
Gelatos® Colors Gift Set中文
Both strings are actually identical but encoded differently. I understand that in C# everything is UTF-16. I've tried reading a lot of postings here about converting from one encoding to the other, but no luck.
Hope someone could help.
Here's the C# code:
WebClient wc = new WebClient();
json = wc.DownloadString("http://mysite.com/ext/export.asp");
textBox2.Text = "Receiving orders....";
//convert the string to UTF16
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] asciiBytes = ascii.GetBytes(json);
byte[] utf8Bytes = utf8.GetBytes(json);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);
string sOut = unicode.GetString(unicodeBytes);
System.Windows.Forms.MessageBox.Show(sOut); //doesn't work...
Here's the code from the server:
<%#CodePage = 65001%>
<%option explicit%>
<%
Session.CodePage = 65001
Response.charset ="utf-8"
Session.LCID = 1033 'en-US
.....
response.write (strJSON)
%>
The output from the web is correct. But I was just wondering if some change is being made to the HTTP stream on its way to the C# application.
thanks.
Download the web pages as bytes in the first place. Then, convert the bytes to the correct encoding.
By first converting it using the wrong encoding, you are probably losing data, especially when using ASCII.
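For example, a sketch of the bytes-first approach (using the URL from the question):
using (WebClient wc = new WebClient())
{
    // Download the raw bytes, then decode them explicitly with the
    // encoding the server actually uses (UTF-8 here).
    byte[] raw = wc.DownloadData("http://mysite.com/ext/export.asp");
    string json = Encoding.UTF8.GetString(raw);
}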
If the server is really returning UTF-8 text, you can configure your WebClient by setting its Encoding property. This would eliminate any need for subsequent conversions.
using (WebClient wc = new WebClient())
{
wc.Encoding = Encoding.UTF8;
json = wc.DownloadString("http://mysite.com/ext/export.asp");
}
I have the following C# code which is supposed to serialize arbitrary objects to a string, and then of course deserialize it.
public static string Pack(Message _message)
{
    BinaryFormatter formatter = new BinaryFormatter();
    MemoryStream original = new MemoryStream();
    MemoryStream outputStream = new MemoryStream();

    formatter.Serialize(original, _message);
    original.Seek(0, SeekOrigin.Begin);

    DeflateStream deflateStream = new DeflateStream(outputStream, CompressionMode.Compress);
    original.CopyTo(deflateStream);

    byte[] bytearray = outputStream.ToArray();
    UTF8Encoding encoder = new UTF8Encoding();
    string packed = encoder.GetString(bytearray);
    return packed;
}

public static Message Unpack(string _packed_message)
{
    UTF8Encoding encoder = new UTF8Encoding();
    byte[] bytearray = encoder.GetBytes(_packed_message);

    BinaryFormatter formatter = new BinaryFormatter();
    MemoryStream input = new MemoryStream(bytearray);
    MemoryStream decompressed = new MemoryStream();

    DeflateStream deflateStream = new DeflateStream(input, CompressionMode.Decompress);
    deflateStream.CopyTo(decompressed); // EXCEPTION
    decompressed.Seek(0, SeekOrigin.Begin);

    var message = (Message)formatter.Deserialize(decompressed); // EXCEPTION 2
    return message;
}
But the problem is that any time the code is run, I get an exception. Using the above code and invoking it as shown below, I receive InvalidDataException: Unknown block type. Stream might be corrupted. at the marked // EXCEPTION line.
After searching for this issue I attempted to ditch the deflation. This was only a small change: in Pack, bytearray gets created from original.ToArray(), and in Unpack, I Seek() on input instead of decompressed and pass input to Deserialize() instead of decompressed too. The only thing that changed is the position and text of the exception: I now receive SerializationException: No map for object '201326592'. at // EXCEPTION 2.
I don't see what the problem is. Maybe it is the whole serialization idea... but packing the Message instances somehow is necessary, because these objects hold the information that travels between the server and the client application. (The serialization logic is in a .Shared DLL project which is referenced on both ends; however, right now I'm only developing the server side.) It should also be noted that I am only using string output because, right now, the TCP connection between the servers and clients is based on reading and writing strings at both ends. So somehow it has to be brought down to the level of strings.
This is how the Message object looks like:
[Serializable]
public class Message
{
    public MessageType type;
    public Client from;
    public Client to;
    public string content;
}
(Client right now is an empty class only having the Serializable attribute, no properties or methods.)
This is how the pack-unpack gets invoked (from Main()...):
Shared.Message msg = Shared.MessageFactory.Build(Shared.MessageType.DEFAULT, new Shared.Client(), new Shared.Client(), "foobar");
string message1 = Shared.MessageFactory.Pack(msg);
Console.WriteLine(message1);
Shared.Message mess2 = Shared.MessageFactory.Unpack(message1); // Step into... here be exceptions
Console.Write(mess2.content);
Here is an image showing what happens in the IDE. The output in the console window is the value of message1.
Some investigation unfortunately also revealed that the problem could lie around the bytearray variable. When running Pack(), after the encoder creates the string, the array contains 152 values; however, after it gets decoded in Unpack(), the array has 160 values instead.
I would appreciate any help, as I am really out of ideas, and with this problem the progress is crippled. Thank you.
(Update) The final solution:
I would like to thank everyone answering and commenting, as I have reached the solution. Thank you.
Marc Gravell was right: I missed closing the deflateStream, and because of this the result was either empty or corrupted. I took my time, rethought and rewrote the methods, and now they work flawlessly. Even sending these bytes over the network stream works as intended.
Also, as Eric J. suggested, I have switched to using ASCIIEncoding for the conversion between string and byte[] when the data is flowing in the Stream.
The fixed code lies below:
public static string Pack(Message _message)
{
    using (MemoryStream input = new MemoryStream())
    {
        BinaryFormatter bformatter = new BinaryFormatter();
        bformatter.Serialize(input, _message);
        input.Seek(0, SeekOrigin.Begin);

        using (MemoryStream output = new MemoryStream())
        using (DeflateStream deflateStream = new DeflateStream(output, CompressionMode.Compress))
        {
            input.CopyTo(deflateStream);
            deflateStream.Close();
            return Convert.ToBase64String(output.ToArray());
        }
    }
}

public static Message Unpack(string _packed)
{
    using (MemoryStream input = new MemoryStream(Convert.FromBase64String(_packed)))
    using (DeflateStream deflateStream = new DeflateStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        deflateStream.CopyTo(output);
        deflateStream.Close();
        output.Seek(0, SeekOrigin.Begin);

        BinaryFormatter bformatter = new BinaryFormatter();
        Message message = (Message)bformatter.Deserialize(output);
        return message;
    }
}
Now everything happens just right, as the screenshot below shows. This was the expected output in the first place. The Server and Client executables communicate with each other, the message travels... and it gets serialized and deserialized properly.
In addition to the existing observations about Encoding vs base-64, note you haven't closed the deflate stream. This is important because compression-streams buffer: if you don't close, it may not write the end. For a short stream, that may mean it writes nothing at all.
using (DeflateStream deflateStream = new DeflateStream(
    outputStream, CompressionMode.Compress))
{
    original.CopyTo(deflateStream);
}

return Convert.ToBase64String(outputStream.GetBuffer(), 0,
    (int)outputStream.Length);
Your problem is most probably the UTF-8 encoding. Your bytes are not really a character string, and UTF-8 is an encoding with different byte lengths for different characters.
This means the byte array may not correspond to a correctly encoded UTF-8 string (there may be some bytes missing at the end, for instance).
Try using UTF-16 or ASCII, which are constant-length encodings (the resulting string will likely contain control characters, so it won't be printable or transmittable through something like HTTP or email).
But if you want to encode the data as a string, it is customary to use something like UUEncoding to convert the byte array into a real printable string; then you can use any encoding you want.
When I run the following Main() code against your Pack() and Unpack():
static void Main(string[] args)
{
    Message msg = new Message() { content = "The quick brown fox" };
    string message1 = Pack(msg);
    Console.WriteLine(message1);

    Message mess2 = Unpack(message1); // Step into... here be exceptions
    Console.Write(mess2.content);
}
I see that the bytearray
byte[] bytearray = outputStream.ToArray();
is empty.
I did modify your serialized class slightly, since you did not post code for the included classes:
public enum MessageType
{
    DEFAULT = 0
}

[Serializable]
public class Message
{
    public MessageType type;
    public string from;
    public string to;
    public string content;
}
I suggest the following steps to resolve this:
Check the intermediate results along the way. Do you also see 0 bytes in the array? What is the string value returned by Pack()?
Dispose of your streams once you are done with them. The easiest way to do that is with the using keyword.
Edit
As Eli and Marc correctly pointed out, you cannot store arbitrary bytes in a UTF8 string. The mapping is not bijective (you can't go back and forth without loss/distortion of information). You will need a mapping that is bijective, such as the Convert.ToBase64String() approach Marc suggests.
I need to serve an AES encrypted, base64 encoded file from PHP to a C# client (Mono, on various platforms). I've successfully got the AES encryption/decryption working fine but as soon as I attempt the base64 encoding/decoding I run into trouble. Both the examples below have the AES disabled, so that shouldn't be a factor.
My simplest test case, a Hello World string, works fine:
PHP serving output-
// Save encoded data to file
$data = base64_encode("Hello encryption world!!");
$file = fopen($targetPath, 'w');
fwrite($file, $data);
fclose($file);
// Later on, serve the file
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false);
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=".basename($product->PackageFilename($packageId)));
header("Content-Transfer-Encoding: binary");
header("Content-Length: ".filesize($targetPath));
ob_clean();
flush();
$handle = fopen($targetPath, "r");
fpassthru($handle);
fclose($handle);
C# decoding and using-
StreamReader reader = new StreamReader(stream);
char[] buffer = DecodeBuffer;
string decoded = "";
int read = 0;

while (0 < (read = reader.Read(buffer, 0, DecodeBufferSize)))
{
    byte[] decodedBytes = Convert.FromBase64CharArray(buffer, 0, read);
    decoded += System.Text.Encoding.UTF8.GetString(decodedBytes);
}
Log(decoded); // Correctly logs "Hello encryption world!!"
However once I start trying to do the same thing with the contents of a file, a FormatException: Invalid character found is thrown by Convert.FromBase64CharArray:
PHP serving output-
// Save encoded data to file
$data = base64_encode(file_get_contents($targetPath));
$file = fopen($targetPath, 'w');
fwrite($file, $data);
fclose($file);
// Later on, serve the file
// Same as above
C# decoding and using-
using (Stream file = File.Open(zipPath, FileMode.Create))
{
    using (StreamReader reader = new StreamReader(stream))
    {
        char[] buffer = DecodeBuffer;
        byte[] decodedBytes;
        int read = 0;

        while (0 < (read = reader.Read(buffer, 0, DecodeBufferSize)))
        {
            // Throws FormatException: Invalid character found
            decodedBytes = Convert.FromBase64CharArray(buffer, 0, read);
            file.Write(decodedBytes, 0, decodedBytes.Length);
        }
    }
}
Is there some kind of additional processing that should be done on larger data for the Base64 to be valid? Or is it perhaps just not appropriate to do this with large binary data, and if so, how else would you prevent potential problems with characters that are unsafe for transmission?
Your code for reading the Base64 text is not correct.
Base64 is text, so consider using a text reader instead.
Base64 may contain newlines/whitespace; it is customary to split the whole Base64-encoded value into lines of 70-80 characters.
To verify that the data in the file is correct, read the whole file as a string (StreamReader.ReadToEnd) and convert it to a byte array (Convert.FromBase64String), as sketched below.
If the file contains valid Base64 data and you can't read it as a single string, you should implement your own Base64 decoding, or manually read the correct number of non-whitespace characters (a multiple of 4) and decode such chunks.
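A minimal sketch of that whole-file check, reusing the stream variable from the question (Convert.FromBase64String skips embedded whitespace such as newlines):
string base64Text;
using (StreamReader reader = new StreamReader(stream))
{
    base64Text = reader.ReadToEnd(); // the entire Base64 payload as one string
}
byte[] data = Convert.FromBase64String(base64Text);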
Base64 encoding converts 3 octets into 4 encoded characters. Thus, the length of the data you provide for decoding needs to be a multiple of 4.
First, ensure that DecodeBufferSize is such a multiple of 4. Next, since StreamReader.Read does not guarantee that all the requested characters will be read, you should continue reading into buffer until either it has been filled or the end of the stream has been reached.
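Here is a rough sketch of such a read loop, reusing the stream, file, and DecodeBufferSize names from the question; it assumes the Base64 text contains no embedded line breaks (otherwise the significant character count per chunk may not be a multiple of 4):
using (StreamReader reader = new StreamReader(stream))
{
    char[] buffer = new char[DecodeBufferSize]; // DecodeBufferSize % 4 == 0
    int filled;
    do
    {
        // Fill the buffer completely, or stop at the end of the stream.
        filled = 0;
        int read;
        while (filled < buffer.Length
               && (read = reader.Read(buffer, filled, buffer.Length - filled)) > 0)
        {
            filled += read;
        }

        if (filled > 0)
        {
            byte[] decodedBytes = Convert.FromBase64CharArray(buffer, 0, filled);
            file.Write(decodedBytes, 0, decodedBytes.Length);
        }
    } while (filled == buffer.Length);
}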