I have a byte array, read from an image file, that I am trying to send from C# across a socket to a Meteor server running collectionFS (v0.3.7).
I am trying to convert it to a string to match the result I would get from calling FileReader.readAsBinaryString() in JavaScript, for example:
?PNG\r\n\u001a\n\u0000\u0000\u0000\rIHDR\u0000\u0000\u0003?\u0000\u0000\u0002?
In my C# code, I have tried using System.Text.Encoding.UTF8.GetString(), which gives me something like this:
�PNG\r\n\n\0\0\0\rIHDR\0\0�\0\0
This fails on the transfer, presumably because the '\0' is treated like the end of the string.
Can anyone explain what is happening here? Is there a nice way in C# to format the bytes using Unicode escape sequences, like readAsBinaryString() does?
EDIT: The eventual destination for this data is a BSON binary entry in MongoDB (in Meteor), to be later extracted (as a Blob) and viewed through the normal Meteor web browser client.
There is no built in method that does exactly that.
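To turn a byte array into an escaped string, you need to decide which bytes are escaped and which are not. It looks like the 0-9, a-z and A-Z ranges should be left as they are and everything else written as \uXXXX.
I'd do something like the following:
// Keep ASCII letters and digits; escape every other byte as \uXXXX.
var result = String.Join("", byteArray
.Select(b => (b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ?
((char)b).ToString() : String.Format(@"\u{0:x4}", b)));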
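As far as I know, readAsBinaryString() produces one character per byte, with the character code equal to the byte value (0 to 255), so a plain byte-to-char cast may be a closer match than UTF8.GetString(). Here is a minimal sketch under that assumption. The class and method names are made up for illustration, and the receiving side would have to reverse the mapping the same way:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryStringSketch
{
    // One char per byte, with the char code equal to the byte value (0..255).
    public static string ToBinaryString(byte[] bytes)
    {
        return new string(bytes.Select(b => (char)b).ToArray());
    }

    // Reverse of ToBinaryString; throws OverflowException if a char is above 255.
    public static byte[] FromBinaryString(string s)
    {
        return s.Select(c => checked((byte)c)).ToArray();
    }

    // Writes control, quote, backslash and non-ASCII chars as \uXXXX (for logging or JSON-like text).
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c >= 0x20 && c < 0x7F && c != '\\' && c != '"')
                sb.Append(c);
            else
                sb.AppendFormat("\\u{0:x4}", (int)c);
        }
        return sb.ToString();
    }
}
```

Unlike the UTF-8 attempt, this keeps the 0x89 and 0x00 bytes of a PNG header intact, and FromBinaryString(ToBinaryString(bytes)) returns the original bytes.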
In my Python code I have a bytes value. Printing it gives something like this:
b'\xe0\xb6\x9c\xe0\xb7\x92\xe0\xb6\xb1\xe0\xb7\x8a\xe0\xb6\xaf\xe0\xb6\xbb'
In C#, I have that value as a string:
string byteString = "b'\xe0\xb6\x9c\xe0\xb7\x92\xe0\xb6\xb1\xe0\xb7\x8a\xe0\xb6\xaf\xe0\xb6\xbb'";
How can I convert byteString to a byte array in C#?
My actual problem is this: I have a non-English string in Python. When I run the Python code directly, it prints the string correctly.
When I run the same Python code from C# through the Process class, English text comes through fine, but non-English characters arrive as a null value. If I print the value as bytes in Python instead, I can read it in C#, so the remaining problem is how to convert that printed bytes text into a byte array in C#.
First, you want to modify your string slightly for usage in C#.
var str = "\xe0\xb6\x9c\xe0\xb7\x92\xe0\xb6\xb1\xe0\xb7\x8a\xe0\xb6\xaf\xe0\xb6\xbb";
You can then get your bytes fairly easily with LINQ.
var bytes = str.Select(x => Convert.ToByte(x)).ToArray();
An odd case that can come up when passing byte strings between Python and C# is that Python prints plain ASCII characters for printable byte values, leaving you with a mixed string like b'\xe0ello'. In a C# literal, \x takes one to four hex digits, so \xe0e is read as the single character U+0E0E rather than the byte 0xE0 followed by 'e'. Pasting Python bytes output that mixes hex escapes and ASCII into a C# literal therefore tends to break.
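If the bytes text arrives at runtime (for example, read from the Python process's output) rather than being pasted into source, one way around the \x ambiguity is to parse it yourself. Below is a rough sketch that assumes the input looks like Python's printed form, b'...' or b"...", and handles only \xNN plus a few common escapes. The class name and the set of supported escapes are my own choices, not a complete implementation of Python's rules:

```csharp
using System;
using System.Collections.Generic;

public static class PythonBytesLiteralSketch
{
    // Parses text such as b'\xe0ello' into the raw bytes it represents.
    public static byte[] Parse(string literal)
    {
        if (literal.Length < 3 || literal[0] != 'b' ||
            (literal[1] != '\'' && literal[1] != '"') ||
            literal[literal.Length - 1] != literal[1])
            throw new FormatException("Expected b'...' or b\"...\".");

        var result = new List<byte>();
        int end = literal.Length - 1; // index of the closing quote
        for (int i = 2; i < end; i++)
        {
            char c = literal[i];
            if (c != '\\')
            {
                result.Add(checked((byte)c)); // plain ASCII character
                continue;
            }
            if (++i >= end) throw new FormatException("Dangling backslash.");
            switch (literal[i])
            {
                case 'x': // always exactly two hex digits here, unlike C#'s \x
                    if (i + 2 >= end) throw new FormatException("Truncated \\x escape.");
                    result.Add(Convert.ToByte(literal.Substring(i + 1, 2), 16));
                    i += 2;
                    break;
                case 'n': result.Add(0x0A); break;
                case 'r': result.Add(0x0D); break;
                case 't': result.Add(0x09); break;
                case '\\': result.Add((byte)'\\'); break;
                case '\'': result.Add((byte)'\''); break;
                case '"': result.Add((byte)'"'); break;
                default: throw new FormatException("Unsupported escape: \\" + literal[i]);
            }
        }
        return result.ToArray();
    }
}
```

The resulting array can then be passed to System.Text.Encoding.UTF8.GetString() if the bytes are UTF-8 text, as they appear to be in the question. If the Python side can be changed, having it print Base64 instead is probably less fragile than parsing its bytes output.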
I'm using Base64 encoding to store values from my data structure into a string.
Basically, I convert a byte array into a Base64 string:
string StoredData = Convert.ToBase64String(ByteArray);
I then split StoredData into strings of at most 256 characters each and store them as ASCII strings (in AutoCAD XData, as DxfCode.ExtendedDataAsciiString).
When I want to retrieve my data I do the following:
First I concatenate the 256-character strings using StoredData = string1 + string2 + ...
Then I convert StoredData back into ByteArray using
var ByteArray = Convert.FromBase64String(StoredData);
This worked well for me and my clients until a month ago, when one of my clients started getting crashes and errors.
I asked him to send me his stored data, and I was surprised to see that it contained invalid Base64 characters (see the sample below):
tM7x24QLLLALr5ivAx3XFAM7uciYXrCjKXSFd3XOL/KGIc3C+JMO8QjHT/4c+puYrNLq5r9Is0vpDKyuxw9I6R3f1LuOYSdHS6XgZJEyMvGwSHNRSYJ/a0IoumQftB3XspQRwp4QSd7qcUVsrXw0+2RS/sd2vAvUFxEQgwsHaabb01YjchGeyxr1f78A4qy2BL/oHAsRak9UYN0mDzhZgbhpahlgdK3eWd8b2BTM01lWh74pYUrJR+JfQ0tw0Eu㿔
Z/1JxBMUv2cB6NrFehSuNF9l4dhAaZQ+TcIClZmk/ZC8TJ0rKka/J+HqhLDAwWExB3nXoIi00uJnE7J4R6rU+Q==
As you can see, the first 256-character string ends with an invalid Base64 character (㿔).
Why is this happening? Could it be related to the user's computer? I tried to replicate the error without success, and because I don't have access to their computers, I'm starting to think it might be something on their side.
The application uses .Net framework version 4.5.
Edit: It turned out the client had sent me a recovered document in which the text strings were not recovered properly, which explains the corrupted string.
It turns out the app had crashed and the client had recovered the drawing document with a corrupted string.
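Since the stored chunks can evidently come back damaged, it may be worth checking them before decoding instead of letting Convert.FromBase64String throw. This is a minimal sketch of the split and rejoin steps described in the question, with such a check added. The class name, the TryJoin helper and the 256-character constant are illustrative, and the AutoCAD XData read/write calls are left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkedBase64Sketch
{
    public const int ChunkSize = 256;

    // Encodes the data as Base64 and cuts it into pieces of at most ChunkSize chars.
    public static List<string> Split(byte[] data)
    {
        string encoded = Convert.ToBase64String(data);
        var chunks = new List<string>();
        for (int i = 0; i < encoded.Length; i += ChunkSize)
            chunks.Add(encoded.Substring(i, Math.Min(ChunkSize, encoded.Length - i)));
        return chunks;
    }

    static bool IsBase64Char(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
               (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    // Returns false instead of throwing when a chunk was damaged in storage.
    public static bool TryJoin(IEnumerable<string> chunks, out byte[] data)
    {
        data = null;
        string joined = string.Concat(chunks);
        if (!joined.All(IsBase64Char)) return false;
        try
        {
            data = Convert.FromBase64String(joined);
            return true;
        }
        catch (FormatException)
        {
            return false; // valid characters, but bad length or padding
        }
    }
}
```

With a check like this, a recovered drawing with a mangled chunk can be reported to the user as corrupted data rather than surfacing as a crash.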
I have a text value that is a property of an object. The object is serialized with XmlSerializer, and the resulting XML contains an element called Text that represents that value. I am wondering how to turn it back into a string.
The serializer: XmlSerializer serializer = new XmlSerializer(typeof(Act));
The property in the class:
[System.Runtime.Serialization.OptionalFieldAttribute()]
private byte[] ActTextField;
In the XML file it looks something like this:
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAALQAAAAAAAAAAEAAALwAAAAEAAAD+////AAAAACwAAAD////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////spcEAJ2AJBAAA8BK/AAAAAAAAEAAAAAAABgAAYB4AAA4AYmpiavbg9uAAAAAAAAAAAAAAAAAAAAAAAAACBBYALiIAAJSKAQCUigEAzwYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAAAAAAAAAAAAKQAAAAAANADAAAAAAAA0AMAANADAAAAAAAA0AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAABQAAAAAAAAAAAAAAOQDAAAAAAAArAgAAAAAAACsCAAAAAAAAKwIAAAAAAAArAgAABQAAADACAAAFAAAAOQDAAAAAAAA/Q4AALYAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAAWA4AAAIAAABaDgAAAAAAAFoOAAAAAAAAWg4AAAAAAABaDgAAAAAAAFoOAAAAAAAAWg4AACQAAACzDwAAaAIAABsSAACSAAAAfg4AADkAAAAAAAAAAAAAAAAAAAAAAAAA0AMAAAAAAADgCAAAAAAAAAAAAAAAAAAAAAAAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAH4OAAAAAAAAAAAAAAAAAADQAwAAAAAAANADAAAAAAAA4AgAAAAAAAAAAAAAAAAAAOAIAAAAAAAAtw4AABYAAAAkDgAAAAAAACQOAAAAAAAAJA4AAAAAAADgCAAAagMAANADAAAAAAAA4AgAAAAAAADQAwAAAAAAAOAIAAAAAAAAWA4AAAAAAAAAAAAAAAAAACQOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4AgAAAAAAABYDgAAAAAAAAAAAAAAAAAAJA4AAAAAAAAAAAAAAAAAACQOAAAAAAAA0AMAAAAAAADQAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJA4AAAAAAADgCAAAAAAAANQIAAAMAAAAUCGbpyopzgEAAAAAAAAAAKwIAAAAAAAASgwAANAAAAAkDgAAAAAAAAAAAAAAAAAAWA4AAAAAAADNDgAAMAAAAP0OAAAAAAAAJA4AAAAAAACtEgAAAAAAABoNAAD0AAAArRIAAAAAAAAkDgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAK0SAAAAAAAAAAAAAAAAAADQAwAAAAAAACQOAAA0AAAA4AgAAAAAAADgCAAAAAAAACQOAAAAAAAA4AgAAAAAAADgCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAAfg4AAAAAAAB+DgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADg4AABYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAP0OAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAAAAAAAAAAAAAOQDAAAAAAAA5AMAAAAAAADkAwAAJAMAAAgHAACkAQAA5AMAAAAAAADkAwAAAAAAAOQDAAAAAAAACAcAAAAAAADkAwAAAAAAAOQDAAAAAAAA5AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAAAAAAAD/////AAAAAAIADAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB4EIAAgAB8EIAAgACAEIAAgABUEIAAgABQEIAAgABUEIAAgABsEIAAgABUEIAAgAB0EIAAgABgEIAAgABUEIAANABUELgAgAEAEMAQ5BD4EPQQ1BD0EIABBBEoENAQgADIEIAA3BDAEOgRABDgEQgQ+BCAAQQRKBDQENQQxBD0EPgQgADcEMARBBDUENAQwBD0EOAQ1BCAAPQQwBCAANAQyBDAENAQ1BEEENQRCB
I can't even guess what its encoding is or how to decode it. I tried reading it into a byte array and applying a few decodings (Encoding.UTF8, Encoding.ASCII), but that didn't work.
That looks like Base64 to me - just use
byte[] data = Convert.FromBase64String(base64Text);
It's odd that it's using base64 at all if this is really a text property though. I'd expect just the text.
To convert that binary data back to text you would need to know which encoding was used to convert it to the binary data to start with - and UTF-8 is the most likely - but all the repeated AAAAA... parts in there make this look pretty unlike text, to be honest.
EDIT: Now that we've seen the field declaration, we can see that it was a byte[] to start with, so that makes sense for it to be encoded in this way. Judging by comments, it sounds like it's actually a Word file - at which point extracting the text is a very separate problem.
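One quick way to check the Word-file guess is to look at the first decoded bytes. The sample starts with 0M8R4KGxGuE, which decodes to D0 CF 11 E0 A1 B1 1A E1, the signature of the OLE compound file format used by legacy .doc files. A small sketch of that check follows; the class and method names are made up for illustration:

```csharp
using System;
using System.Linq;

public static class OleHeaderSketch
{
    // Signature found at the start of OLE compound files (legacy .doc, .xls, .ppt).
    static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static bool LooksLikeOleCompoundFile(byte[] data)
    {
        return data.Length >= OleSignature.Length &&
               data.Take(OleSignature.Length).SequenceEqual(OleSignature);
    }
}
```

For example, OleHeaderSketch.LooksLikeOleCompoundFile(Convert.FromBase64String("0M8R4KGxGuEAAAAA")) returns true. A match only says the bytes are an OLE container; getting the text out still needs something that understands the Word format.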
I have a web service that returns binary data as a string. Using C#, how can I store it in a byte array? Is this the right way?
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
byte[] bytes = encoding.GetBytes(inputString);
Actually, this didn't work. Let me explain it more:
The web service code converts a string (containing XSL-FO data) into a byte array using UTF-8 encoding. In my web service response I only see data like "PGZvOnJvb3QgeG1sbnM6Zm89Imh0dHA6Ly93d3cudzMub3JnLzE5OTkvWFNML0Zvcm1hdCIgeG1sbnM6eGY9Imh0dHA6Ly93d3cuZWNyaW9uLmNvbS94Zi8xLjAiIHhtbG5zOm1zeHNsPSJ1c==". What I would actually like is the original string value that was converted into byte[] in the service. Is that possible?
No, that's a bad idea.
Unless the input data was originally text, trying to use Encoding is a bad idea. The web service should be using something like base64 to encode it - at which point you can use Convert.FromBase64String to get the original binary data back.
Basically, treating arbitrary binary data as if it were encoded text is a quick way to lose data. When you need to represent binary data in a string, you should use base64, hex or something similar.
This may mean you need to change the web service as well, of course - if it's creating the string by simply treating the binary data as UTF-8-encoded text, it's broken to start with.
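To make the data-loss point concrete, here is a small sketch comparing a round trip through UTF-8 text with a round trip through Base64. It also covers the case from the question's follow-up, where the response already looks like Base64 of UTF-8 text: the first 12 characters of that sample, PGZvOnJvb3Qg, decode to "<fo:root ". The class and method names are illustrative only:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryAsTextSketch
{
    // Base64 text -> bytes -> string. Only meaningful if the bytes were UTF-8 text.
    public static string DecodeBase64Utf8(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    // True if the bytes survive bytes -> UTF-8 string -> bytes unchanged.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)).SequenceEqual(data);
    }

    // True if the bytes survive bytes -> Base64 string -> bytes unchanged.
    public static bool SurvivesBase64RoundTrip(byte[] data)
    {
        return Convert.FromBase64String(Convert.ToBase64String(data)).SequenceEqual(data);
    }
}
```

Bytes such as 0x89 or 0xFF are not valid UTF-8 on their own, so GetString replaces them with U+FFFD and the original values are gone. The Base64 path has no such problem for any input.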
If the string is UTF-8 text, then yes, that is the correct way. If it is in Unicode (which in .NET means UTF-16), it is very similar:
System.Text.UnicodeEncoding encoding = new System.Text.UnicodeEncoding();
byte[] bytes = encoding.GetBytes(inputString);
Base64 is a little different:
byte[] bytes = Convert.FromBase64String(inputString);
I'm pulling some internationalized text from a MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data in UTF-8 format, as I'm sending it out over the web. Currently, I have the following code to convert:
SqlString dbString = resultReader.GetSqlString(0);
byte[] dbBytes = dbString.GetUnicodeBytes();
byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode,
System.Text.Encoding.UTF8, dbBytes);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
string outputString = encoder.GetString(utf8Bytes);
However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to.
What am I missing?
EDIT:
In response to the answers below, the reason I thought I had to perform a conversion is because I can output literal multibyte strings just fine. For example:
OutputControl.Text = "カルフォルニア工科大学とチューリッヒ工科大学は共同で、太陽光を保管可能な燃料に直接変えることのできる装置の開発に成功したとのこと";
works. Here, OutputControl is an ASP.Net Literal. However,
OutputControl.Text = outputString; //Output from above snippet
results in mangled output as described above. My hypothesis was that the database's output was somehow getting mangled by ASP.Net. If that's not the case, then what are some other possibilities?
EDIT 2:
Okay, I'm stupid. It turns out that there's nothing wrong with the database at all. When I tried inserting my own literal double byte characters (材料,原料;木料), I could read and output them just fine even without any conversion process at all. It seems to me that whatever is inserting the data into the DB is mangling the characters somehow, so I'm going to look at that. With my verified, "clean" data, the following code works:
OutputControl.Text = dbString.ToString();
as the responses below indicate it should.
Your code does essentially the same as:
SqlString dbString = resultReader.GetSqlString(0);
string outputString = dbString.ToString();
string itself is a Unicode string (specifically UTF-16, which is almost the same as UCS-2, except for code points that do not fit into 16 bits). In other words, the conversions you are performing are redundant.
Your web app most likely mangles the encoding somewhere else as well, or sets a wrong encoding for the HTML output. However, that can't be diagnosed from the information you provided so far.
A string in .NET is 'encoding agnostic'.
You can convert bytes to a string using a particular encoding to tell .NET how to interpret your bytes.
You can convert a string to bytes using a particular encoding to tell .NET how you want your bytes served.
But trying to convert a string to another string using encodings makes no sense at all.
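A short sketch of that point: running a string through the same steps as the question's snippet (UTF-16 bytes, Encoding.Convert to UTF-8, then GetString) just hands back an equal string. The encoding only matters at the moment bytes are produced. The class and method names are invented for the example:

```csharp
using System;
using System.Text;

public static class RedundantConversionSketch
{
    // Mirrors the question's steps: UTF-16 bytes -> UTF-8 bytes -> string.
    public static string ViaEncodingConvert(string original)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(original);
        byte[] utf8 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);
        return Encoding.UTF8.GetString(utf8);
    }

    // The encoding is chosen here, when the text is turned into bytes for output.
    public static byte[] ToUtf8Bytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }
}
```

So RedundantConversionSketch.ViaEncodingConvert(s) == s holds for any well-formed string, and what the browser receives depends on how the response is encoded, not on conversions like this applied beforehand.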