Converting byte[] to string while keeping the original byte format - C#

I have a large amount of data that includes tables, fonts, bold text, sizes, etc. This data is stored as byte[] in the database.
When I retrieve the data I need to convert the byte[] into a string, because I need to do some find & replace on it. After I convert the string back into byte[], I lose the original data structure: the tables, fonts, bold text, etc. no longer display properly. So how can I find and replace in a byte[] by converting it to a string while keeping the data in its original format?

The short answer is don't. Figure out the format of the data and see what you can do to do the manipulation. If the data is actually text, just stored as byte[], your approach would work, provided you encode the string correctly (ie. if your DB expects UTF-8, use UTF-8 encoding, if it's windows-1251, use that).
If you have a structure where a part of it is a string, what you're doing can't really work well. First, you probably want to modify just the relevant parts of the field. On MS SQL, you have handy functions for that. But even then, you should know what's actually stored there, not just assume that a string replace will magically work.
Now, a hack could be to use an explicit encoding that doesn't break the non-string data. That would be some single-byte encoding that doesn't do anything fancy. This is OK as long as you use the same encoding while reading the text data - however, if you use any variant of Unicode, you're out of luck: byte sequences that aren't valid in that encoding get replaced during decoding, so you can't really guarantee that what comes in comes out the same way, per-byte. It's generally a bad practice anyway.
Don't forget that it's quite possible the string you are looking for is actually somewhere outside of the text fields - even by pure chance, it can happen, and certain practices make that even more likely.
Again: figure out the data format inside that data field - then you can decide how to do what you want.
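To make the single-byte-encoding hack above concrete, here is a minimal sketch. It assumes the text you want to replace is stored inside the blob as plain one-byte-per-character text; if the format stores text as UTF-16 or compresses it, the search simply won't match. The class and method names are made up for illustration. ISO-8859-1 is used because it maps every byte value 0..255 to the code point with the same number, so decode followed by encode gives back the same bytes.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Latin1Hack
{
    // ISO-8859-1 maps each byte 0..255 to the code point with the same value,
    // so GetString followed by GetBytes returns the original bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Find/replace inside arbitrary bytes. 'find' and 'replacement' must only
    // contain characters up to U+00FF, otherwise they are encoded as '?'.
    public static byte[] ReplaceInBytes(byte[] data, string find, string replacement)
    {
        string text = Latin1.GetString(data);
        // String.Replace(string, string) is an ordinal search: no culture rules.
        string edited = text.Replace(find, replacement);
        return Latin1.GetBytes(edited);
    }

    // Shows why UTF-8 is unsafe for this: invalid sequences are replaced
    // with U+FFFD on decoding, so the bytes do not survive the round trip.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        byte[] back = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
        return back.SequenceEqual(data);
    }
}
```

Note that a replacement of a different length can still corrupt formats that store lengths or offsets elsewhere in the blob, which is another reason to learn the actual format first.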

Try this:
string result = System.Text.Encoding.UTF8.GetString(byteArray);

To convert a byte[] to a (hex) string:
byte[] byteArray = new byte[10]; // put your byte array here
public void byteToString()
{
    // BitConverter.ToString gives "0A-1B-2C..."; removing the dashes leaves plain hex digits
    string stringTemp = BitConverter.ToString(byteArray).Replace("-", "");
}
And your data is still in byteArray. :)

If the byte array contains binary data rather than text, try converting it to Base64:
Convert.ToBase64String(yourByteArray);
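Both the hex and the Base64 suggestions give you a string that can be turned back into exactly the same bytes, which is the property the question is after. Here is a small sketch of the hex direction and its reverse (the class and method names are illustrative only; Base64 already has Convert.FromBase64String as its reverse):

```csharp
using System;

public static class BinaryText
{
    // bytes -> "00ABFF" style hex string
    public static string ToHex(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", "");
    }

    // "00ABFF" -> bytes; the reverse of ToHex
    public static byte[] FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even number of digits.");

        byte[] data = new byte[hex.Length / 2];
        for (int i = 0; i < data.Length; i++)
        {
            // Parse two hex digits at a time as one byte.
            data[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return data;
    }
}
```

Keep in mind that a find & replace on the hex or Base64 text operates on the encoded form, not on the original characters, so this is for storage and transport rather than for editing.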

Related

Writing a ByteArray to file AS bytes in C#

My goal is to convert a string to a ByteArray and write this ByteArray AS a ByteArray to a file, so that it's unreadable but can still be read back via a "ByteArray to String" conversion in C#.
This is how my code is right now:
string json = "{\"database\":{\"tables\":{\"Users\":[\"column\":{\"id\":\"1\", \"name\":\"Test\"}]}}}";
var bytes = Encoding.ASCII.GetBytes(json);
File.WriteAllBytes("database.dat", bytes);
This works in theory, however the final output file has the same content of the string, and not the converted ByteArray. This is what the file contains:
database.dat
{"database":{"tables":{"Users":["column":{"id":"1", "name":"Test"}]}}}
But I expected something like
l4#ˆC}nC(YXX>AI0ve‚22úL«*“ÑÃYgPæaiäi
’Ê¢±·Ä¿|^Û×RÉ!×¹ÝYPZŠO•QÚÉèT“g‘Ѳ¬¡\g²Ô
What am I doing wrong? Is this not a ByteArray? Is there another way to convert data to an unreadable file, and then be able to convert it back into a string in my program?
It depends on how unreadable you want it to be. In the most extreme case you might need to use encryption.
In your case, you are storing the ASCII representation of the string in a file, so of course a text editor can read it back to you.
One option is to convert the byte array you obtained to a Base64-encoded string and store that string in the file. That way it will not be easily readable; however, anyone can still decode it if they try, so it provides obfuscation rather than any real security. But again, it depends on your needs.
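A rough sketch of the Base64 approach, assuming UTF-8 for the text and using made-up class and method names. It is obfuscation only; anyone who recognizes Base64 can reverse it.

```csharp
using System;
using System.IO;
using System.Text;

public static class ObfuscatedFile
{
    // Writes the text to disk as Base64, so a text editor shows no readable JSON.
    public static void Save(string path, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        File.WriteAllText(path, Convert.ToBase64String(bytes));
    }

    // Reverses Save: Base64 text -> bytes -> original string.
    public static string Load(string path)
    {
        byte[] bytes = Convert.FromBase64String(File.ReadAllText(path));
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Usage would look like ObfuscatedFile.Save("database.dat", json) followed later by ObfuscatedFile.Load("database.dat"). If the content actually needs to be protected, use real encryption instead.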

How to convert serialized byte array back to its text form

I have a text that is a property of an object. The object gets XML-serialized, and after that there is an element in the XML called Text that represents the text from the object. I am wondering how to turn it back into a string.
THE TYPE OF SERIALIZATION: XmlSerializer serializer = new XmlSerializer(typeof(Act));
THE PROPERTY IN THE CLASS :
[System.Runtime.Serialization.OptionalFieldAttribute()]
private byte[] ActTextField;
In the xml file it looks something like that:
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAALQAAAAAAAAAAEAAALwAAAAEAAAD+////AAAAACwAAAD////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////spcEAJ2AJBAAA8BK/AAAAAAAAEAAAAAAABgAAYB4AAA4AYmpiavbg9uAAAAAAAAAAAAAAAAAAAAAAAAACBBYALiIAAJSKAQCUigEAzwYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAAAAAAAAAAAAKQAAAAAANADAAAAAAAA0AMAANADAAAAAAAA0AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAABQAAAAAAAAAAAAAAOQDAAAAAAAArAgAAAAAAACsCAAAAAAAAKwIAAAAAAAArAgAABQAAADACAAAFAAAAOQDAAAAAAAA/Q4AALYAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAAWA4AAAIAAABaDgAAAAAAAFoOAAAAAAAAWg4AAAAAAABaDgAAAAAAAFoOAAAAAAAAWg4AACQAAACzDwAAaAIAABsSAACSAAAAfg4AADkAAAAAAAAAAAAAAAAAAAAAAAAA0AMAAAAAAADgCAAAAAAAAAAAAAAAAAAAAAAAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAH4OAAAAAAAAAAAAAAAAAADQAwAAAAAAANADAAAAAAAA4AgAAAAAAAAAAAAAAAAAAOAIAAAAAAAAtw4AABYAAAAkDgAAAAAAACQOAAAAAAAAJA4AAAAAAADgCAAAagMAANADAAAAAAAA4AgAAAAAAADQAwAAAAAAAOAIAAAAAAAAWA4AAAAAAAAAAAAAAAAAACQOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4AgAAAAAAABYDgAAAAAAAAAAAAAAAAAAJA4AAAAAAAAAAAAAAAAAACQOAAAAAAAA0AMAAAAAAADQAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJA4AAAAAAADgCAAAAAAAANQIAAAMAAAAUCGbpyopzgEAAAAAAAAAAKwIAAAAAAAASgwAANAAAAAkDgAAAAAAAAAAAAAAAAAAWA4AAAAAAADNDgAAMAAAAP0OAAAAAAAAJA4AAAAAAACtEgAAAAAAABoNAAD0AAAArRIAAAAAAAAkDgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAK0SAAAAAAAAAAAAAAAAAADQAwAAAAAAACQOAAA0AAAA4AgAAAAAAADgCAAAAAAAACQOAAAAAAAA4AgAAAAAAADgCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAAfg4AAAAAAAB+DgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADg4AABYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOAIAAAAAAAA4AgAAAAAAADgCAAAAAAAAP0OAAAAAAAA4AgAAAAAAADgCAAAAAAAAOAIAAAAAAAA4AgAAAAAAAAAAAAAAAAAAOQDAAAAAAAA5AMAAAAAAADkAwAAJAMAAAgHAACkAQAA5AMAAAAAAADkAwAAAAAAAOQDAAAAAAAACAcAAAAAAADkAwAAAAAAAOQDAAAAAAAA5AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAAAAAAADQAwAAAAAAANADAAAAAAAA0AMAAAAAAAD/////AAAAAAIADAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB4EIAAgAB8EIAAgACAEIAAgABUEIAAgABQEIAAgABUEIAAgABsEIAAgABUEIAAgAB0EIAAgABgEIAAgABUEIAANABUELgAgAEAEMAQ5BD4EPQQ1BD0EIABBBEoENAQgADIEIAA3BDAEOgRABDgEQgQ+BCAAQQRKBDQENQQxBD0EPgQgADcEMARBBDUENAQwBD0EOAQ1BCAAPQQwBCAANAQyBDAENAQ1BEEENQRCB
I can't even guess what its encoding is or how to decode it. I tried reading it into a byte array, but that didn't actually work after applying a few decodings (Encoding.UTF8, Encoding.ASCII).
That looks like Base64 to me - just use
byte[] data = Convert.FromBase64String(base64Text);
It's odd that it's using base64 at all if this is really a text property though. I'd expect just the text.
To convert that binary data back to text you would need to know which encoding was used to convert it to the binary data to start with - and UTF-8 is the most likely - but all the repeated AAAAA... parts in there make this look pretty unlike text, to be honest.
EDIT: Now that we've seen the field declaration, we can see that it was a byte[] to start with, so that makes sense for it to be encoded in this way. Judging by comments, it sounds like it's actually a Word file - at which point extracting the text is a very separate problem.
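As a hedged illustration of the "it's actually a Word file" point: the element content starts with 0M8R4KGxGuE, which is Base64 for the bytes D0 CF 11 E0 A1 B1 1A E1, the signature of an OLE2 compound file (the container used by legacy .doc files). A quick check after decoding might look like this; the class and method names are invented for the example.

```csharp
using System;

public static class PayloadSniffer
{
    // Signature found at the start of an OLE2 compound file (legacy .doc/.xls container).
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when the decoded bytes begin with the OLE2 signature.
    public static bool LooksLikeOleCompoundFile(byte[] data)
    {
        if (data == null || data.Length < OleSignature.Length)
            return false;

        for (int i = 0; i < OleSignature.Length; i++)
        {
            if (data[i] != OleSignature[i])
                return false;
        }
        return true;
    }
}
```

You would call it as PayloadSniffer.LooksLikeOleCompoundFile(Convert.FromBase64String(base64Text)). If it returns true, no text encoding will turn those bytes into the document text; you need something that understands the Word format.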

Store binary data string into byte array using c#

I have a web service that returns binary data as a string. Using C# code, how can I store it in a byte array? Is this the right way?
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
byte[] bytes = encoding.GetBytes(inputString);
Actually, this didn't work. Let me explain in more detail:
The web service code converts a string (containing XSL-FO data) into a byte array using UTF-8 encoding. In my web service response I only see data like "PGZvOnJvb3QgeG1sbnM6Zm89Imh0dHA6Ly93d3cudzMub3JnLzE5OTkvWFNML0Zvcm1hdCIgeG1sbnM6eGY9Imh0dHA6Ly93d3cuZWNyaW9uLmNvbS94Zi8xLjAiIHhtbG5zOm1zeHNsPSJ1c==". What I would actually like to have is the original string value that was converted into byte[] in the service. Not sure if that's possible?
No, that's a bad idea.
Unless the input data was originally text, trying to use Encoding is a bad idea. The web service should be using something like base64 to encode it - at which point you can use Convert.FromBase64String to get the original binary data back.
Basically, treating arbitrary binary data as if it were encoded text is a quick way to lose data. When you need to represent binary data in a string, you should use base64, hex or something similar.
This may mean you need to change the web service as well, of course - if it's creating the string by simply treating the binary data as UTF-8-encoded text, it's broken to start with.
If the string is encoded in UTF-8, then yes, that is the correct way. If it is in Unicode (UTF-16), it is very similar:
System.Text.UnicodeEncoding encoding = new System.Text.UnicodeEncoding();
byte[] bytes = encoding.GetBytes(inputString);
Base64 encoding is a little different:
byte[] bytes = Convert.FromBase64String(inputString);
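Putting the two steps together for the case in the question: if the service really did UTF-8 encode the string and the response is the Base64 form of those bytes (which the sample suggests, since "PGZvOnJvb3Qg" is Base64 for "<fo:root "), then getting the original string back is a sketch like the following. The helper name is made up.

```csharp
using System;
using System.Text;

public static class ServicePayload
{
    // Base64 text -> raw bytes -> original string (assuming the service used UTF-8).
    public static string DecodeBase64Utf8(string base64)
    {
        // Throws FormatException if the input contains characters outside the
        // Base64 alphabet (other than whitespace) or has invalid padding.
        byte[] bytes = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

If the service used a different text encoding before Base64-encoding the bytes, swap Encoding.UTF8 for that encoding.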

Convert UCS-2 characters to UTF-8 Using C#

I'm pulling some internationalized text from a MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data in UTF-8 format, as I'm sending it out over the web. Currently, I have the following code to convert:
SqlString dbString = resultReader.GetSqlString(0);
byte[] dbBytes = dbString.GetUnicodeBytes();
byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode,
System.Text.Encoding.UTF8, dbBytes);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
string outputString = encoder.GetString(utf8Bytes);
However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to.
What am I missing?
EDIT:
In response to the answers below, the reason I thought I had to perform a conversion is because I can output literal multibyte strings just fine. For example:
OutputControl.Text = "カルフォルニア工科大学とチューリッヒ工科大学は共同で、太陽光を保管可能な燃料に直接変えることのできる装置の開発に成功したとのこと";
works. Here, OutputControl is an ASP.Net Literal. However,
OutputControl.Text = outputString; //Output from above snippet
results in mangled output as described above. My hypothesis was that the database's output was somehow getting mangled by ASP.Net. If that's not the case, then what are some other possibilities?
EDIT 2:
Okay, I'm stupid. It turns out that there's nothing wrong with the database at all. When I tried inserting my own literal double byte characters (材料,原料;木料), I could read and output them just fine even without any conversion process at all. It seems to me that whatever is inserting the data into the DB is mangling the characters somehow, so I'm going to look at that. With my verified, "clean" data, the following code works:
OutputControl.Text = dbString.ToString();
as the responses below indicate it should.
Your code does essentially the same as:
SqlString dbString = resultReader.GetSqlString(0);
string outputString = dbString.ToString();
string itself is a UNICODE string (specifically, UTF-16, which is 'almost' the same as UCS-2, except for codepoints not fitting into the lowest 16 bits). In other words, the conversions you are performing are redundant.
Your web app most likely mangles the encoding somewhere else as well, or sets a wrong encoding for the HTML output. However, that can't be diagnosed from the information you provided so far.
A string in .NET is 'encoding agnostic'.
You can convert bytes to a string using a particular encoding to tell .NET how to interpret your bytes.
You can convert a string to bytes using a particular encoding to tell .NET how you want your bytes served.
But trying to convert a string to another string using encodings makes no sense at all.
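A short sketch of that point, with invented names: running a string through UTF-16 bytes, converting to UTF-8 bytes, and decoding again just hands back the same string. The encoding only matters at the moment you produce bytes for output.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Mirrors the conversion in the question: UTF-16 bytes -> UTF-8 bytes -> string.
    // The result equals the input, so the whole detour is redundant.
    public static string RoundTripThroughUtf8(string text)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        byte[] utf8 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);
        return Encoding.UTF8.GetString(utf8);
    }

    // This is where an encoding is actually chosen: when the string becomes bytes.
    public static byte[] ToUtf8ForOutput(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }
}
```

In ASP.NET you normally don't even call the second method yourself; you assign the string and let the response encoding decide how it is written out.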

Converting string from memorystream to binary[] contains leading crap

--Edit with more background information--
A (black box) COM object returns me a string.
A 2nd COM object expects this same string as byte[] as input and returns a byte[] with the processed data.
This will be fed to a browser as downloadable, non-human-readable file that will be loaded in a client side stand-alone application.
So I get the string inputString from the 1st COM object and convert it into a byte[] as follows:
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, inputString);
obj = ms.ToArray();
I feed it to the 2nd COM and read it back out.
The result gets written to the browser.
Response.ContentType = "application/octet-stream";
Response.AddHeader("content-disposition", "attachment; filename=\"test.dat\"");
Response.BinaryWrite(obj);
The error occurs in the 2nd COM object because the formatting is incorrect.
I went to check the original string and that was perfectly fine. I then pumped the result from the 1st com directly to the browser and watched what came out. It appeared that somewhere along the road extra unreadable characters are added. What are these characters, what are they used for and how can I prevent them from making my 2nd COM grind to a halt?
The unreadable characters are of this kind:
NUL/SOH/NUL/NUL/NUL/FF/FF/FF/FF/SOH/NUL/NUL/NUL etc
Any ideas?
--Answer--
Use
System.Text.Encoding.UTF8.GetBytes(theString)
rather than
BinaryFormatter.Serialize()
BinaryFormatter is almost certainly not what you want to use.
If you just need to convert a string to bytes, use Encoding.GetBytes for a suitable encoding, of course. UTF-8 is usually correct, but check whether the document specifies an encoding.
Okay, with your updated information: your 2nd COM object expects binary data, but you want to create that binary data from a string. Does it treat it as plain binary data?
My guess is that something is going to reverse this process on the client side. If it's eventually going to want to reconstruct the data as a string, you need to pick the right encoding to use, and use it on both sides. UTF-8 is a good bet in most cases, but if the client side is just going to write out the data to a file and use it as an XML file, you need to choose the appropriate encoding based on the XML.
You said before that the first few characters of the string were just "<foo>" (or something similar) - does that mean there's no XML declaration? If not, choose UTF-8. Otherwise, you should look at the XML declaration and use that to determine your encoding (again defaulting to UTF-8 if the declaration doesn't specify the encoding).
Once you've got the right encoding, use Encoding.GetBytes as mentioned in earlier answers.
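The "look at the XML declaration, otherwise default to UTF-8" rule could be sketched roughly as below. This is only an illustration with made-up names and a deliberately simple regular expression, not a full XML parser. Encoding.GetEncoding throws an ArgumentException for names it doesn't know, and on .NET Core some legacy code pages need an encoding provider registered first.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlBytes
{
    // Captures the value of encoding="..." inside a leading <?xml ... ?> declaration.
    private static readonly Regex DeclaredEncoding = new Regex(
        @"^\s*<\?xml[^>]*\bencoding\s*=\s*[""']([A-Za-z0-9._-]+)[""']");

    // Uses the declared encoding when there is one, UTF-8 otherwise.
    public static Encoding PickEncoding(string xml)
    {
        Match m = DeclaredEncoding.Match(xml);
        return m.Success ? Encoding.GetEncoding(m.Groups[1].Value) : Encoding.UTF8;
    }

    // Converts the XML string to bytes using the encoding chosen above.
    public static byte[] ToBytes(string xml)
    {
        return PickEncoding(xml).GetBytes(xml);
    }
}
```

Whatever encoding is picked here has to be the one the consumer on the other side uses to read the bytes back.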
I think you are missing the point of BinarySerialization.
For starters, what Type is formulaXml?
Binary serialization will compact that into a machine represented value, NOT XML! The content will look like:
ÿÿÿÿ AIronScheme, Version=1.0.0.0, Culture=neutral, Public
Perhaps you should be looking at the XML serializer instead.
Update:
You want to write out some XML as a 'content-disposition' stream.
To do this, do something like:
byte[] buffer = Encoding.Default.GetBytes(formulaXml);
Response.BinaryWrite(buffer);
That should work like you hoped (I think).
The BinaryFormatter's job is to convert the objects into some opaque serialisation format that can only be understood by another BinaryFormatter at the other end.
(Just about to mention Encoding.GetBytes as well, but Jon beat me to it.)
You'll probably want to use System.Text.Encoding.UTF8.GetBytes().
Is the crap in the beginning two bytes long?
This could be the byte order mark of a Unicode encoded string.
http://en.wikipedia.org/wiki/Byte-order_mark
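If a byte order mark does turn out to be the problem, a sketch like the following detects and removes it. The names are invented, and it only checks the UTF-8 (EF BB BF) and UTF-16 (FF FE / FE FF) marks; UTF-32 is ignored. Note that Encoding.UTF8.GetBytes does not add a BOM by itself; the mark usually comes from a stream writer or file API that writes the encoding's preamble.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomHelper
{
    // Returns the data without a leading UTF-8 or UTF-16 byte order mark, if present.
    public static byte[] StripBom(byte[] data)
    {
        Encoding[] candidates = { Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode };

        foreach (Encoding encoding in candidates)
        {
            // GetPreamble returns the BOM bytes this encoding would write.
            byte[] bom = encoding.GetPreamble();
            if (bom.Length > 0 && data.Length >= bom.Length &&
                data.Take(bom.Length).SequenceEqual(bom))
            {
                return data.Skip(bom.Length).ToArray();
            }
        }
        return data;
    }
}
```

The NUL/SOH/FF pattern shown in the question is much longer than a BOM, though, which fits the BinaryFormatter header explanation in the other answers.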
