I read some data from a device and then send it to a web server as XML. Characters 0-31 (decimal) cannot be represented in an XML document, so I need to convert them.
The question is: how can I convert the chars between 0 and 31 decimal in a string, like [00]abcde[01]fgh[02]...?
Is there any built-in function in the .NET Framework, or any accepted pattern, for this?
Thanks
You should use standard XML encoding: your XML API will do that for you, so you don't need to worry about anything.
You can simply encode the number as an XML numeric character reference: you write &# followed by the number and a semicolon, so 1 becomes &#1; and 13 becomes &#13;, and so on and so forth.
However, as noted by dan04, you can't represent 0 as a numeric character reference at all (and strictly speaking XML 1.0 forbids most other control characters even as references; XML 1.1 relaxes this), so in the case where your data might include 0 you will have to use a different encoding. You could encode the entire binary payload as base64.
Most XML toolkits will do the encoding to NCRs for you, though, so you really shouldn't have to worry about that.
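To make the base64 fallback concrete, here is a minimal sketch, assuming the device data is available as a byte[] (the sample bytes and the element name "data" are illustrative, not from the original question):

using System;
using System.Xml.Linq;

// Stand-in for the bytes read from the device (includes 0x00 and 0x01,
// which cannot appear in XML character data).
byte[] deviceData = { 0x00, 0x01, 0x61, 0x62, 0x63 };

// Wrap the payload in an element as base64; the element name is arbitrary.
string xml = new XElement("data", Convert.ToBase64String(deviceData)).ToString();

// The server side reverses it:
byte[] roundTripped = Convert.FromBase64String(XElement.Parse(xml).Value);
Console.WriteLine(BitConverter.ToString(roundTripped)); // 00-01-61-62-63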
Related
I am in the process of porting data from a legacy system, but I am unsure what encoding it uses internally to store the data. I have noticed that the data corresponds to extended ASCII code values (e.g. the character ë, or small letter e with diaeresis, is stored as byte value 137 as per this chart).
I need to encode the data using ISO-8859-1 for the destination system, but obviously using the data as-is yields the incorrect results (in ISO-8859-1 the per mille sign is represented by decimal 137, as per this chart).
I need some advice on what encoding I can use when reading the data - i.e. an encoding that corresponds to the decimal ASCII code values.
I found my answer in this SO post. It turns out that code page 437 corresponds to the extended ASCII character codes. I was thus able to re-encode the data as follows:
var output = Encoding.Convert(Encoding.GetEncoding(437), Encoding.GetEncoding("ISO-8859-1"), input);
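One caveat worth noting: on .NET Core / .NET 5+ (unlike the .NET Framework), code page 437 is not available by default; the System.Text.Encoding.CodePages package must be registered first. A sketch of the full conversion under that assumption:

using System.Text;

// .NET Core / .NET 5+ only ship Unicode encodings out of the box; register
// the code-page provider (System.Text.Encoding.CodePages package) to get 437.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] input = { 137 }; // 'ë' in code page 437
byte[] output = Encoding.Convert(
    Encoding.GetEncoding(437),
    Encoding.GetEncoding("ISO-8859-1"),
    input);
// output[0] == 0xEB, the ISO-8859-1 byte for 'ë'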
I have tried to export HTML data to a database with UTF-8 encoding, but I am seeing some "broken characters" in the database. For example, while exporting HTML data such as
· A contribution .....
· Choice between two .............
into the database with UTF-8, the middle dot (·) in the data gets converted into (Â·), the classic symptom of UTF-8 bytes being reinterpreted as a single-byte encoding. I need to convert the data explicitly to ISO-8859-1, removing the broken character, in C#. Is there any way to do that?
Thanks in Advance
There is a way to specify fallback handlers to be invoked when bad characters are encountered during encoding and decoding. You can substitute other characters, throw an exception, add logging, etc.
See this overload of GetEncoding() for details.
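As a sketch of that overload — here replacing anything ISO-8859-1 cannot represent with '?' instead of throwing (the replacement string is your choice; exception fallbacks exist too):

using System.Text;

// Encoding.GetEncoding(name, encoderFallback, decoderFallback) lets you
// decide what happens to characters the encoding cannot represent.
Encoding latin1 = Encoding.GetEncoding(
    "ISO-8859-1",
    new EncoderReplacementFallback("?"),   // bad chars when encoding -> '?'
    new DecoderReplacementFallback("?"));  // bad bytes when decoding -> '?'

byte[] bytes = latin1.GetBytes("price: 5€"); // '€' is not in ISO-8859-1 -> '?'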
I am trying to pass a block of text to a system I do not own, which will pass the data to a system I do own.
Unfortunately, when the first system talks to the second system, it uses a TSV format. So I wonder if there's a convenient way to take my block of text, encode it in an ASCII format without any kind of whitespace (mostly newlines and tabs, of course), and then decode it later.
When I'm doing the encoding, I'm working in C#. When I'm doing the decoding, I'm working in Javascript.
I realize that I can write my own code to essentially "manually" perform the encoding and decoding by creating my own scheme, but I wonder if there already exists one for this purpose.
One option which would blow up the size of your data but be really simple to implement: UTF-8-encode all the text, then base64-encode that:
byte[] utf8 = Encoding.UTF8.GetBytes(text);
string base64 = Convert.ToBase64String(utf8);
That won't contain any whitespace and can be converted back. It'll be about a third larger than the UTF-8 form of the original string, and unreadable... but it'll work.
You could try using HttpUtility.UrlEncode(string) or Uri.EscapeDataString(string), which would percent-encode any whitespace in the passed-in text (as well as other special characters, which means the encoded text may be much larger than the original).
On the JavaScript side you could then use decodeURIComponent(string) to decode it back to the original text.
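A minimal sketch of the encode side in C# (the sample string is illustrative). One caveat worth knowing: HttpUtility.UrlEncode turns spaces into '+', which decodeURIComponent does not turn back into spaces, so Uri.EscapeDataString (which emits %20) is the safer pairing:

using System;

// Percent-encode a block of text so it contains no literal whitespace:
// space -> %20, tab -> %09, newline -> %0A.
string block = "line one\tcol two\nline two";
string encoded = Uri.EscapeDataString(block);
Console.WriteLine(encoded); // line%20one%09col%20two%0Aline%20two

On the JavaScript end, decodeURIComponent(encoded) then yields the original block.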
Additional information: "Unable to translate Unicode character \uDFFF at index 195 to specified code page."
I made an algorithm whose results are binary values (of different lengths). I transformed them into uint, then into chars, and appended them to a StringBuilder, as you can see below:
uint n = Convert.ToUInt16(tmp_chars, 2);   // parse the string of binary digits
_koded_text.Append(Convert.ToChar(n));     // store the value as a UTF-16 char
My problem is that when I try to save those values into a .txt file, I get the previously mentioned error.
using (var file = new StreamWriter(filename))
{
    file.WriteLine(_koded_text);
}
What I am saving is this: "忿췾᷿]볯褟ﶞ痢ﳻ��伞ﳴ㿯ﹽ翼蛿㐻ﰻ筹��﷿₩マ랿鳿⏟麞펿"... which are some weird signs.
What I need is to convert those binary values into some kind of string of chars and save them to a txt file. I saw somewhere that converting to UTF-8 should help, but I don't know how to. Would changing the file's encoding help too?
You cannot transform binary data to a string directly. The Unicode characters in a string are encoded using UTF-16 in .NET. That encoding uses two bytes per character, providing 65536 distinct values. Unicode, however, has over one million codepoints. To make that work, the Unicode codepoints above \uffff (above the BMP, the Basic Multilingual Plane) are encoded with a surrogate pair. The first one has a value between 0xd800 and 0xdbff, the second between 0xdc00 and 0xdfff. That provides 2^(10+10) = 2^20, about one million, additional codes.
You can perhaps see where this leads, in your case the code detects a high surrogate value (0xdfff) that isn't paired with a low surrogate. That's illegal. Lots more possible mishaps, several codepoints are unassigned, several are diacritics that get mangled when the string is normalized.
You just can't make this work. Base64 encoding is the standard way to carry binary data across a text stream. It uses 6 bits per character, 3 bytes require 4 characters. The character set is ASCII so the odds of the receiving program decoding the character back to binary incorrectly are minimal. Only a decades old IBM mainframe that uses EBCDIC could get you into trouble. Or just plain avoid encoding to text and keep it binary.
Since you're trying to encode binary data to a text stream, this SO question already contains an answer to the question: "How do I encode something as base64?" From there, plain ASCII/ANSI text is fine for the output encoding.
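A minimal sketch of that route for this case, assuming the algorithm's output can be collected as a byte[] (the sample bytes and file name are placeholders):

using System;
using System.IO;

// Collect the algorithm's output as bytes, then persist one base64 line
// instead of raw UTF-16 characters that may form invalid surrogates.
byte[] koded = { 0xDF, 0xFF, 0x12, 0x00 }; // stand-in for the computed bits
File.WriteAllText("koded.txt", Convert.ToBase64String(koded)); // pure ASCII

// Reading it back:
byte[] restored = Convert.FromBase64String(File.ReadAllText("koded.txt"));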
I know it's a recurring question here, but none of the answers have worked for me.
From a system I'm receiving Unicode text: just an email + name from customers.
When I record these strings to my SQL DB, some chars appear escaped with \u.
For example, the emails arrive in the DB as: name\u0040domain.com
How do I transform the Unicode string in my C# program so the DB gets name@domain.com?
Also, how do I replace special chars with an equivalent, or drop them entirely? For example, "Hernán π" to "Hernan ".
Thanks!
IMHO converting Unicode back to ASCII for some dubious storage or technical benefit isn't a good idea in the 21st century, especially since email is being changed to support Unicode in headers and bodies.
http://en.wikipedia.org/wiki/Unicode_and_e-mail
If the reason why you want to convert Hernán to Hernan is for searching, you should look at using an Accent Insensitive (AI) collation on your database, or coerce it to do so - see this SO post.
One thing you might need to double-check, however, is that your strings aren't getting pre-encoded before storage in your database (assuming your DB column is set to accept Unicode, i.e. NVARCHAR etc.): the character '@' should be stored as '@' (0040 in UTF-16) and not as the literal text '\u0040'.
EDIT:
The "\uNNNN" encoding in a string might originate from Java or Python.
You might be able to trace the email string data up your architecture to find the source of this encoding and change it to something more easy to decode in C# such as UTF-8.
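If you do end up having to decode the literal \uNNNN sequences on the C# side anyway, here is a small hand-rolled sketch (the regex and helper name are illustrative, not a library API):

using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Turn literal "\uNNNN" sequences back into their characters.
static string DecodeUnicodeEscapes(string s) =>
    Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
        m => ((char)ushort.Parse(m.Groups[1].Value,
            NumberStyles.HexNumber)).ToString());

Console.WriteLine(DecodeUnicodeEscapes(@"name\u0040domain.com")); // name@domain.com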
How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
You can use Encoding.Convert for such operations. Read about it on MSDN.
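For completeness, a minimal sketch of Encoding.Convert applied to this thread's example; note that the default ASCII fallback replaces unmappable characters with '?', not with an unaccented equivalent:

using System;
using System.Text;

// Re-encode a UTF-8 byte sequence as ASCII; characters with no ASCII
// mapping fall back to '?' under Encoding.ASCII's default fallback.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Hernán");
byte[] asciiBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf8Bytes);
Console.WriteLine(Encoding.ASCII.GetString(asciiBytes)); // Hern?n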