How to get the unicode attribute value using the XmlReader class in C#

I am using the XmlReader class to parse a font file in SVG format. I could not parse the string value of the glyph tag's unicode attribute:
<svg><font><glyph unicode="" /></font></svg>
I tried:
if (xmlReader.GetAttribute("unicode") != null)
{
string unicode = xmlReader.GetAttribute("unicode");
}
The output was:
unicode=""
I need the exact unicode string value.
Can anyone help, please?

There's nothing wrong with the result: that is the Unicode character U+E600 (its UTF-8 encoding is the byte sequence EE 98 80). This code point is in the Private Use Area, which means there is no standard glyph that can be shown for it, so a default glyph is used.
A char is a 16-bit UTF-16 code unit. For a character in the Basic Multilingual Plane such as U+E600, that code unit equals the code point, so you already have the code point. If you want to format it as a hex string, cast it to int and use the X4 format, e.g.:
char theChar = unicode[0];
string hexString = String.Format("{0:X4}", (int)theChar);
This returns E600.
If you want to output the same string as the original, you can use the format string "&#x{0:x4};" in the same way.
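Putting this together end to end, here is a minimal sketch. It assumes the glyph's unicode attribute is written in the file as the numeric reference &#xe600; (XmlReader expands it to the single character U+E600 before GetAttribute returns it). The GlyphCodes helper and its method names are hypothetical. It uses char.ConvertToUtf32 instead of a plain cast so that a character above U+FFFF, stored as a surrogate pair, still yields its full code point.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: formats the first character of a glyph's unicode attribute.
static class GlyphCodes
{
    // Full code point of the first character; ConvertToUtf32 also combines
    // a surrogate pair, so characters above U+FFFF are handled too.
    public static int CodePoint(string s) => char.ConvertToUtf32(s, 0);

    // Uppercase hex with at least four digits, e.g. "E600".
    public static string ToHex(string s) => CodePoint(s).ToString("X4");

    // Numeric character reference like the one in the SVG file, e.g. "&#xe600;".
    public static string ToEntity(string s) => string.Format("&#x{0:x4};", CodePoint(s));
}

class Program
{
    static void Main()
    {
        // Assumed sample input: the glyph's character written as an entity.
        string svg = "<svg><font><glyph unicode=\"&#xe600;\" /></font></svg>";

        using var reader = XmlReader.Create(new StringReader(svg));
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "glyph")
            {
                // XmlReader has already expanded &#xe600; into the single char U+E600.
                string unicode = reader.GetAttribute("unicode");
                if (!string.IsNullOrEmpty(unicode))
                {
                    Console.WriteLine(GlyphCodes.ToHex(unicode));    // E600
                    Console.WriteLine(GlyphCodes.ToEntity(unicode)); // &#xe600;
                }
            }
        }
    }
}
```

Only the first character is inspected; an SVG font may also map a glyph to a multi-character string (for ligatures), in which case each character would need the same treatment.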

Related

Printing Hexadecimals as Hexadecimals in C#?

I need to send an array of bytes to a hardware device (SDZ16 matrix) using a serial port. The catch is that the hardware expects strings of hexadecimal and ASCII characters.
When assigning values to the array of bytes, even if I set a byte to an explicit hexadecimal value (bytes[0] = 0xF2, for instance), it prints the equivalent decimal value (242 instead of F2).
I suspect the problem is in Console.WriteLine(), which prints each byte as an integer by default(?). How does C# keep track that there is a hexadecimal value inside an int?
If I assign bytes[0] = 0xF2;, will the hardware understand it as hexadecimal even if Console.WriteLine() shows it differently while testing?
If you want to get a string representation in hex format you can do so by using a corresponding numeric format string:
byte value = 0xF2;
string hexString = string.Format("{0:X2}", value);
Note that Console.WriteLine has an overload that takes a format string and a parameter list:
Console.WriteLine("{0:X2}", value);
Update: I just took a look at the device documentation, and it seems that you need to send commands as the corresponding ASCII representation, in the form of a string. You can get the ASCII representation using:
byte value = 0x01;
string textValue = value.ToString().PadLeft(2, '0'); // decimal digits; for hex digits use value.ToString("X2")
byte[] ascii = Encoding.ASCII.GetBytes(textValue); // Encoding lives in System.Text
My tip would be to carefully check the documentation of your equipment to find out which exact format is expected.
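If the device does expect the hexadecimal digits themselves as ASCII text, the conversion might look like the sketch below. It assumes two uppercase hex digits per byte with no separators, which is a guess about the SDZ16 protocol and must be checked against its documentation; the HexAscii class is hypothetical. The resulting array could then be written to the port with SerialPort.Write(byte[], int, int).

```csharp
using System;
using System.Text;

// Hypothetical helper: encodes raw bytes as ASCII hex digits for a text-based protocol.
static class HexAscii
{
    // { 0xF2, 0x01 } -> "F201" -> ASCII bytes { 0x46, 0x32, 0x30, 0x31 }.
    public static byte[] ToAsciiHex(byte[] raw)
    {
        var sb = new StringBuilder(raw.Length * 2);
        foreach (byte b in raw)
            sb.Append(b.ToString("X2")); // two uppercase hex digits per byte
        return Encoding.ASCII.GetBytes(sb.ToString());
    }
}

class Program
{
    static void Main()
    {
        byte[] raw = { 0xF2, 0x01 };
        byte[] wire = HexAscii.ToAsciiHex(raw);
        Console.WriteLine(Encoding.ASCII.GetString(wire)); // F201
        Console.WriteLine(BitConverter.ToString(wire));    // 46-32-30-31
    }
}
```

Note that the bytes on the wire are the character codes of the digits ('F' is 0x46), not the original value 0xF2.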
it will print the equivalent decimal value (242 instead of F2).
Yes, because 0xF2 is still 242; 0xF2 is just its hexadecimal notation, and 0x is the most common prefix for that notation. Even in the debugger, you will see the decimal notation by default.
I suspect the problem is in Console.WriteLine(), which prints each byte as an integer by default(?)
No, Console.WriteLine() has nothing to do with it; it simply prints the number in its default (decimal) form.
How does C# keep track that there is a hexadecimal value inside an int?
There is no such thing as a hexadecimal value inside an int. Hexadecimal is just a notation.
If you want the hexadecimal notation of a number, you can use the hexadecimal "X" format specifier, like this:
byte b = 0xF2;
Console.WriteLine(b.ToString("X")); //F2
If you want it with the prefix, you can do:
byte b = 0xF2;
Console.WriteLine("0x{0}", b.ToString("X")); //0xF2
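To make the "it is just a notation" point concrete, the sketch below writes the same byte as a hex literal and as a decimal literal, shows they compare equal, and prints the one stored value in decimal, hex and binary. The Notation class is hypothetical.

```csharp
using System;

// Hypothetical helper: renders one byte in several notations.
static class Notation
{
    // e.g. 0xF2 -> "242 / 0xF2 / 11110010"
    public static string Describe(byte b) =>
        $"{b} / 0x{b:X2} / {Convert.ToString(b, 2).PadLeft(8, '0')}";
}

class Program
{
    static void Main()
    {
        byte fromHex = 0xF2; // hexadecimal literal
        byte fromDec = 242;  // decimal literal
        Console.WriteLine(fromHex == fromDec);         // True: the stored value is identical
        Console.WriteLine(Notation.Describe(fromHex)); // 242 / 0xF2 / 11110010
    }
}
```

The hardware receives the same eight bits (11110010) whichever literal you typed; only the text you print differs.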

Handle Non-UTF-8 Characters in Byte Array

I have an array of bytes that contains some characters that are not valid UTF-8. These characters cannot be deserialized using UTF-8 encoding. So my question is: how can I handle these characters and make the string readable in whatever language it is in?
For example, if I have an array:
byte[] b = myArrayWithNonUTF8Characters;
And I try to deserialize the array with:
DataContractJsonSerializer jsonSerializer = new DataContractJsonSerializer(typeof(MyObject));
MyObject objResponse = (MyObject)jsonSerializer.ReadObject(new MemoryStream(b));
Then I get an error that the array contains invalid UTF8 bytes.
Any way to make this work?
PS: Please do not give me this answer: string s = System.Text.Encoding.UTF8.GetString(b, 0, b.Length); It only returns replacement symbols in place of the non-UTF-8 characters.
The beauty of Unicode (and its UTF encodings) is that it encodes characters from most languages, so you can have Greek and Japanese in the same character stream.
Without Unicode, your entire stream (or, in your case, array) must use a single code page, which usually covers only one language or script. Each character is typically represented by a single byte, but which character that byte stands for is determined by the code page (see http://en.wikipedia.org/wiki/Code_page for more details).
For example, if your text was written in Greek, it might use the Windows Greek code page 1253 (or ISO-8859-7, code page 28597):
System.Text.Encoding.GetEncoding(1253)
On .NET Core and .NET 5+, Windows code pages such as 1253 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
In short, you need to know which code page the text was encoded with.
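One way to act on this is sketched below: try strict UTF-8 first, and only if the bytes are invalid, decode them with a code page you already know (or guess) the data uses. The LegacyDecode class is hypothetical, the choice of Windows-1253 is only an assumption matching the Greek example, and the provider registration assumes .NET Core 3.0+ or .NET 5+. Nothing here detects the code page for you.

```csharp
using System;
using System.Text;

// Hypothetical helper: strict UTF-8 first, legacy code page as a fallback.
static class LegacyDecode
{
    static LegacyDecode()
    {
        // .NET Core / .NET 5+ ship only a few encodings; this enables Windows code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string Decode(byte[] data, int fallbackCodePage)
    {
        // throwOnInvalidBytes: true makes invalid UTF-8 raise instead of becoming U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            // The fallback code page is the caller's guess; it cannot be detected reliably.
            return Encoding.GetEncoding(fallbackCodePage).GetString(data);
        }
    }
}

class Program
{
    static void Main()
    {
        // "Αθήνα" (Athens) in Windows-1253; 0xC1 can never start a valid UTF-8 sequence.
        byte[] greek = { 0xC1, 0xE8, 0xDE, 0xED, 0xE1 };
        string text = LegacyDecode.Decode(greek, 1253);
        Console.WriteLine(text == "\u0391\u03B8\u03AE\u03BD\u03B1"); // True
    }
}
```

To hand the result to DataContractJsonSerializer, the decoded string can be turned back into valid UTF-8 with Encoding.UTF8.GetBytes(text) and wrapped in a new MemoryStream.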

Decoding in asp.net 4.0 - Special Characters

I'm decoding special characters in ASP.NET according to the w3schools ASCII URL-encoding chart. Some of the special characters are not being converted and turn into "?" instead. In the example below, the expected result for %92 is a quote character (shown as "`"), and I'm trying to decode it to the character the URL encoding stands for.
string strurl = "Workers%92+Accommodation";
string strdecode = Server.UrlDecode(strurl);
Example: in that URL-encoding chart, %92 is the ’ character, which is not on the keyboard (refer to http://www.w3schools.com/TAGS/ref_urlencode.asp).
Try encoding/decoding with this:
public static string EncodeString(string decodedString)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(decodedString));
}
public static string DecodeString(string encodedString)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(encodedString));
}
Edit: when you are going to transmit a value via a URL, the method to use is HttpServerUtility.UrlTokenEncode().
Reference:
http://msdn.microsoft.com/en-us/library/system.web.httpserverutility.urltokenencode.aspx
You have to decode with the appropriate encoding code page. Quoting from http://www.ascii-code.com:
There are several different variations of the 8-bit ASCII table. The table below is according to ISO 8859-1, also called ISO Latin-1....
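According to http://en.wikipedia.org/wiki/Windows_code_page, the Windows code page for Western European languages is 1252 (Windows-1252). It matches ISO Latin-1 except in the range 0x80 to 0x9F, where it assigns printable characters to most values (0x92 is ’), whereas ISO 8859-1 itself (code page 28591) has control codes there. To decode your string, simply do the following:
strurl = Encoding.GetEncoding(1252).GetString(HttpUtility.UrlDecodeToBytes(strurl));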
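A self-contained version of this fix is sketched below, using the HttpUtility.UrlDecode(string, Encoding) overload to decode and convert in one call. It assumes .NET Core 3.0+ or .NET 5+, where HttpUtility is available and code page 1252 must be enabled by registering CodePagesEncodingProvider; on .NET Framework (such as the ASP.NET 4.0 in the question) that registration line is unnecessary and can be removed. The UrlDecodeDemo class is hypothetical.

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical helper: URL-decodes text whose %XX escapes are Windows-1252 bytes.
static class UrlDecodeDemo
{
    static UrlDecodeDemo()
    {
        // Needed on .NET Core / .NET 5+; remove on .NET Framework, where 1252 is built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string DecodeCp1252(string encoded) =>
        HttpUtility.UrlDecode(encoded, Encoding.GetEncoding(1252));
}

class Program
{
    static void Main()
    {
        string strurl = "Workers%92+Accommodation";
        // Default UTF-8 decoding: the lone 0x92 byte is invalid UTF-8 and gets replaced.
        Console.WriteLine(HttpUtility.UrlDecode(strurl));
        // Windows-1252 decoding: 0x92 is U+2019 and '+' becomes a space.
        Console.WriteLine(UrlDecodeDemo.DecodeCp1252(strurl) == "Workers\u2019 Accommodation"); // True
    }
}
```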

ASCII Extended in C# string

How do I make a C# string accept non-printable extended ASCII characters like •? When I try to put • in a string, it just gives a blank space or null.
Extended ASCII is just ASCII plus 128 more characters: the byte values 128 to 255, which have the high (eighth) bit set.
The problem is that there is no single standard for extended ASCII. There are many variants (code pages) out there, and there's no way to tell from the bytes alone which one you are using.
C# strings, however, use UTF-16, which differs from whichever extended ASCII you are using.
You will have to find the matching Unicode character and write it as follows:
string a = "\u2022"; // where 2022 is the Unicode code point (U+2022 BULLET, •)
Console.Write(a);
Alternatively, you could find out which encoding your files use and use it like so, e.g. for Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
and for UTF-16
Encoding enc = new UnicodeEncoding(false, true, true);
and convert between them using
Encoding.Convert(Encoding, Encoding, Byte[], Int32, Int32)
See the Encoding.Convert documentation for details.
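The Encoding.Convert route can be sketched as follows, assuming the source bytes are Windows-1252 (where 0x95 is • and 0x80 is €) and assuming .NET Core 3.0+ or .NET 5+, hence the provider registration. The ExtendedAscii class is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: turns Windows-1252 ("extended ASCII") bytes into a .NET string.
static class ExtendedAscii
{
    static ExtendedAscii()
    {
        // Windows code pages are not available by default on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string FromCp1252(byte[] cp1252Bytes)
    {
        Encoding source = Encoding.GetEncoding(1252);
        Encoding target = new UnicodeEncoding(false, true, true); // UTF-16 little-endian
        // Convert re-encodes the bytes; it does not add a byte order mark.
        byte[] utf16 = Encoding.Convert(source, target, cp1252Bytes, 0, cp1252Bytes.Length);
        return target.GetString(utf16);
    }
}

class Program
{
    static void Main()
    {
        byte[] data = { 0x95, 0x20, 0x80 }; // bullet, space, euro sign in Windows-1252
        string s = ExtendedAscii.FromCp1252(data);
        Console.WriteLine(s == "\u2022 \u20AC"); // True
    }
}
```

If you only need the string, Encoding.GetEncoding(1252).GetString(bytes) gives the same result in one step; Encoding.Convert is mainly useful when you need the UTF-16 bytes themselves.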
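Try this: convert those characters to a string as follows.
string equivalentLetter = Encoding.Default.GetString(new byte[] { (byte)letter });
Now equivalentLetter contains the correct string.
I tried this for the euro symbol, and it worked. Note that this relies on Encoding.Default being the system's ANSI code page, which is true on .NET Framework; on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so pass the specific code page to Encoding.GetEncoding instead.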
.NET strings are UTF-16 encoded, not extended ASCII (whatever that is). Simply adding a number to a character gives you another character within the UTF-16 range. If you want to see the character as it would appear in your extended ASCII encoding, you need to convert the newly calculated value from that encoding to UTF-16. See: http://msdn.microsoft.com/en-us/library/66sschk1.aspx
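The difference between "just another UTF-16 character" and "the character in your extended ASCII encoding" is shown in the sketch below. It assumes the extended ASCII encoding is Windows-1252 and that the computed value fits in one byte; ShiftDemo is a hypothetical name, and the provider registration assumes .NET Core 3.0+ or .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo: the same "character code + offset" read as UTF-16 vs. Windows-1252.
static class ShiftDemo
{
    static ShiftDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static (char AsUtf16, string AsCp1252) Shift(char c, int offset)
    {
        int code = c + offset;
        if (code < 0 || code > 255)
            throw new ArgumentOutOfRangeException(nameof(offset), "Result must fit in one byte.");

        char asUtf16 = (char)code; // simply the UTF-16 code unit with that number
        // Reinterpret the same number as a Windows-1252 byte and convert it to UTF-16.
        string asCp1252 = Encoding.GetEncoding(1252).GetString(new[] { (byte)code });
        return (asUtf16, asCp1252);
    }
}

class Program
{
    static void Main()
    {
        var r = ShiftDemo.Shift('\u007F', 1);
        Console.WriteLine((int)r.AsUtf16);         // 128: U+0080, a control character
        Console.WriteLine(r.AsCp1252 == "\u20AC"); // True: byte 0x80 is the euro sign in Windows-1252
    }
}
```

Below 128 both interpretations agree (Shift('A', 1) gives 'B' either way); they only diverge in the extended range.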

What happens to a null byte when converting bytes to ISO 8859-1 encoding?

I'm not entirely sure the question even makes sense. I'm taking a byte array from an ID3 tag and converting it to a string. Most text frames in an ID3 tag use ISO 8859-1 encoding, but it depends on the frame. In any case, if you look up 0x00 in the ISO 8859-1 code chart, it is listed as invalid.
To complicate things further, whether due to programmer error or just poor formatting, some of the strings end in 0x00 and some do not.
When converting a series of bytes into a string using ISO 8859-1 encoding, do you have to manually check the end of the string for a null? Or will the encoding object, through whatever method it uses to convert in the first place, deal with the null properly? Furthermore, is there some sort of function that could normalize or "fix" the null-terminated string?
When you try to display these strings they do not display properly.
I am using C# for this particular project.
Some extra info here about ID3 Tags: ID3 Specs
Or am I completely misunderstanding the whole thing? Is a null terminator simply a way a particular language handles strings and it has nothing to do with encoding?
Edit: I used System.Text.Encoding.GetEncoding("iso-8859-1") followed by a GetString call
If you use Encoding.GetEncoding(28591), it just converts a byte 0 to Unicode U+0000. Encodings generally assume that they have to convert all the bytes; they don't look for terminators.
This treatment of 0 as Unicode 0 is in line with the Wikipedia description:
In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values thus provides for 256 characters via every possible 8-bit value.
The C0 and C1 control characters page includes:
0: Originally used to allow gaps to be left on paper tape for edits. Later used for padding after a code that might take a terminal some time to process (e.g. a carriage return or line feed on a printing terminal). Now often used as a string terminator, especially in the C programming language.
Sample code:
using System;
using System.Text;
class Program
{
static void Main(string[] args)
{
byte[] data = { 0, 0 };
Encoding latin1 = Encoding.GetEncoding(28591);
string text = latin1.GetString(data);
Console.WriteLine(text.Length); // 2
Console.WriteLine((int) text[0]); // 0
Console.WriteLine((int) text[1]); // 0
}
}
Happily, ASCII, ISO-8859-1 and Unicode all agree on codepoints in the range 0..127. Thus your character '\0' will be encoded identically in ASCII, ISO-8859-1 and UTF-8.
If your program assigns special semantics to the zero byte, you have to take care of that appropriately.
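A minimal sketch of "taking care of that" for ISO-8859-1 text frames follows. It assumes that everything from the first 0x00 byte onward should be ignored, so terminated and unterminated strings decode to the same text; some ID3v2.4 frames hold several null-separated strings, which this would truncate. The Id3Text class is hypothetical. If you only want to drop trailing nulls, decoding everything and calling text.TrimEnd('\0') is an alternative.

```csharp
using System;
using System.Text;

// Hypothetical helper for ISO-8859-1 ID3 text frames.
static class Id3Text
{
    // Code page 28591 (ISO-8859-1) is built into .NET, so no provider is needed.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Decodes up to (not including) the first 0x00 byte, so terminated and
    // unterminated strings give the same result.
    public static string Decode(byte[] data)
    {
        int end = Array.IndexOf(data, (byte)0);
        if (end < 0) end = data.Length; // no terminator present
        return Latin1.GetString(data, 0, end);
    }
}

class Program
{
    static void Main()
    {
        byte[] terminated = { 0x41, 0x42, 0x00 };
        byte[] unterminated = { 0x41, 0x42 };
        Console.WriteLine(Id3Text.Decode(terminated) == Id3Text.Decode(unterminated)); // True
        Console.WriteLine(Id3Text.Decode(terminated).Length);                            // 2
    }
}
```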
