How do I convert C# characters to their hexadecimal code representation?

What I need to do is convert a C# character to an escaped unicode string:
So, 'A' -> "\x0041".
Is there a better way to do this than:
char ch = 'A';
string strOut = String.Format("\\x{0}", Convert.ToUInt16(ch).ToString("x4"));

Cast and use composite formatting:
char ch = 'A';
string strOut = String.Format(@"\x{0:x4}", (ushort)ch);
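With C# 6 or later, string interpolation gives the same result a bit more concisely (a minor variation, not from the answer above):

```csharp
char ch = 'A';
// Interpolated verbatim string: the \x stays literal,
// {(ushort)ch:x4} formats the UTF-16 code unit as 4 hex digits
string strOut = $@"\x{(ushort)ch:x4}";
// strOut == @"\x0041"
```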

Related

Encode - C# convert ISO-8859-1 entity numbers to characters

I found a question about how to convert ISO-8859-1 characters to entity numbers:
C# convert ISO-8859-1 characters to entity number
code:
string input = "Steel Décor";
StringBuilder output = new StringBuilder();
foreach (char ch in input)
{
    if (ch > 0x7F)
        output.AppendFormat("&#{0};", (int)ch);
    else
        output.Append(ch);
}
// output.ToString() == "Steel D&#233;cor"
but I didn't figure out how to do the opposite: converting from entity numbers back to characters, i.e. from
// "Steel D&#233;cor" to "Steel Décor"
PS: all accented characters in my string are entity codes
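One way to go back from entity numbers to characters (a sketch, not from the original answer) is a regex that captures the digits of each `&#NNN;` entity and converts them with a MatchEvaluator; System.Net.WebUtility.HtmlDecode would also handle numeric entities:

```csharp
using System.Text.RegularExpressions;

string encoded = "Steel D&#233;cor";
// Replace each numeric entity &#NNN; with the character it encodes
string decoded = Regex.Replace(encoded, @"&#(\d+);",
    m => ((char)int.Parse(m.Groups[1].Value)).ToString());
// decoded == "Steel Décor"
```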

In C# When is the right time to use Apostrophe Quotation marks

I would like to know why some things have to be within a pair of apostrophes and others within quotation marks.
void trythis() {
    char myChar = 'Stuff';
    String myString = "Blah";
    int myInteger = '22';
    Serial.print(myChar);
    Serial.print(myString);
    Serial.print(myInteger);
}
Numbers have no quotes: int x = 56;
Characters have single quotes: char ch = 'a';
Strings have double quotes: string name = "Bob";
Character literals use a single quote. So when you're dealing with char, that's 'x'.
String literals use double quotes. So when you're dealing with string, that's "x".
A char is a single UTF-16 code unit - in most cases "a single character". A string is a sequence of UTF-16 code units, i.e. "a piece of text" of (nearly) arbitrary length.
Your final example, after making it compile, would look something like:
int myInteger = 'x';
That's using a character literal, but then implicitly converting it to int - equivalent to:
char tmp = 'x';
int myInteger = tmp;
The code you wrote doesn't compile at all.
Single quotes are used for character literals (single characters, which are stored as UTF-16 in .NET). Integers are not quoted.
This would be valid:
char myChar = 's';
string myString = "Blah";
int myInteger = 22;

Unicode to ASCII with character translations for umlauts

I have a client that sends Unicode input files and demands only ASCII-encoded files in return; the why is unimportant.
Does anyone know of a routine to translate a Unicode string to its closest ASCII approximation? I'm looking to replace common Unicode characters like 'ä' with a best ASCII representation.
For example: 'ä' -> 'a'
Data resides in SQL Server however I can also work in C# as a downstream mechanism or as a CLR procedure.
Just loop through the string. For each character do a switch:
switch (inputCharacter)
{
    case 'ä':
        outputString = "ae";
        break;
    case 'ö':
        outputString = "oe";
        break;
    ...
}
(These translations are common when writing German with ASCII only.)
Then combine all outputStrings with a StringBuilder.
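Putting the pieces above together, a minimal sketch could look like this (the method name Transliterate and the extra ü/ß cases are mine, shown only as an illustration):

```csharp
using System.Text;

static string Transliterate(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        switch (c)
        {
            case 'ä': sb.Append("ae"); break;
            case 'ö': sb.Append("oe"); break;
            case 'ü': sb.Append("ue"); break;
            case 'ß': sb.Append("ss"); break;
            default:  sb.Append(c);    break;  // pass everything else through
        }
    }
    return sb.ToString();
}
// Transliterate("Händel") == "Haendel"
```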
I think you really mean extended ASCII to ASCII.
Just a simple dictionary
Dictionary<char, char> trans = new Dictionary<char, char>() {...};
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
    if ((int)c <= 127)
        sb.Append(c);
    else
        sb.Append(trans[c]);
}
string ascii = sb.ToString();

Unable to remove invisible chars using Regex

I want to remove any invisible chars from a string, keeping only spaces and chars in the range 0x20-0x7F.
I use this: Regex.Replace(QueryString, @"[^\s\x20-\x7F]", "");
However it does not work:
QueryString contains the char 0xA0, and after the replace the char still exists in QueryString.
I am not sure why this fails to work?
0xA0 is the non-breaking space character - and as such it's matched with \s. Rather than using \s, expand this out into the list of whitespace characters you want to include.
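For example, since 0x20-0x7F already includes the ordinary space, simply dropping `\s` from the pattern does what the question asks (my variation, not the original answer's exact code):

```csharp
using System.Text.RegularExpressions;

string queryString = "abc\u00A0def";  // contains a non-breaking space (0xA0)
// Remove everything outside the printable-ASCII range 0x20-0x7F
string cleaned = Regex.Replace(queryString, @"[^\x20-\x7F]", "");
// cleaned == "abcdef"
```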
I think you would rather use StringBuilder to process such strings.
StringBuilder sb = new StringBuilder(str.Length);
foreach (char ch in str)
{
    if (0x20 <= ch && ch <= 0x7F)
    {
        sb.Append(ch);
    }
}
string result = sb.ToString();

How to get a char from an ASCII Character Code in C#

I'm trying to parse a file in C# that has field (string) arrays separated by ASCII character codes 0, 1 and 2 (in Visual Basic 6 you can generate these by using Chr(0) or Chr(1) etc.)
I know that for character code 0 in C# you can do the following:
char separator = '\0';
But this doesn't work for character codes 1 and 2?
Two options:
char c1 = '\u0001';
char c1 = (char) 1;
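As a usage sketch for the original parsing problem (my example, not from the answer), the separator characters can be fed straight to String.Split:

```csharp
// Fields separated by the control characters U+0001 and U+0002
string line = "field1\u0001field2\u0002field3";
string[] fields = line.Split('\u0001', '\u0002');
// fields == { "field1", "field2", "field3" }
```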
You can simply write:
char c = (char) 2;
or
char c = Convert.ToChar(2);
or, a more complex option that works for ASCII encoding only:
char[] characters = System.Text.Encoding.ASCII.GetChars(new byte[]{2});
char c = characters[0];
It is important to notice that in C# the char type is stored as Unicode UTF-16.
From ASCII equivalent integer to char
char c = (char)88;
or
char c = Convert.ToChar(88);
From char to ASCII equivalent integer
int asciiCode = (int)'A';
The resulting integer is an ASCII code only when the character is in the ASCII range; for other characters it is the UTF-16 code unit value. For example:
string str = "Xสีน้ำเงิน";
Console.WriteLine((int)str[0]);
Console.WriteLine((int)str[1]);
will print
88
3626
Extended ASCII ranges from 0 to 255.
From default UTF-16 literal to char
Using the Symbol
char c = 'X';
Using the Unicode code
char c = '\u0058';
Using the Hexadecimal
char c = '\x0058';
