C# Build hexadecimal notation string

How do I build an escape sequence string in hexadecimal notation?
Example:
string s = "\x1A"; // this will create the hex-value 1A or dec-value 26
I want to be able to build strings with hex values between 00 and FF, like this (in this example, 1B):
string s = "\x" + "1B"; // Unrecognized escape sequence
Maybe there's another way of making hexadecimal strings...

Please try to avoid the \x escape sequence. It's difficult to read because where it stops depends on the data. For instance, how much difference is there at a glance between these two strings?
"\x9Good compiler"
"\x9Bad compiler"
In the former, the "\x9" is tab - the escape sequence stops there because 'G' is not a valid hex character. In the second string, "\x9Bad" is all an escape sequence, leaving you with some random Unicode character and " compiler".
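A quick way to see the difference in code (the lengths follow from how the escape sequences parse):
Console.WriteLine("\x9Good compiler".Length);   // 14 - "\x9" is a tab, followed by "Good compiler"
Console.WriteLine("\x9Bad compiler".Length);    // 10 - "\x9Bad" collapses into a single character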
I suggest you use the \u escape sequence instead:
"\u0009Good compiler"
"\u0009Bad compiler"
(Of course for tab you'd use \t but I hope you see what I mean...)
This is somewhat aside from the original question of course, but that's been answered already :)

You don't store hexadecimal values in strings.
You can, but it would just be that, a string, and would have to be cast to an integer or a byte to actually read its value.
You can assign a hexadecimal value as a literal to an int or a byte though:
Byte value = 0x0FF;
int value = 0x1B;
So it's easily possible to pass a hexadecimal literal into your string:
string foo = String.Format("{0} hex test", 0x0BB);
Which would create the string "187 hex test".
But I don't think that's what you wanted?
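If the goal was to get the character with that code into the string (rather than its decimal value), a cast to char works, and Convert.ToInt32(hex, 16) handles the case where the hex digits are only known at runtime. A small sketch of both:
string s1 = ((char)0x1B).ToString();                       // one-character string, same content as "\x1B"
string s2 = ((char)Convert.ToInt32("1B", 16)).ToString();  // hex digits supplied at runtime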

There's a '\u' escape code for hexadecimal 16-bit Unicode character codes.
Console.WriteLine( "Look, I'm so happy : \u263A" );

Related

Is there a way to differentiate a string argument between non-hexadecimal and hexadecimal?

Let's say we have the following signature
void doSomething(string s)
When the user calls the function, they can call
doSomething("hello") or doSomething("\x15\x3C\xFF")
Is there a way to tell when the argument is the second form, a hexadecimal value?
I want to do something like
if(isHex(s))
// do this
else
// do that
No. This is not possible. To the runtime environment, a string is essentially just an array of characters (which is essentially just a collection of bytes). It has no idea how those characters were originally represented either in plain text or escaped sequences of hexadecimal.
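To see why, note that once the compiler has processed the escape sequences, nothing of them remains. A small demonstration:
string escaped = "\x41\x42";            // written with hex escapes in source
string literal = "AB";                  // written with plain characters
Console.WriteLine(escaped == literal);  // True - the runtime cannot tell them apart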
You can use regex in order to check for valid hex strings. But in order to do this you must provide the string in hex notation as is, i.e. without C#'s interpretation and transformation into a normal string. Use a verbatim string (introduced by an "@") for this:
string s = @"\x15\x3C\xFF";
In verbatim strings, the backslashes are not interpreted as escape characters by C#. But the downside of this is that you are not getting the intended resulting string any more, of course.
// requires: using System.Text.RegularExpressions;
public static bool IsHexString(string s)
{
    return Regex.IsMatch(s, @"^(\\x[0-9A-F]{2})+$");
}
Explanation of the regular expression:
^ beginning of string.
\\ escaped backslash ("\"). Not a C# escape here, but a regex escape.
x the letter "x".
[0-9A-F]{2} two consecutive hex digits.
(...)+ at least one occurrence of a hex number.
$ end of string.
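For example, a quick sketch of how you'd call it (note it only works on verbatim input, not on a string where C# has already expanded the \x escapes):
Console.WriteLine(IsHexString(@"\x15\x3C\xFF")); // True
Console.WriteLine(IsHexString("hello"));         // False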

Reformatting hex string input

I'm trying to parse a user-entered hex string so that I can emit a correctly formatted version to disk.
The "correct" format, in this case, looks like A1 F5 E1 C9 - space separated bytes with uppercase hex letters. However, the user input might not have spaces (A1F5E1C9), might have line breaks (A1 F5\nE1 C9), might have leading or trailing space (\nA1 F5 E1 C9\n\n\n), and might have dashes instead of spaces (A1-F5-E1-C9). It might have any combination of those variations, as well. Some of the numbers this will be used on are public keys, which can be quite long.
How can I handle reformatting this? The two semi-solutions I've been able to come up with so far are
BigInteger.Parse(value.Trim()
                      .Replace(" ", "")
                      .Replace("\n", "")
                      .Replace("\r", ""),
                 NumberStyles.HexNumber).ToString("X2");
which still doesn't produce a spaced-out string, or
string.Join(" ", Regex.Matches(a, #"([0-9A-Fa-f]{2})")
.Cast<Match>()
.Select(x => x.Captures[0].Value.ToUpper()))
which does work, but feels like it has a lot of extraneous overhead (Regex, LINQ).
Is the second method actually the best way to do this? Is there something obvious I'm overlooking?
I don't know how long the hex strings can be in your case, but you can convert your string (after cleaning out the unneeded characters) to a byte array and use the BitConverter class to convert it to a properly formatted string.
It is described e.g. here:
How do you convert Byte Array to Hexadecimal String, and vice versa?
BitConverter class is described here: https://msdn.microsoft.com/en-us/library/system.bitconverter(v=vs.110).aspx
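A minimal sketch of that approach (assuming the input contains only hex digits plus spaces, dashes and line breaks as separators):
// requires: using System; using System.Text.RegularExpressions;
string input = "A1-F5\nE1 C9";
string cleaned = Regex.Replace(input, @"[\s-]", "");               // strip all separators
byte[] bytes = new byte[cleaned.Length / 2];
for (int i = 0; i < bytes.Length; i++)
    bytes[i] = Convert.ToByte(cleaned.Substring(i * 2, 2), 16);
string formatted = BitConverter.ToString(bytes).Replace("-", " "); // "A1 F5 E1 C9"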

String must be exactly one character long

I have what I think is an easy problem. For some reason the following code generates the exception, "String must be exactly one character long".
int n = 0;
foreach (char letter in charMsg)
{
    // Get the integral value of the character.
    int value = Convert.ToInt32(letter);
    // Convert the decimal value to a hexadecimal value in string form.
    string hexOutput = String.Format("{0:X}", value);
    //Console.WriteLine("Hexadecimal value of {0} is {1}", letter, hexOutput);
    charMsg[n] = Convert.ToChar(hexOutput);
    n++;
}
The exception occurs at the charMsg[n] = Convert.ToChar(hexOutput); line. Why does it happen? When I check the values of charMsg it seems to contain all of them properly, yet it still throws an error at me.
UPDATE: I've solved this problem, it was my mistake. Sorry for bothering you.
OK, this was a really stupid mistake on my part. Point is, with my problem I'm not even supposed to do this as hex values clearly won't help me in any way.
What I am trying to do is to encrypt a message in an image. I've already encrypted the length of said message in the last digits of each color channel of the first pixel. Now I'm trying to put the message itself in there. I looked here: http://en.wikipedia.org/wiki/ASCII and told myself without thinking that using hexes would be a good idea. Can't believe I thought that.
Convert.ToChar(string s), per the documentation, requires a single-character string; otherwise it throws a FormatException, as you've noted. It is a rough, though more restrictive, equivalent of:
public char string2char( string s )
{
    return s[0];
}
Your code does the following:
Iterates over all the characters in some enumerable collection of characters.
For each such character, it...
Converts the char to an int. Hint: a char is an integral type: it's an unsigned 16-bit integral value.
Converts that value to a string containing a hex representation of the character in question. For most characters, that string will be at least two characters in length: for instance, converting the space character (' ', 0x20) this way will give you the string "20".
You then try to convert that back to a char and replace the current item being iterated over. This is where your exception is thrown. One thing you should note here is that altering a collection being enumerated is likely to cause the enumerator to throw an exception.
What exactly are you trying to accomplish here? For instance, given a charMsg that consists of 3 characters, 'a', 'b' and 'c', what should happen? A clear problem statement helps us to help you.
Since printable Unicode characters can be anywhere in the range from 0x0000 to 0xFFFF, your hexOutput variable can hold more than one character - this is why the error is thrown.
Convert.ToChar(string) always checks the length of the string, and if it is not equal to 1 it throws. So it will not convert the string "30" to a hexadecimal number and then to its ASCII representation, the symbol '0'.
Can you elaborate on what you are trying to achieve?
Your hexOutput is a string, and I'm assuming charMsg is a character array. Suppose the first element in charMsg is 'p', or hex value 70. The documentation for Convert.ToChar(string) says it'll use just the first character of the string ('7'), but it's wrong. It'll throw this error. You can test this with a static example, like charMsg[n] = Convert.ToChar("70");. You'll get the same error.
Are you trying to replace characters with hex values? If so, you might try using a StringBuilder object instead of your array assignments.
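A rough sketch of that StringBuilder idea (building a hex representation of the message as a new string instead of writing hex back into the char array):
// requires: using System.Text;
var sb = new StringBuilder();
foreach (char letter in charMsg)
    sb.AppendFormat("{0:X2} ", (int)letter);   // e.g. 'a' -> "61 "
string hexMessage = sb.ToString().TrimEnd();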
Convert.ToChar(string) also leads to this error if the string is empty; in VB.NET you could use CChar() instead.

How do I create a string with a surrogate pair inside of it?

I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair which will actually cause the string reversal to fail. How does one actually go about creating a string with a surrogate pair in it so that I can see the failure myself?
The simplest way is to use \U######## where the U is capital and the # characters denote exactly eight hexadecimal digits. If the value exceeds 0000FFFF hexadecimal, a surrogate pair will be needed:
string myString = "In the game of mahjong \U0001F01C denotes the Four of circles";
You can check myString.Length to see that the one Unicode character occupies two .NET Char values. Note that the char type has a couple of static methods that will help you determine if a char is a part of a surrogate pair.
If you use a .NET language that does not have something like the \U######## escape sequence, you can use the method ConvertFromUtf32, for example:
string fourCircles = char.ConvertFromUtf32(0x1F01C);
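A quick sketch of those checks, using the static methods on char mentioned above:
string pair = char.ConvertFromUtf32(0x1F01C);
Console.WriteLine(pair.Length);                              // 2
Console.WriteLine(char.IsHighSurrogate(pair[0]));            // True
Console.WriteLine(char.IsLowSurrogate(pair[1]));             // True
Console.WriteLine(char.IsSurrogatePair(pair[0], pair[1]));   // True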
Addition: If your C# source file has an encoding that allows all Unicode characters, like UTF-8, you can just put the character directly in the file (by copy-paste). For example:
string myString = "In the game of mahjong 🀜 denotes the Four of circles";
The character is UTF-8 encoded in the source file (in my example) but will be UTF-16 encoded (surrogate pairs) when the application runs and the string is in memory.
(Not sure if Stack Overflow software handles my mahjong character correctly. Try clicking "edit" to this answer and copy-paste from the text there, if the "funny" character is not here.)
The term "surrogate pair" refers to a means of encoding Unicode characters with high code-points in the UTF-16 encoding scheme (see this page for more information);
In the Unicode character encoding, characters are mapped to values between 0x000000 and 0x10FFFF. Internally, a UTF-16 encoding scheme is used to store strings of Unicode text in which two-byte (16-bit) code sequences are considered. Since two bytes can only contain the range of characters from 0x0000 to 0xFFFF, some additional complexity is used to store values above this range (0x010000 to 0x10FFFF).
This is done using pairs of code points known as surrogates. The surrogate characters are classified in two distinct ranges known as low surrogates and high surrogates, depending on whether they are allowed at the start or the end of the two-code sequence.
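For illustration, here is a small sketch of how a code point above 0xFFFF maps onto a high/low surrogate pair (in practice char.ConvertFromUtf32 does this for you; the code point 0x2A601 is just an example):
int codePoint = 0x2A601;
int v = codePoint - 0x10000;
char high = (char)(0xD800 + (v >> 10));      // high surrogate, range 0xD800-0xDBFF
char low  = (char)(0xDC00 + (v & 0x3FF));    // low surrogate, range 0xDC00-0xDFFF
string s = new string(new[] { high, low });  // same result as char.ConvertFromUtf32(codePoint)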
Try this yourself:
String surrogate = "abc" + Char.ConvertFromUtf32(Int32.Parse("2A601", NumberStyles.HexNumber)) + "def";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
or this, if you want to stick with the blog example:
String surrogate = "Les Mise" + Char.ConvertFromUtf32(Int32.Parse("0301", NumberStyles.HexNumber)) + "rables";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
and then check the string values with the debugger. Jon Skeet is damn right... strings and dates seem easy, but they are absolutely NOT.

How to recognize if a string contains Unicode chars?

I have a string and I want to know whether it has Unicode characters inside or not.
(i.e., whether it consists entirely of ASCII or not)
How can I achieve that?
Thanks!
If my assumptions are correct you wish to know if your string contains any "non-ANSI" characters. You can derive this as follows.
public void test()
{
    const string WithUnicodeCharacter = "a hebrew character:\uFB2F";
    const string WithoutUnicodeCharacter = "an ANSI character:Æ";
    bool hasUnicode;
    // true
    hasUnicode = ContainsUnicodeCharacter(WithUnicodeCharacter);
    Console.WriteLine(hasUnicode);
    // false
    hasUnicode = ContainsUnicodeCharacter(WithoutUnicodeCharacter);
    Console.WriteLine(hasUnicode);
}
// requires: using System.Linq;
public bool ContainsUnicodeCharacter(string input)
{
    const int MaxAnsiCode = 255;
    return input.Any(c => c > MaxAnsiCode);
}
Update
This detects for extended ASCII. If you only test for the true ASCII character range (up to 127), then you could potentially get false positives for extended ASCII characters, which do not denote Unicode. I have alluded to this in my sample.
If a string contains only ASCII characters, a serialization + deserialization step using ASCII encoding should get back the same string
so a one-liner check in C# could look like:
String s1 = "testभारत";
bool isUnicode = System.Text.Encoding.ASCII.GetString(System.Text.Encoding.ASCII.GetBytes(s1)) != s1;
ASCII defines only character codes in the range 0-127. Unicode is explicitly defined such as to overlap in that same range with ASCII. Thus, if you look at the character codes in your string, and it contains anything that is higher than 127, the string contains Unicode characters that are not ASCII characters.
Note that ASCII includes only the English alphabet. Thus, if you (for whatever reason) need to apply that same approach to strings that might contain accented characters (Spanish text for example), ASCII is not sufficient and you need to look for another differentiator.
The ANSI character set [*] does extend the ASCII characters with the aforementioned accented Latin characters in the range 128-255. However, Unicode does not overlap with ANSI in that range, so technically a Unicode string might contain characters that are not part of ANSI but have the same character code (specifically in the range 128-159, as you can see from the table I linked to).
As for the actual code to do this, @chibacity's answer should work, although you should modify it to cover strict ASCII, because it won't work for ANSI.
[*] Also known as Windows Latin 1 (Windows-1252)
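For example, a strict-ASCII variant of that check (the method name here is just illustrative; it needs the same using System.Linq as the sample above) could look like:
public bool ContainsNonAsciiCharacter(string input)
{
    const int MaxAsciiCode = 127;
    return input.Any(c => c > MaxAsciiCode);
}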
As long as it contains characters, it contains Unicode characters.
From System.String:
Represents text as a series of Unicode characters.
public static bool ContainsUnicodeChars(string text)
{
    return !string.IsNullOrEmpty(text);
}
You normally have to worry about different Unicode encodings when you have to:
Encode a string into a stream of bytes with a particular encoding.
Decode a string from a stream of bytes with a particular encoding.
Once you're into string land though, the encoding that the string was originally represented with, if any, is irrelevant.
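For instance, a tiny sketch of that byte boundary (encoding only matters when converting to and from bytes):
string text = "héllo";
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(text);          // encode: string -> bytes
string roundTripped = System.Text.Encoding.UTF8.GetString(utf8Bytes); // decode: bytes -> string
// roundTripped == text; in memory the string is simply a sequence of UTF-16 code units.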
Each character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded by using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object.
Perhaps you might also find these questions relevant:
How can you strip non-ASCII characters from a string? (in C#)
C# Ensure string contains only ASCII
And this article by Jon Skeet: Unicode and .NET
This is another solution without using lambda expressions. It is in VB.NET but you can convert it easily to C#:
Public Function ContainsUnicode(ByVal inputstr As String) As Boolean
    Dim inputCharArray() As Char = inputstr.ToCharArray
    For i As Integer = 0 To inputCharArray.Length - 1
        If CInt(AscW(inputCharArray(i))) > 255 Then Return True
    Next
    Return False
End Function
