Convert a single hex character to its byte value in C# - c#

This will convert 1 hex character to its integer value, but needs to construct a (sub) string.
Convert.ToInt32(serializedString.Substring(0,1), 16);
Does .NET have a built-in way to convert a single hex character to its byte (or int, doesn't matter) value that doesn't involve creating a new string?

int value = "0123456789ABCDEF".IndexOf(char.ToUpper(sourceString[index]));
Or even faster (subtraction vs. array search), but no checking for bad input:
int HexToInt(char hexChar)
{
    hexChar = char.ToUpper(hexChar); // needed if the input may contain lowercase a-f
    return (int)hexChar < (int)'A' ?
        ((int)hexChar - (int)'0') :
        10 + ((int)hexChar - (int)'A');
}
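If bad input is a possibility, a range-checked variant is only slightly longer. This is a minimal sketch, not a built-in API: the HexDigit class and the -1 sentinel are assumptions made for illustration.

```csharp
using System;

static class HexDigit
{
    // Returns 0..15 for a valid hex digit, or -1 for any other character.
    // No string is allocated and no case conversion is needed.
    public static int ToValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Called as HexDigit.ToValue(serializedString[0]), it reads the character straight from the string's indexer, so no substring is created.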

Correct me if I'm wrong, but can't you simply use
Convert.ToByte(stringValue, 16);
as long as stringValue represents a hex number? Isn't that the point of the base parameter?
Strings are immutable, so I don't think there is a way to get the byte value of the character at index 0 without creating a new string.

If you know the hex value is only a byte then just convert to an Int32 and then cast
var b = (byte)(Convert.ToInt32(serializedString, 16));

Sure you can get the hex value without ever needing to create another string. I'm not sure what it'll really gain you, performance wise, but since you asked, here's what you've requested.
public int FromHex(ref string hexcode, int index)
{
char c = hexcode[index];
switch (c)
{
case '1':
return 1;
case '2':
return 2;
case '3':
return 3;
case '4':
return 4;
case '5':
return 5;
case '6':
return 6;
case '7':
return 7;
case '8':
return 8;
case '9':
return 9;
case 'A':
case 'a':
return 0xa;
case 'B':
case 'b':
return 0xb;
case 'C':
case 'c':
return 0xc;
case 'D':
case 'd':
return 0xd;
case 'E':
case 'e':
return 0xe;
case 'F':
case 'f':
return 0xf;
case '0':
default:
return 0;
}
}

I am adding another answer just because no one mentioned this one. You can use the built-in Uri.FromHex method for converting a single character:
var b = (byte) System.Uri.FromHex('a'); // b = 10
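Uri.FromHex throws on a character that is not a hex digit, so if the input is untrusted it can be paired with Uri.IsHexDigit. A rough sketch; the UriHex class and TryFromHex name are made up for this example:

```csharp
using System;

static class UriHex
{
    // Hypothetical wrapper: returns false instead of throwing on bad input.
    public static bool TryFromHex(char c, out byte value)
    {
        if (Uri.IsHexDigit(c))
        {
            value = (byte)Uri.FromHex(c); // FromHex returns an int in the range 0..15
            return true;
        }
        value = 0;
        return false;
    }
}
```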

Encoding.UTF8.GetBytes( serializedString.ToCharArray(), 0, 1)
Cheaper might be:
Encoding.UTF8.GetBytes( new char[]{ serializedString[0] }, 0, 1)
This will only add the interesting char to the char[] and not the entire string.

Related

Why does my switch work when it has single quotes?

So, disclaimer: I am pretty new to C# and trying to learn the finer intricacies. I have a classwork assignment that I coded and it works, but I'm not sure why it works, and want to understand it.
Here is the code. It's not the full code; I just cut out the relevant parts:
int studentType = 0;
switch (studentType)
{
case 1:
studentType = '1';
WriteLine("What is your GPA?");
gpa = Convert.ToDouble(ReadLine());
break;
case 2:
studentType = '2';
WriteLine("What is the title of your thesis?");
thesis = ReadLine();
break;
case 3:
studentType = '3';
WriteLine("What is the title of your dissertation?");
dissertation = ReadLine();
break;
case 4:
break;
default:
WriteLine("Invalid option input.");
break;
}//Switch for student Type
As noted, the case command works perfectly fine like this. What I accidentally did was initially put case 'x': and that ended up not working, so I deleted all the single quotes.
Here is the 2nd part, and why I am confused:
switch (studentType)
{
case '1':
case '4':
WriteLine($"GPA: {gpa:f2}");
break;
case '3':
WriteLine($"Dissertation title: {dissertation}");
break;
case '2':
WriteLine($"Thesis title: {thesis}");
break;
}//end of studentType switch
So originally I tried writing the case without the single quotes, but every time I ran 1, GPA never populated, so I tried to put in the single quotes, and it works, but I'm not sure why.
Since studentType is an integer, it would make sense for the first switch to be without single quotes, but how is the switch requiring single quotes?
I guess I can submit it as is as long as it works correctly, but I mainly want to understand what's going on.
Thanks for the help!
There's an implicit conversion from char to int, and a constant char expression can be used as a constant int expression, which is what you've got. The value of the int is the UTF-16 code unit associated with the char.
Here's another example of that:
const int X = 'x';
That's also why your assignment statements worked:
studentType = '1';
So this:
int value = ...;
switch (value)
{
case '1':
// ...
break;
}
is equivalent to:
int value = ...;
switch (value)
{
case 49:
// ...
break;
}
... because 49 is the UTF-16 code unit associated with the character '1'.
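To see both meanings side by side, here is a small sketch (the CharSwitchDemo class and Describe method are invented for illustration). Because 1 and '1' are different constants (1 and 49), both labels can appear in the same switch:

```csharp
using System;

static class CharSwitchDemo
{
    public static string Describe(int value)
    {
        switch (value)
        {
            case 1:   return "the integer 1";
            case '1': return "the character '1' (code 49)"; // constant char converts to int 49
            default:  return "something else";
        }
    }
}
```

Describe(1) takes the first branch, while Describe('1') and Describe(49) both take the second. In the question, the first switch assigned '1' (that is, 49) to studentType, which is why only the quoted labels matched in the second switch.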

Converting function to determine operator precedence from C# to C++

I need to convert some code from C# to C/C++. The effect of the function is to determine operator precedence for math evaluations.
private static int OperatorPrecedence(string strOp)
{
switch (strOp)
{
case "*":
case "/":
case "%": return 0;
case "+":
case "-": return 1;
case ">>":
case "<<": return 2;
case "<":
case "<=":
case ">":
case ">=": return 3;
case "==":
case "=":
case "!=": return 4;
case "&": return 5;
case "^": return 6;
case "|": return 7;
case "&&": return 8;
case "||": return 9;
}
throw new ArgumentException("Operator " + strOp + " not defined.");
}
I realize there are numerous questions about strings in switch statements in C++, but that's not really what I'm asking. Obviously the switch(string) syntax is not legal in C++. I don't want to use it. I just need an efficient way to determine the precedence of the above operators, short of initializing an entire map at the start of the program or writing large if-else chains (which is really just dancing around the switch statement).
Any ideas how I can determine operator precedence? Maybe a way to generate a unique code for each operator?
As specified in this C# answer, a switch on a string is compiled into a dictionary lookup or an if-else chain, depending on the number of cases.
The dictionary type in C++ is std::map. You could keep a static dictionary in function scope and then search in it.
So a straight equivalent, 1:1 conversion, would be something along these lines:
#include <map>
#include <stdexcept>
#include <string>

int OperatorPrecedence(const std::string& strOp)
{
    // Built once on first call; values mirror the C# switch above.
    static const std::map<std::string, int> lookup = { {"*", 0}, {"/", 0} /* add more */ };
    auto it = lookup.find(strOp);
    if (it != lookup.end())
        return it->second;
    throw std::invalid_argument("Operator " + strOp + " not defined");
}
The advantage of using a statically stored dictionary instead of one with automatic storage duration here is that the container does not need to be initialized (allocations!) every time you call OperatorPrecedence.

Convert a string to int. Or Digit to text. ('1' to 'one', '3' to 'three', etc) In console Application?

I have a console application which asks a user to input a number using digits (1-9 and 0). I was wondering if there is a way where I can then convert that digit to a string of text.
Thanks.
I have found some code online (here), but I am unsure of how to implement most of it to a console app.
I'd write a function
string DigitToText(int digit)
{
if (digit < 0 || digit > 9)
{
throw new ArgumentOutOfRangeException(
"digit",
"digit must be between 0 and 9");
}
switch(digit)
{
case 0:
return "zero";
case 1:
return "one";
case 2:
return "two";
case 3:
return "three";
case 4:
return "four";
case 5:
return "five";
case 6:
return "six";
case 7:
return "seven";
case 8:
return "eight";
default:
return "nine";
}
}
Using a switch statement will save lots of unnecessary instantiation of arrays, and although this may look verbose I think the resulting IL will be efficient.
The code you found there works regardless of the type of application. Just add the class there to your project and use it.

Rewrite IsHexString method with RegEx

I got a method that checks if a string is a valid hex string:
public bool IsHex(string value)
{
if (string.IsNullOrEmpty(value) || value.Length % 2 != 0)
return false;
return
value.Substring(0, 2) == "0x" &&
value.Substring(2)
.All(c => (c >= '0' && c <= '9') ||
(c >= 'a' && c <= 'f') ||
(c >= 'A' && c <= 'F'));
}
The rules are:
The expression must be composed of an even number of hexadecimal digits (0-9, A-F, a-f). The characters 0x must be the first two characters in the expression.
I'm sure it can be rewritten with a regex in a much cleaner and more efficient way.
Could you help me out with that?
After you updated your question, the new regex that works for you should be:
^0x(?:[0-9A-Fa-f]{2})+$
Where I use (?: for non-capturing grouping for efficiency. The {2} means that you want two of the previous expression (i.e., two hex chars), and the + means you want one or more of those pairs. Note that this disallows a bare 0x as a valid value.
Efficiency
"Oded" mentioned something about efficiency. I don't know your requirements, so I consider this more an exercise for the mind than anything else. A regex will make leaps as long as the smallest matching regex. For instance, trying my own regex on 10,000 variable input strings of size 50-5000 characters, all correct, it runs in 1.1 seconds.
When I try the following regex:
^0x(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$
it runs about 40% faster, in 0.67 seconds. But be careful. Knowing your input is knowing how to write efficient regexes. For instance, if the regex fails, it will do a lot of back-tracking. If half of my input strings has the incorrect length, the running time explodes to approx 34 seconds, or 3000% (!), for the same input.
It becomes even trickier if most input strings are large. If 99% of your input is of valid length, all are > 4130 chars and only a few are not, writing
^0x(?:[0-9A-Fa-f]{4096})+(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$
is efficient and improves the running time even more. However, if many strings fail the length % 2 == 0 check, this is counterproductive because of back-tracking.
Finally, if most of your strings satisfy the even-number rule, and only some or many strings contain a wrong character, the speed goes up: the more input that contains a wrong character, the better the performance. That is because the regex can break out immediately when it finds an invalid character.
Conclusion: if your input is mixed small, large, wrong character, wrong count your fastest approach would be to use a combination of checking the length of the string (instantaneous in .NET) and use an efficient regex.
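A minimal sketch of that combination, assuming the simple two-digit regex from above and a cached Regex instance. The HexValidator class name and the choice of RegexOptions are my own assumptions, not measured recommendations:

```csharp
using System;
using System.Text.RegularExpressions;

static class HexValidator
{
    // Created once and reused for every call.
    private static readonly Regex HexBody = new Regex(
        "^0x(?:[0-9A-Fa-f]{2})+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool IsHex(string value)
    {
        // Cheap checks first: "0x" plus at least one pair, and an even total length.
        if (string.IsNullOrEmpty(value) || value.Length < 4 || value.Length % 2 != 0)
            return false;

        // Only strings of plausible length reach the regex.
        return HexBody.IsMatch(value);
    }
}
```

With the length test in front, strings of the wrong length never reach the regex, so the back-tracking cost described above is avoided for them.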
So, basically you want to check whether the number starts with 0x and continues with a (non-empty) sequence of 0-9 and/or A-F. That can be specified as a regular expression easily:
return Regex.IsMatch(value, "^0x[0-9A-Fa-f]+$");
I'm not sure why you do the value.Length % 2 != 0 check... isn't "0x1" a valid hexadecimal number? In addition, my function returns false on "0x", whereas yours would return true. If you want to change that, replace + (= one or many) with * (= zero or many) in the regular expression.
EDIT: Now that you've justified your "even number" requirement, I suggest you use Abel's RegEx. If you do that, I suggest that you call your method IsMsSqlHex or something like this to document that it does not follow the "usual" hex rules.
Diatribe: If you are at all concerned about speed, forget about Regex. Regex is an NFA and is as such, in most cases, slower than a DFA or hand-written parser.
Ignoring that you asked for Regex, here is something that would likely be more efficient (even though your implementation is probably fine, it does allocate strings):
static bool IsHex(string value)
{
if (string.IsNullOrEmpty(value) || value.Length < 3)
return false;
const byte State_Zero = 0;
const byte State_X = 1;
const byte State_Value = 2;
var state = State_Zero;
for (var i = 0; i < value.Length; i++)
{
switch (value[i])
{
case '0':
{
// Can be used in either Value or Zero.
switch (state)
{
case State_Zero: state = State_X; break;
case State_X: return false;
case State_Value: break;
}
}
break;
case 'X': case 'x':
{
// Only valid in X.
switch (state)
{
case State_Zero: return false;
case State_X: state = State_Value; break;
case State_Value: return false;
}
}
break;
case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9':
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
{
// Only valid in Value.
switch (state)
{
case State_Zero: return false;
case State_X: return false;
case State_Value: break;
}
}
break;
default: return false;
}
}
return state == State_Value;
}
If I understand correctly what you are trying to achieve, maybe this function will suit your needs better:
static bool ParseNumber(string value, out int result)
{
if (string.IsNullOrEmpty(value))
{
result = 0;
return false;
}
if (value.StartsWith("0x"))
return int.TryParse(value.Substring(2), NumberStyles.AllowHexSpecifier, null, out result);
else
return int.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out result);
}
Just for kicks I went and profiled:
Abel's Regex without static readonly caching as I described on Heinzi's answer.
Abel's Regex with static readonly caching.
My implementation.
Results on my laptop (Release/no debugger):
Regex with no compiled/cached Regex took 8137ms (2x cached, 20x hand-written)
Regex with compiled/cached Regex took 3463ms (8x hand-written)
Hand-written took 397ms (1x)

using a negative number in the default section of a switch statement as a char

I'm not looking for any specific code examples, but could someone explain why I can't get the '-1' to function at the end of this switch statement? It keeps saying that there are "too many literals" for type char. (something close to that). Would I have to convert this to another type?
Thanks for any help, and please, just explain without giving code. I would love to learn this by hands-on experience :D
Convert 7 char passed from ProcessInput() by reference to upper case
Use switch statement to translate char into their corresponding digits (case statement for each digit and each valid uppercase letter)
(TROUBLES WITH THIS PART) Write default case that returns error code (-1) for invalid letters
If no invalid letters, return 0
static void ToDigit(ref char digit)
{
digit = Char.ToUpper(digit);
char result;
switch (digit)
{
case '0': result = '0';
break;
case '1': result = '1';
break;
case '2':
case 'A':
case 'B':
case 'C': result = '2';
break;
case '3':
case 'D':
case 'E':
case 'F': result = '3';
break;
case '4':
case 'G':
case 'H':
case 'I': result = '4';
break;
case '5':
case 'J':
case 'K':
case 'L': result = '5';
break;
case '6':
case 'M':
case 'N':
case 'O': result = '6';
break;
case '7':
case 'P':
case 'Q':
case 'R':
case 'S': result = '7';
break;
case '8':
case 'T':
case 'U':
case 'V': result = '8';
break;
case '9':
case 'W':
case 'X':
case 'Y':
case 'Z': result = '9';
break;
//Says I can't enter -1 as char: "too many characters in character literal"
default: result = 'e';
break;
}
digit = result;
}
A char is, as the name implies, a single character. A group of characters all "strung together" is called a string in C#.
If you want the integer value -1 as a char then you can do that by saying unchecked((char)(-1)) (*) but you should be aware that this is a very bad idea. I assume that this is your assignment:
Write default case that returns error code (-1) for invalid letters.
That's not how things work in C#; returning a "bad" value to indicate failure is a "worst practice" -- it is characteristic of 1970's style C programming, but not C#.
The right thing to do here is to either (1) have no error cases at all; if there is no upper case form then just don't transform the character at all, or (2) throw an exception if the input is bad, or (3) return a nullable char, and return null for the "bad" value.
Also, the fact that your program takes a ref rather than returning a value is deeply suspicious. A ToDigit method should be computing and returning result not mutating a variable.
I think whatever course of study you are taking was written decades ago, originally targeted a different language entirely, and was never updated to use modern best practices. I would seriously question the value of such materials.
(*) Always say (T)(-1) in C# when casting the constant -1 to the type T, rather than (T)-1. If you write it the latter way, the reader can get confused about whether you mean "subtract one from T" or "cast negative one to type T".
Because '-1' isn't a char, it's two separate characters. '-' and '1'.
Since it talks about returning 0 on success, I assume that the resulting character, and the return value are different things. So they probably want something like this:
static int ToDigit(ref char digit)
{
switch (Char.ToUpperInvariant(digit))
{
case x:
digit=y;
return 0;
...
default:
return -1;
}
}
A few notes:
I'm using ToUpperInvariant instead of ToUpper, since ToUpper uses the current culture, and that can lead to strange effects. For example, your code wouldn't accept an i when run on a Turkish computer.
I'm leaving digit untouched in the error case.
Using int to represent success/error is a bad idea. Should at least be a bool.
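For what the bool version could look like, here is a sketch in the usual Try... style. The Keypad class and TryToDigit name are hypothetical and not part of the assignment:

```csharp
using System;

static class Keypad
{
    // Hypothetical helper: returns true and the keypad digit on success,
    // false (with digit set to '\0') for characters that have no mapping.
    public static bool TryToDigit(char input, out char digit)
    {
        switch (char.ToUpperInvariant(input))
        {
            case '0': digit = '0'; return true;
            case '1': digit = '1'; return true;
            case '2': case 'A': case 'B': case 'C': digit = '2'; return true;
            case '3': case 'D': case 'E': case 'F': digit = '3'; return true;
            case '4': case 'G': case 'H': case 'I': digit = '4'; return true;
            case '5': case 'J': case 'K': case 'L': digit = '5'; return true;
            case '6': case 'M': case 'N': case 'O': digit = '6'; return true;
            case '7': case 'P': case 'Q': case 'R': case 'S': digit = '7'; return true;
            case '8': case 'T': case 'U': case 'V': digit = '8'; return true;
            case '9': case 'W': case 'X': case 'Y': case 'Z': digit = '9'; return true;
            default: digit = '\0'; return false;
        }
    }
}
```

The caller then writes if (Keypad.TryToDigit(c, out char d)) { ... } and never has to compare against a magic error value.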
You should make result the type int, and return it back to the caller. The caller could then compare it to -1, and then quickly convert back to digit if it's not -1 by adding '0':
int result;
switch (digit) {
    // Assign result here
}
char resDigit;
if (result < 0) {
    // bad digit
} else {
    resDigit = (char)(result + '0'); // int + char yields int, so cast back to char
}
As a side note, you can replace your switch with a lookup in a long string of characters:
// Each digit owns a 5-character group, padded with spaces, so pos/5 is the digit.
string lookup = "0    1    2 ABC3 DEF4 GHI5 JKL6 MNO7PQRS8 TUV9WXYZ";
int pos = lookup.IndexOf(char.ToUpper(digit));
if (pos < 0) {
    // bad digit
} else {
    result = '0' + pos/5;
}
A char can contain only one character, thus '-1' does not work. (char)(-1) is not a valid character value.
My suggestion is to simply use '?' as default (or error) value.
