I have a character '¿'. If I cast it to an integer in C, the result is -62, but the same cast in C# gives 191. Can someone explain the reason?
C Code
char c = '¿';
int I = (int)c;
Result I = -62
C# Code
char c = '¿';
int I = (int)c;
Result I = 191
This is about how signed/unsigned numbers are represented and converted.
It looks like your C compiler's default is to use a signed byte as the underlying type for char (since you are not explicitly specifying unsigned char, the compiler's default is used; see Why is 'char' signed by default in C++?).
So 191 (0xBF) interpreted as a signed byte is a negative number (the most significant bit is 1): -65.
If you used unsigned char, the value would stay positive as you expect.
If your compiler used a wider type for char (e.g. short), that 191 would stay a positive 191 regardless of whether char is signed.
In C#, char is always unsigned - see MSDN char:
Type: char
Range: U+0000 to U+FFFF
So 191 will always convert to int as you expect.
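To see both interpretations side by side, here is a minimal C# sketch where sbyte plays the role of a signed 8-bit char and byte the role of an unsigned one:
using System;

class SignednessDemo
{
    static void Main()
    {
        char c = '¿';                             // U+00BF
        Console.WriteLine((int)c);                // 191 - C# char is an unsigned 16-bit type

        byte asUnsigned = 0xBF;                   // like unsigned char in C
        sbyte asSigned = unchecked((sbyte)0xBF);  // like signed char in C
        Console.WriteLine(asUnsigned);            // 191
        Console.WriteLine(asSigned);              // -65
    }
}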
Related
The title says it all. I'm trying to convert a char into an int in Visual Studio.
I have already tried this:
int a;
a = (int)x;
System.Console.WriteLine(a);
but it's not giving me anything besides this (from trying to make sense of the code):
114117105
This will just work:
//a char is a 16-bit numeric value,
//usually used to represent a character.
//Here I assign 'a' to it; keep in mind that 'a' also has a
//numeric representation, namely 97.
char x = 'a';
int a;
//The int receives the numeric value of the char through the cast.
//Apart from the bit width, the data contents are the same;
//assigning it to an int only changes the "purpose", i.e. the representation.
a = (int)x;
//Since we print an int, a number is shown (because that is its purpose);
//if we printed x instead, the character 'a' would be shown.
System.Console.WriteLine(a);
Output
97
As you will have understood by now, a string is an array of chars.
Therefore a string is hard to represent as a single number, because it is a sequence of values rather than one value.
It would be the same as asking to convert 0, 4, 43434, 878728, 3477, 3.14159265 into a single number.
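As a small illustration (assuming an input such as "rui", whose character codes 114, 117 and 105 would produce the 114117105 seen in the question when printed back to back):
string s = "rui";                    // hypothetical input: 'r' = 114, 'u' = 117, 'i' = 105
foreach (char ch in s)
{
    //printing each code with no separator in between produces 114117105
    System.Console.Write((int)ch);
}
System.Console.WriteLine();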
https://dotnetfiddle.net/qSYUdP
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/char
As to why the output for 'a' is 97: you can look it up in a character table, e.g. ASCII.
Please note that the actual character that is output is determined by the chosen font/character table. Most fonts implement ASCII for this range, but it is not guaranteed, so 97 will not always produce 'a'.
First, this question has related posts:
Why Int32 maximum value is 0x7FFFFFFF?
However, I want to know why the hexadecimal value is always treated as an unsigned quantity.
See the following snippet:
byte a = 0xFF; //No error (byte is an unsigned type).
short b = 0xFFFF; //Error! (even though both types are 16 bits).
int c = 0xFFFFFFFF; //Error! (even though both types are 32 bits).
long d = 0xFFFFFFFFFFFFFFFF; //Error! (even though both types are 64 bits).
The reason for the error is that hexadecimal values are always treated as unsigned, regardless of the data-type they are stored in. Hence, the value is 'too large' for the data-type in question.
For instance, I expected:
int c = 0xFFFFFFFF;
To store the value:
-1
And not the value:
4294967295
Simply because int is a signed type.
So, why is it that the hexadecimal values are always treated as unsigned even if the sign type can be inferred by the data-type used to store them?
How can I store these bits into these data-types without resorting to the use of ushort, uint, and ulong?
In particular, how can I achieve this for the long data-type considering I cannot use a larger signed data-type?
What's going on is that a literal is intrinsically typed. 0.1 is a double, which is why you can't say float f = 0.1. You can cast a double to a float (float f = (float)0.1), but you may lose precision. Similarly, the literal 0xFFFFFFFF is intrinsically a uint. You can cast it to an int, but that's after it has been interpreted by the compiler as a uint. The compiler doesn't use the variable to which you are assigning it to figure out its type; its type is defined by what sort of literal it is.
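If you really want those bit patterns in the signed types, one common approach (just a sketch, reusing the variable names from the question) is an unchecked cast of the literal:
short b = unchecked((short)0xFFFF);            // -1
int c = unchecked((int)0xFFFFFFFF);            // -1
long d = unchecked((long)0xFFFFFFFFFFFFFFFF);  // -1
System.Console.WriteLine($"{b} {c} {d}");      // -1 -1 -1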
They are treated as unsigned numbers, as that is what the language specification says to do.
I have a C++ application. In that application, one of the functions returns an unsigned char value. I want to convert that function to C# code, but I don't have enough knowledge of C++ programming. I want to know which data type to use in C# in place of the C++ unsigned char.
The C++ function looks like this:
unsigned char getValue(string s)
{
//SOME CODE HERE
}
The equivalent of unsigned char in C# is byte.
byte getValue(string s)
{
}
Second option: if you use unsigned char as a character storage, you should use char:
char getValue(string s)
{
}
You have to remember that C++ treats characters and 8-bit values interchangeably. So, for instance, in C++:
'A' + 'B' == 65 + 66 == 65 + 'B' == 'A' + 66
You have to check from the context whether the unsigned char is a character or a number. Since a string is being passed as a parameter, one can guess that the function processes characters, so most probably you want the C# char type.
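A minimal sketch of the byte-based version (the body is purely illustrative, just to show the numeric and character views of the same value):
class Port
{
    static byte GetValue(string s)
    {
        //hypothetical body: return the code of the first character as an 8-bit value
        return s.Length > 0 ? (byte)s[0] : (byte)0;
    }

    static void Main()
    {
        byte b = GetValue("A");
        System.Console.WriteLine(b);        // 65 - the numeric view
        System.Console.WriteLine((char)b);  // A  - the character view
    }
}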
You are looking for the byte type. Also, this question has additional mappings if you need those.
If your C# code is supposed to treat the value as a character, then the char type is what you want. The reason we have been suggesting byte is that we are assuming you want to treat the value as an 8-bit integer.
C++ unsigned char = C# byte
C++ char = C# sbyte
Use a char. A char in C# is neither signed nor unsigned; it is a 16-bit value.
If you do not want to retain the character but the actual binary number, then you should choose byte, which is an unsigned integer, just like the unsigned char in C++.
See also: What is an unsigned char?
Or is it always guaranteed to be positive for all possible Chars?
It's guaranteed to be non-negative.
char is an unsigned 16-bit value.
From section 4.1.5 of the C# 4 spec:
The char type represents unsigned 16-bit integers with values between 0 and 65535. The set of possible values for the char type corresponds to the Unicode character set. Although char has the same representation as ushort, not all operations permitted on one type are permitted on the other.
Since the range of char is U+0000 to U+ffff, then a cast to an Int32 will always be positive.
Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure.
Char Structure - MSDN
See Microsoft's documentation.
There you can see that Char is a 16-bit value in the range U+0000 to U+FFFF. If you cast it to an Int32, there can be no negative value.
char can be implicitly converted to ushort, and the ushort range is 0 to 65,535, so it's always positive.
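If you want to convince yourself, here is a quick brute-force sketch that casts every possible char value:
bool anyNegative = false;
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    //cast each possible char value to int and check the sign
    if ((int)(char)i < 0) { anyNegative = true; }
}
System.Console.WriteLine(anyNegative);   // False - every char converts to a non-negative int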
I have a very simple problem that is giving me a really big headache: I am porting a bit of code from C++ to C#, and for a very simple operation I am getting totally different results.
C++
char OutBuff[25];
int i = 0;
unsigned int SumCheck = 46840;
OutBuff[i++] = SumCheck & 0xFF; //these 2 ANDed make 248
The value written to the char array is -8
C#
char[] OutBuff = new char[25];
int i = 0;
uint SumCheck = 46840;
OutBuff[i++] = (char)(SumCheck & 0xFF); //these 2 ANDed also make 248
The value written to the char array is 248.
Interestingly, they are both the same character, so this may have something to do with how a char array is represented in C++ versus C# - but ultimately I would be grateful if someone could give me a definitive answer.
Thanks in advance for any help.
David
It's overflow in C++ and no overflow in C#.
In C#, char is two bytes. In C++, char is one byte!
So in C# there is no overflow and the value is retained. In C++, there is integral overflow.
Change the data type from char to uint16_t or unsigned char (in C++) and you will see the same result. Note that unsigned char can hold the value 248 without overflow; it can hold values up to 255, in fact.
Maybe you should be using byte or sbyte instead of char. (char is only meant to store text characters, and the binary serialization of a C# char is not the same as a C++ char; char lets us store characters without worrying about character byte width.)
A C# char is actually 16 bits, while a C++ char is usually 8 bits (a char is exactly 8 bits on Visual C++). So you're actually overflowing the integer in the C++ code, but the C# code does not overflow, since it holds more bits, and therefore has a bigger integer range.
Notice that 248 is outside the range of a signed char (-128 to 127). That should give you a hint that C#'s char might be bigger than 8 bits.
You probably meant to use C#'s sbyte (the closest equivalent to Visual C++'s char) if you want to preserve the behavior, although you may want to recheck the code since there's an overflow occurring in the C++ implementation.
As everyone has stated, in C# a char is 16 bits while in C++ it is usually 8 bits.
-8 and 248 in binary both (essentially) look like this:
11111000
Because a char in C++ is usually 8 bits (which is in fact your case), the result is -8. In C#, the value looks like this:
00000000 11111000
Which is 16 bits and becomes 248.
The 2's complement representation of -8 is the same as the binary representation of 248 (unsigned).
So the binary representation is the same in both cases. The C++ result is interpreted as a signed 8-bit value, while in C# it is simply interpreted as a positive integer (int is 32-bit, and truncating to 16 bits by casting to char doesn't affect the sign in this case).
The difference between -8 and 248 is all in how you interpret the data. They are stored exactly the same (0xF8). In C++, the default char type is 'signed'. So, 0xF8 = -8. If you change the data type to 'unsigned char', it will be interpreted as 248. VS also has a compiler option to make 'char' default to 'unsigned'.
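Here is a small C# sketch of the two interpretations, with sbyte playing the role of Visual C++'s signed char:
uint sumCheck = 46840;
byte masked = (byte)(sumCheck & 0xFF);        // 0xF8 = 248, the unsigned view
sbyte signedView = unchecked((sbyte)masked);  // -8, what a signed 8-bit char sees
System.Console.WriteLine(masked);             // 248
System.Console.WriteLine(signedView);         // -8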