Earlier today I was trying to add two ushorts and I noticed that I had to cast the result back to ushort. I thought it might've become a uint (to prevent a possible unintended overflow?), but to my surprise it was an int (System.Int32).
Is there some clever reason for this or is it maybe because int is seen as the 'basic' integer type?
Example:
ushort a = 1;
ushort b = 2;
ushort c = a + b; // <- "Cannot implicitly convert type 'int' to 'ushort'. An explicit conversion exists (are you missing a cast?)"
uint d = a + b; // <- "Cannot implicitly convert type 'int' to 'uint'. An explicit conversion exists (are you missing a cast?)"
int e = a + b; // <- Works!
Edit: Like GregS' answer says, the C# spec says that both operands (in this example 'a' and 'b') should be converted to int. I'm interested in the underlying reason for why this is part of the spec: why doesn't the C# spec allow for operations directly on ushort values?
The simple and correct answer is "because the C# Language Specification says so".
Clearly you are not happy with that answer and want to know "why does it say so". You are looking for "credible and/or official sources", that's going to be a bit difficult. These design decisions were made a long time ago, 13 years is a lot of dog lives in software engineering. They were made by the "old timers" as Eric Lippert calls them, they've moved on to bigger and better things and don't post answers here to provide an official source.
It can be inferred however, at a risk of merely being credible. Any managed compiler, like C#'s, has the constraint that it needs to generate code for the .NET virtual machine. The rules for which are carefully (and quite readably) described in the CLI spec. It is the Ecma-335 spec, you can download it for free from here.
Turn to Partition III, chapter 3.1 and 3.2. They describe the two IL instructions available to perform an addition, add and add.ovf. Click the link to Table 2, "Binary Numeric Operations", it describes what operands are permissible for those IL instructions. Note that there are just a few types listed there. byte and short as well as all unsigned types are missing. Only int, long, IntPtr and floating point (float and double) are allowed. With additional constraints marked by an x, you can't add an int to a long for example. These constraints are not entirely artificial, they are based on things you can do reasonably efficiently on available hardware.
Any managed compiler has to deal with this in order to generate valid IL. That isn't difficult, simply convert the ushort to a larger value type that's in the table, a conversion that's always valid. The C# compiler picks int, the next larger type that appears in the table. Or in general, convert any of the operands to the next largest value type so they both have the same type and meet the constraints in the table.
Now there's a new problem however, a problem that drives C# programmers pretty nutty. The result of the addition is of the promoted type. In your case that will be int. So adding two ushort values of, say, 0x9000 and 0x9000 has a perfectly valid int result: 0x12000. Problem is: that's a value that doesn't fit back into a ushort. The value overflowed. But it didn't overflow in the IL calculation, it only overflows when the compiler tries to cram it back into a ushort. 0x12000 is truncated to 0x2000. A bewilderingly different value that only makes some sense when you count with 2 or 16 fingers, not with 10.
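To see that in code, here is a quick sketch (using the same hex values; it assumes the default unchecked project settings):
ushort a = 0x9000, b = 0x9000;
int sum = a + b;                          // the addition itself happens in int: 0x12000, no overflow
ushort c = (ushort)sum;                   // the cast throws the upper bits away: 0x2000
Console.WriteLine($"{sum:X} -> {c:X}");   // prints 12000 -> 2000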
Notable is that the add.ovf instruction doesn't deal with this problem. It is the instruction to use to automatically generate an overflow exception. But it doesn't, the actual calculation on the converted ints didn't overflow.
This is where the real design decision comes into play. The old-timers apparently decided that simply truncating the int result to ushort was a bug factory. It certainly is. They decided that you have to acknowledge that you know that the addition can overflow and that it is okay if it happens. They made it your problem, mostly because they didn't know how to make it theirs and still generate efficient code. You have to cast. Yes, that's maddening, I'm sure you didn't want that problem either.
Quite notable is that the VB.NET designers took a different solution to the problem. They actually made it their problem and didn't pass the buck. You can add two UShorts and assign it to an UShort without a cast. The difference is that the VB.NET compiler actually generates extra IL to check for the overflow condition. That's not cheap code, it makes every short addition about 3 times as slow. But it is the kind of difference that explains why Microsoft maintains two languages with otherwise very similar capabilities.
Long story short: you are paying a price because you use a type that's not a very good match with modern cpu architectures. Which in itself is a Really Good Reason to use uint instead of ushort. Getting traction out of ushort is difficult, you'll need a lot of them before the cost of manipulating them outweighs the memory savings. Not just because of the limited CLI spec, an x86 core takes an extra cpu cycle to load a 16-bit value because of the operand prefix byte in the machine code. Not actually sure if that is still the case today, it used to be back when I still paid attention to counting cycles. A dog year ago.
Do note that you can feel better about these ugly and dangerous casts by letting the C# compiler generate the same code that the VB.NET compiler generates. So you get an OverflowException when the cast turned out to be unwise. Use Project > Properties > Build tab > Advanced button > tick the "Check for arithmetic overflow/underflow" checkbox. Just for the Debug build. Why this checkbox isn't turned on automatically by the project template is another very mystifying question btw, a decision that was made too long ago.
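If you want that guarantee in just one spot instead of project-wide, the checked keyword gives the same behaviour per expression. A small sketch:
ushort a = 0x9000, b = 0x9000;
ushort c = unchecked((ushort)(a + b));   // explicitly accepts the truncation, c == 0x2000
ushort d = checked((ushort)(a + b));     // throws OverflowException: 0x12000 does not fit a ushort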
ushort x = 5, y = 12;
The following assignment statement will produce a compilation error, because the arithmetic expression on the right-hand side of the assignment operator evaluates to int by default.
ushort z = x + y; // Error: conversion from int to ushort
http://msdn.microsoft.com/en-us/library/cbf1574z(v=vs.71).aspx
EDIT:
For arithmetic operations on ushort, the operands are converted to a type that can hold all possible values of both operands, so that overflow of the operation itself is avoided. The candidate types, in order, are int, uint, long and ulong.
Please see the C# Language Specification. In this document, go to section 4.1.5 Integral types (around page 80 in the Word document). Here you will find:
For the binary +, –, *, /, %, &, ^, |, ==, !=, >, <, >=, and <= operators, the operands are converted to type T, where T is the first of int, uint, long, and ulong that can fully represent all possible values of both operands. The operation is then performed using the precision of type T, and the type of the result is T (or bool for the relational operators). It is not permitted for one operand to be of type long and the other to be of type ulong with the binary operators.
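A couple of examples of that rule in action (a sketch; the variables are mine, not from the spec):
ushort us = 1;
uint u = 2;
int i = -1;
var x = us + u;   // uint can hold every ushort and uint value, so x is uint
var y = u + i;    // neither int nor uint can hold all values of both, so both become long and y is long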
Eric Lippert has stated in an answer to a question
Arithmetic is never done in shorts in C#. Arithmetic can be done in ints, uints, longs and ulongs, but arithmetic is never done in shorts. Shorts promote to int and the arithmetic is done in ints, because like I said before, the vast majority of arithmetic calculations fit into an int. The vast majority do not fit into a short. Short arithmetic is possibly slower on modern hardware which is optimized for ints, and short arithmetic does not take up any less space; it's going to be done in ints or longs on the chip.
From the C# language spec:
7.3.6.2 Binary numeric promotions
Binary numeric promotion occurs for the operands of the predefined +, –, *, /, %, &, |, ^, ==, !=, >, <, >=, and <= binary operators. Binary numeric promotion implicitly converts both operands to a common type which, in case of the non-relational operators, also becomes the result type of the operation. Binary numeric promotion consists of applying the following rules, in the order they appear here:
· If either operand is of type decimal, the other operand is converted to type decimal, or a binding-time error occurs if the other operand is of type float or double.
· Otherwise, if either operand is of type double, the other operand is converted to type double.
· Otherwise, if either operand is of type float, the other operand is converted to type float.
· Otherwise, if either operand is of type ulong, the other operand is converted to type ulong, or a binding-time error occurs if the other operand is of type sbyte, short, int, or long.
· Otherwise, if either operand is of type long, the other operand is converted to type long.
· Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long.
· Otherwise, if either operand is of type uint, the other operand is converted to type uint.
· Otherwise, both operands are converted to type int.
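The last rule is the one that hits ushort (and byte, sbyte, short, and char). A tiny sketch to confirm it:
byte b1 = 1, b2 = 2;
var sum = b1 + b2;
Console.WriteLine(sum.GetType());   // System.Int32: both bytes were promoted to int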
There is no intended reason as such. This is just an effect of applying the rules of overload resolution, which state that the first overload whose parameters the arguments can be implicitly converted to is the overload that will be used.
This is stated in the C# Specification, section 7.3.6 as follows:
Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators.
It goes on to illustrate with an example:
As an example of numeric promotion, consider the predefined implementations of the binary * operator:
int operator *(int x, int y);
uint operator *(uint x, uint y);
long operator *(long x, long y);
ulong operator *(ulong x, ulong y);
float operator *(float x, float y);
double operator *(double x, double y);
decimal operator *(decimal x, decimal y);
When overload resolution rules (§7.5.3) are applied to this set of operators, the effect is to select the first of the operators for which implicit conversions exist from the operand types. For example, for the operation b * s, where b is a byte and s is a short, overload resolution selects operator *(int, int) as the best operator.
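You can observe that selection directly; here is a sketch mirroring the b * s example:
byte b = 6;
short s = 7;
var product = b * s;                    // overload resolution picks operator *(int, int)
Console.WriteLine(product.GetType());   // System.Int32
Console.WriteLine(product);             // 42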
Your question is, in fact, a bit tricky. The reason why this specification is part of the language is... because they took that decision when they created the language. I know this sounds like a disappointing answer, but that's just how it is.
However, the real answer would probably involve the many contextual decisions made back in the day, in 1999-2000. I am sure the team who made C# had pretty robust debates about all those language details.
...
C# is intended to be a simple, modern, general-purpose, object-oriented programming language.
Source code portability is very important, as is programmer portability, especially for those programmers already familiar with C and C++.
Support for internationalization is very important.
...
The quote above is from the Wikipedia article on C#.
All of those design goals might have influenced their decision. For instance, in the year 2000 most systems were already natively 32-bit, so they might have decided to limit support for variables smaller than that, since such values are converted to 32 bits anyway when performing arithmetic operations, which is generally slower.
At that point, you might ask me: if those types are implicitly converted anyway, why did they include them at all? Well, one of their design goals, as quoted above, is portability.
Thus, if you need to write a C# wrapper around an old C or C++ program, you might need those types to store some values. In that case, those types are pretty handy.
That's a decision Java did not make. For instance, if you write a Java program that interacts with a C++ program from which you receive ushort values, well, Java only has short (which is signed), so you can't easily assign one to the other and expect correct values.
I'll let you guess: the next available type that could receive such a value in Java is int (32 bits, of course). You have just doubled your memory use there. Which might not be a big deal, unless you have to instantiate an array of 100 000 elements.
In fact, we must remember that those decisions were made by looking at both the past and the future, in order to provide a smooth transition from one to the other.
But now I feel that I am diverging from the initial question.
So your question is a good one and hopefully I was able to bring some answers to you, even though I know that's probably not what you wanted to hear.
If you'd like, you could even read more about the C# spec; links below. There is some documentation there that you might find interesting.
Integral types
The checked and unchecked operators
Implicit Numeric Conversions Table
By the way, I believe you should probably reward habib-osu for it, since he provided a fairly good answer to the initial question with a proper link. :)
Regards
Could someone point out to me why here:
Byte b = 100;
b = (Byte)(b+200);
I have to use explicit type conversion. But here
Byte b = 100;
b += 200;
I don't need to do this?
Does the compiler generate different IL code for these two cases? And which case is better?
Because the standard permits it (see the second case below):
14.14.2 Compound assignment
An operation of the form x op= y is processed by applying binary operator overload resolution (§14.2.4) as if the operation was written x op y. Then,
If the return type of the selected operator is implicitly convertible to the type of x, the operation is evaluated as x = x op y, except that x is evaluated only once.
Otherwise, if the selected operator is a predefined operator, if the return type of the selected operator is explicitly convertible to the type of x, and if y is implicitly convertible to the type of x or the operator is a shift operator, then the operation is evaluated as x = (T)(x op y), where T is the type of x, except that x is evaluated only once.
Otherwise, the compound assignment is invalid, and a compile-time error occurs.
The IL code should be essentially identical in this case. Of course, if evaluating b has side effects, it will be evaluated twice in the b = (Byte)(b + 200) case and only once when using compound assignment.
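Put differently, the compound form quietly inserts the cast for you. A sketch (note the wrap-around):
byte b = 100;
b += 200;               // compiles: treated as b = (byte)(b + 200)
Console.WriteLine(b);   // 44, because 300 does not fit in a byte and is truncated
// b = b + 200;         // does not compile: the int result needs an explicit cast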
This is a FAQ in the C# tag, though the duplicate is hard to find. The need for the cast is the relevant part here. The underlying reason is that the CLI only specifies a limited number of valid types for the Opcodes.Add IL instruction. Only operands of type Int32, Int64, Single, Double and IntPtr are supported. IntPtr is special as well, the C# language forbids using that one.
So the C# compiler has to use an implicit conversion to uplift the byte to a type that the operator supports. It will pick Int32 as the closest compatible type. The result of the addition is Int32. Which does not fit back into a byte without truncating the result, throwing away the extra bits. An obvious example is 255 + 1, the result is 256 in Int32 but doesn't fit a Byte and yields 0 when stored.
That's a problem, the language designers didn't like that truncation to happen without you explicitly acknowledging that you are aware of the consequences. A cast is required to convince the compiler that you're aware. Bit of a cop-out of course, you tend to produce the cast mechanically without thinking much about the consequences. But that makes it your problem, not Microsoft's :)
The rub was the += operator, a very nice operator for writing condensed code. Resembles the brevity of the var keyword. Rock and a hard place, however: where do you put the cast? It just doesn't fit anywhere, so they punted the problem and allowed truncation without a cast.
Notable is the way VB.NET works, it doesn't require a cast. But it gives a guarantee that C# doesn't provide by default, it will generate an OverflowException when the result doesn't fit. Pretty nice, but that check doesn't come for free.
Designing clean languages is a very hard problem. The C# team did an excellent job, warts notwithstanding. These are otherwise the kind of warts brought on by processor design. IL has these type restrictions because that's what real 32-bit processors have too, particularly the RISC designs that were popular in the 90s. Their internal registers can only handle 32-bit integers and IEEE-754 floating point. And only permit smaller types in loads and stores. The Intel x86 core is very popular and actually permits basic operations on smaller types. But that's mostly a historical accident due to Intel keeping the design compatible through the 8-bit 8080 and 16-bit 8086 generations. It doesn't come for free, 16-bit operations cost an extra cpu cycle. To be avoided.
The spec calls this out as a specific case of the compound assignment operator:
http://msdn.microsoft.com/en-us/library/aa691316%28v=vs.71%29.aspx
Specifically, bullet 2:
Otherwise, if the selected operator is a predefined operator, if the return type of the selected operator is explicitly convertible to the type of x, and if y is implicitly convertible to the type of x, then the operation is evaluated as x = (T)(x op y), where T is the type of x, except that x is evaluated only once.
The second rule above permits x op= y to be evaluated as x = (T)(x op y) in certain contexts. The rule exists such that the predefined operators can be used as compound operators when the left operand is of type sbyte, byte, short, ushort, or char. Even when both arguments are of one of those types, the predefined operators produce a result of type int, as described in Section 7.2.6.2. Thus, without a cast it would not be possible to assign the result to the left operand.
This is because of the implicit conversion rules.
When you use a binary + operator, both operands are converted to the larger of the two types (and at least to int), and that is also the type of the result.
The literal 200 is of type int and therefore the type of the expression b+200 is int.
The assignment operator = doesn't do such a narrowing conversion implicitly, but rather reports an error, as in
int x = 10;
Byte b = x; //Error
In the second case the += operator expects a byte, so 200 (which is of type int, but fits into a byte) is implicitly converted to byte, because the compiler knows it can. The following won't compile, because the compiler doesn't know whether x will fit in a byte or not.
Byte b = 100;
int x = 200;
b += x; //Error
If you make x a const then it compiles:
Byte b = 100;
const int x = 200;
b += x; //OK
For example, why does long int have a literal modifier, but short int does not? I am referring to the following question on this site: C# compiler number literals
In general, C# seems to be a very well designed and consistent language. Probably there is a strong reason to provide literal modifiers for some types, but not for all. What is it?
Why does long int have a literal modifier, but short int does not?
The question is "why does C# not have this feature?" The answer to that question is always the same. Features are unimplemented by default; C# does not have that feature because no one designed, implemented and shipped the feature to customers.
The absence of a feature does not need justification. Rather, all features must be justified by showing that their benefits outweigh their costs. As the person proposing the feature, the onus is on you to describe why you think the feature is valuable; the onus is not on me to explain why it is not.
Probably there is a strong reason to provide literal modifiers for some types, but not for all. What is it?
Now that is a more answerable question. Now the question is "what justifies the literal suffix on long, and why is that not also a justification for a similar literal suffix on short?"
Integers can be used for a variety of purposes. You can use them as arithmetic numbers. You can use them as collections of bit flags. You can use them as indices into arrays. And there are lots of more special-purpose usages. But I think it is fair to say that most of the time, integers are used as arithmetical numbers.
The vast majority of calculations performed in integers by normal programs involve numbers that are far, far smaller than the range of a 32 bit signed integer -- roughly +/- two billion. And lots of modern hardware is extremely efficient when dealing solely with 32 bit integers. It therefore makes sense to make the default representation of numbers to be signed 32 bit integers. C# is therefore designed to make calculations involving 32 bit signed integers look perfectly normal; when you say "x = x + 1" that "1" is understood to be a signed 32 bit integer, and odds are good that x is too, and the result of the sum is too.
What if the calculation is integral but does not fit into the range of a 32 bit integer? "long" 64 bit integers are a sensible next step up; they are also efficient on a lot of hardware and longs have a range that should satisfy the needs of pretty much anyone who isn't doing heavy-duty combinatorics that involve extremely large numbers. It therefore makes sense to have some way to specify clearly and concisely in source code that this literal here is to be treated as a long integer.
Interop scenarios, or scenarios in which integers are used as bit fields, often require the use of unsigned integers. Again, it makes sense to have a way to clearly and concisely specify that this literal is intended to be treated as an unsigned integer.
So, summing up, when you see "1", odds are good that the vast majority of the time the user intends it to be used as a 32 bit signed integer. The next most likely cases are that the user intends it to be a long integer or an unsigned int or unsigned long. Therefore there are concise suffixes for each of those cases.
Thus, the feature is justified.
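For reference, the integer suffixes that do exist, as a sketch (the commented-out line shows the one that is missing):
var a = 1;      // int
var b = 1U;     // uint
var c = 1L;     // long
var d = 1UL;    // ulong
// var e = 1S;  // no such suffix: there is no short literal
short s = 1;    // but a plain int literal that fits is converted at compile time anyway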
Why is that not a justification for shorts?
Because first, in every context in which a short is legal, it is already legal to use an integer literal. "short x = 1;" is perfectly legal; the compiler realizes that the integer fits into a short and lets you use it.
Second, arithmetic is never done in shorts in C#. Arithmetic can be done in ints, uints, longs and ulongs, but arithmetic is never done in shorts. Shorts promote to int and the arithmetic is done in ints, because like I said before, the vast majority of arithmetic calculations fit into an int. The vast majority do not fit into a short. Short arithmetic is possibly slower on modern hardware which is optimized for ints, and short arithmetic does not take up any less space; it's going to be done in ints or longs on the chip.
You want a "long" suffix to tell the compiler "this arithmetic needs to be done in longs" but a "short" suffix doesn't tell the compiler "this arithmetic needs to be done in shorts" because that's simply not a feature of the C# language to begin with.
The reasons for providing a long suffix and an unsigned syntax do not apply to shorts. If you think there is a compelling benefit to the feature, state what the benefit is. Without a benefit to justify its costs, the feature will not be implemented in C#.
According to MSDN:
short x = 32767;
In the preceding declaration, the integer literal 32767 is implicitly converted from int to short. If the integer literal does not fit into a short storage location, a compilation error will occur.
So it is a compile time feature. short does not have a suffix because it would never be needed.
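In code, a sketch of both cases:
short ok = 32767;       // fine: the constant fits, so the int literal converts implicitly
// short bad = 32768;   // compile-time error: the constant does not fit into a short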
The related question probably is: Why do long, float and decimal have suffixes?
And a short answer would be that i + 1 and i + 1L can produce different values and are therefore of different types.
But there exists no such thing as 'short arithmetic', short values are always converted to int when used in a calculation.
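A sketch that makes the i + 1 versus i + 1L point concrete (assuming default unchecked arithmetic):
int i = int.MaxValue;
Console.WriteLine(i + 1);    // -2147483648: the addition is done in int and wraps around
Console.WriteLine(i + 1L);   // 2147483648: the L suffix promotes the whole addition to long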
As Eric points out in the comment, my answer below doesn't make sense. I think it's more correct to say that the inability to express a short literal in C# and the inability to express a short literal in IL share a common cause (the lack of a compelling reason for the feature.) VB.Net apparently has a short literal specifier, which is interesting (for backwards compatibility with VB syntax?) In any case, I've left the answer here as some of the information may be interesting, even if the reasoning is incorrect.
There is no short literal because there is not actually any way for a short literal to be loaded in IL, the underlying language used by the CLR. This is because all 'short' types (anything smaller than an int) are implicitly widened to an int when loaded onto the operation stack. Signed and unsigned are likewise a matter of operations and not actually 'stored' with the active number on the operation stack. The 'short' types only come into play when you want to store a number on the operation stack into a memory location, so there are IL operations to Convert to various 'short' types (though it actually still widens the number back to an int after the conversion; it just makes sure that the value will be suitable for storing into a field of the 'short' type.)
Long types have a literal specifier, on the other hand, due to the fact that they are treated differently on the operation stack. There is a separate Ldc_I8 instruction for loading constant long values. There are also Ldc_R4 (hence why you need 'f' for float) and Ldc_R8 (C# chooses this as its default if you use a decimal number without a specifier.) Decimal is a special case, as it's not actually a primitive type in IL; it just has a built-in constant specifier 'm' in C# that compiles to a constructor call.
As for why there are no special short operations (and corresponding short literals), that's likely because most current CPU architectures do not operate with registers smaller than 32-bits, so there is no distinction at the CPU level worth exploiting. You could potentially save code size (in terms of bytes of IL) by allowing for 'short' load IL opcodes, but at the cost of additional complexity for the jitter; the code space saved is probably not worth it.
Since a short can be implicitly converted to int, long, float, double, or decimal, there's no need for a literal modifier.
Consider:
void method(int a) {}
void method2()
{
    short a = 4;
    method(a); // no problems
}
You may notice that char and byte are also without literal modifiers, possibly for this same reason.
From → To
sbyte → short, int, long, float, double, or decimal
byte → short, ushort, int, uint, long, ulong, float, double, or decimal
short → int, long, float, double, or decimal
ushort → int, uint, long, ulong, float, double, or decimal
int → long, float, double, or decimal
uint → long, ulong, float, double, or decimal
long → float, double, or decimal
char → ushort, int, uint, long, ulong, float, double, or decimal
float → double
ulong → float, double, or decimal
If you assign a literal to a short and it is larger than short.MaxValue, a compiler error will occur; otherwise the literal will become a short.
The time I have "worked in Short" was for values that are stored in a Database.
They are positive integer values that will rarely go over 10 to 20. (A byte or sbyte would be big enough, but I figured a little overkill would keep me from regretting my choice if the code got reused in a slightly different way.)
The field is used to let the user sort the records in a table. This table feeds a drop down or radio button list that is ordered by "time" (step one, step two, ...).
Being new to C# (and old enough to remember when counting bytes was important) I figured it would be a little more efficient. I don't do math on the values. I just sort them (and swap them between records). The only math so far has been "MaxInUse"+1 (for new records), which is a special case: "++MaxInUse". That is fortunate, because the lack of a literal means "s = s+2" would have to be "s = (Int16)(s+2)".
Now that I see how annoying C# makes working with the smaller ints, I expect to join the modern world and waste bytes, JUST to make the compiler happy.
But, shouldn't "making the compiler happy" rank about #65 in our top 10 programming goals?
Is it EVER an advantage to have the compiler complain about adding the integer "2" to ANY of the INTEGER types? It should complain about "s=123456", but that's a different case.
If anyone does have to deal with math AND shorts, I suggest you make your own literals. (How many could you need?)
short s1 = 1, s2 = 2, s123 = 123;
Then s += s2 is only a little annoying (and confusing for those who follow after you).
1) If one operand is of type ulong, while the other operand is of type sbyte/short/int/long, then a compile-time error occurs. I fail to see the logic in this. Thus, why would it be a bad idea for both operands to instead be promoted to type double or float?
long L = 100;
ulong UL = 1000;
double d = L + UL; // error saying + operator can't be applied to operands of type ulong and long
b) The compiler implicitly converts an int literal into the byte type and assigns the resulting value to b:
byte b = 1;
But if we try to assign a literal of type ulong to type long (or to types int, byte, etc.), then the compiler reports an error:
long L = 1000UL;
I would think the compiler would be able to figure out whether the result of a constant expression fits into a variable of type long?!
thank you
To answer the question marked (1) -- adding signed and unsigned longs is probably a mistake. If the intention of the developer is to overflow into inexact arithmetic in this scenario then that's something they should do explicitly, by casting both arguments to double. Doing so implicitly is hiding mistakes more often than it is doing the right thing.
To answer the question marked (b) -- of course the compiler could figure that out. Obviously it can because it does so for integer literals. But again, this is almost certainly an error. If your intention was to make that a signed long then why did you mark it as unsigned? This looks like a mistake. C# has been carefully designed so that it looks for weird patterns like this and calls your attention to them, rather than making a guess that you meant to say this weird thing and blazing on ahead as if everything were normal. The compiler is trying to encourage you to write sensible code; sensible code does not mix signed and unsigned types.
Why should it?
Generally, the 2 types are incompatible because long is signed. You are only describing a special case.
For byte b = 1; the literal 1 is a constant whose value fits, so it can be coerced into byte.
For long L = 1000UL; the literal 1000UL does have an explicit type (ulong), which is incompatible; see my general case above.
Example from "ulong" on MSDN:
When an integer literal has no suffix, its type is the first of these types in which its value can be represented: int, uint, long, ulong.
and then
There is no implicit conversion from ulong to any integral type
On "long" in MSDN (my bold)
When an integer literal has no suffix, its type is the first of these types in which its value can be represented: int, uint, long, ulong.
It's quite common and logical and utterly predictable
long l = 100;
ulong ul = 1000;
double d = l + ul; // error
Why would it be a bad idea for both operands to instead be promoted to type double or float?
Which one? Floats? Or doubles? Or maybe decimals? Or longs? There's no way for the compiler to know what you are thinking. Also type information generally flows out of expressions not into them, so it can't use the target of the assignment to choose either.
The fix is to simply specify which type you want by casting one or both of the arguments to that type.
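For the example above, that could look like this (a sketch):
long L = 100;
ulong UL = 1000;
double d = (double)L + UL;   // cast one operand; the other is then implicitly converted to double
Console.WriteLine(d);        // 1100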
The compiler doesn't consider what you do with the result when it determines the result type of an expression. The rules for how types are promoted in an expression only consider the values in the expression itself, not what you do with the value later on.
In the case where you assign the result to a variable, it could be possible to use that information, but consider a statement like this:
Console.Write(L + UL);
The Write method has overloads that take several different data types, which would make it rather complicated to decide how to use that information.
For example, there is an overload that takes a string, so one possible way to promote the types (and a good candidate as it doesn't lose any precision) would be to first convert both values to strings and then concatenate them, which is probably not the result that you were after.
The simple answer is that's just the way the language spec is written:
http://msdn.microsoft.com/en-us/library/y5b434w4(v=VS.80).aspx
You can argue over whether the rules of implicit conversions are logical in each case, but at the end of the day these are just the rules the design committee decided on.
Any implicit conversion has a downside in that it's doing something the programmer may not expect. The general principle with C# seems to be to raise an error in these cases rather than try to guess what the programmer meant.
Suppose one variable was equal to 9223372036854775807 and the other was equal to -9223372036854775806? What should the result of the addition be? Converting the two values to double would round them to 9223372036854775808 and -9223372036854775808, respectively; performing the subtraction would then yield 0.0 (exactly). By contrast, if both values were signed, the result would be 1.0 (also exact). It would be possible to convert both operands to type Decimal and do the math exactly. Conversion to Double after the fact would require an explicit cast, however.
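A sketch of that scenario (the exact integer result of the addition is 1):
ulong big = 9223372036854775807;         // long.MaxValue stored in a ulong
long negative = -9223372036854775806;
Console.WriteLine((double)big + negative);    // 0: both values round to 2^63 before the addition
Console.WriteLine((decimal)big + negative);   // 1: decimal holds both values exactly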
Why does the following raise a compile time error: 'Cannot implicitly convert type 'int' to 'byte':
byte a = 25;
byte b = 60;
byte c = a ^ b;
This would make sense if I were using an arithmetic operator, because the result of a + b could be larger than can be stored in a single byte.
However, applying this to the XOR operator is pointless. XOR here is a bitwise operation that can never overflow a byte.
Using a cast around the whole expression works:
byte c = (byte)(a ^ b);
I can't give you the rationale, but I can tell you why the compiler has that behavior from the standpoint of the rules the compiler has to follow (which might not really be what you're interested in knowing).
From an old copy of the C# spec (I should probably download a newer version), emphasis added:
14.2.6.2 Binary numeric promotions
This clause is informative.
Binary numeric promotion occurs for the operands of the predefined +, –, *, /, %, &, |, ^, ==, !=, >, <, >=, and <= binary operators. Binary numeric promotion implicitly converts both operands to a common type which, in case of the non-relational operators, also becomes the result type of the operation. Binary numeric promotion consists of applying the following rules, in the order they appear here:
· If either operand is of type decimal, the other operand is converted to type decimal, or a compile-time error occurs if the other operand is of type float or double.
· Otherwise, if either operand is of type double, the other operand is converted to type double.
· Otherwise, if either operand is of type float, the other operand is converted to type float.
· Otherwise, if either operand is of type ulong, the other operand is converted to type ulong, or a compile-time error occurs if the other operand is of type sbyte, short, int, or long.
· Otherwise, if either operand is of type long, the other operand is converted to type long.
· Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long.
· Otherwise, if either operand is of type uint, the other operand is converted to type uint.
· Otherwise, both operands are converted to type int.
So, basically operands smaller than an int will be converted to int for these operators (and the result will be an int for the non-relational ops).
I said that I couldn't give you a rationale; however, I will make a guess at one - I think that the designers of C# wanted to make sure that operations that might lose information if narrowed would need to have that narrowing operation made explicit by the programmer in the form of a cast. For example:
byte a = 200;
byte b = 100;
byte c = a + b; // does not compile; if it did, the value 300 would be truncated
While this kind of truncation wouldn't happen when performing an xor operation between two byte operands, I think that the language designers probably didn't want to have a more complex set of rules where some operations would need explicit casts and other not.
Just a small note: the above quote is 'informative' not 'normative', but it covers all the cases in an easy to read form. Strictly speaking (in a normative sense), the reason the ^ operator behaves this way is because the closest overload for that operator when dealing with byte operands is (from 14.10.1 "Integer logical operators"):
int operator ^(int x, int y);
Therefore, as the informative text explains, the operands are promoted to int and an int result is produced.
FWIW
byte a = 25;
byte b = 60;
a = a ^ b;
does not work. However
byte a = 25;
byte b = 60;
a ^= b;
does work.
The demigod programmer from Microsoft has an answer: Link
And maybe it's more about compiler design. It keeps the compiler simpler to generalize the compilation process: it doesn't have to look at the operands of each operator, so bitwise operations are lumped into the same category as arithmetic operators and are thereby subject to the same type widening.
Link dead, archive here:
https://web.archive.org/web/20140118171646/http://blogs.msdn.com/b/oldnewthing/archive/2004/03/10/87247.aspx
I guess it's because the XOR operator is defined for booleans and integers.
And a cast from the integer result to a byte is a potentially information-losing conversion; hence it needs an explicit cast (a nod from the programmer).
It seems to be because in the C# language specification, the ^ operator is defined for int and long
http://msdn.microsoft.com/en-us/library/aa691307%28v=VS.71%29.aspx
So, what actually happens is that the compiler casts the byte operands to int implicitly, because there is no loss of data that way. But the result (which is an int) cannot be implicitly cast back down without potential loss of data. So, you need to tell the compiler explicitly that you know what you are doing!
As to why the two bytes have to be converted to ints to do the XOR?
If you want to dig into it, 12.1.2 of the CLI Spec (Partition I) describes the fact that, on the evaluation stack, only int or long can exist. All shorter integral types have to be expanded during evaluation.
Unfortunately, I can't find a suitable link directly to the CLI Spec - I've got a local copy as PDF, but can't remember where I got it from.
This has more to do with the rules surrounding implicit and explicit casting in the CLI specification. An integer (int = System.Int32 = 4 bytes) is wider than a byte (1 byte, obviously!). Therefore any cast from int to byte is potentially a narrowing cast. Therefore, the compiler wants you to make this explicit.