Related
First of all: I know how to work around this issue. I'm not searching for a solution. I am interested in the reasoning behind the design choices that led to some implicit conversions and didn't lead to others.
Today I came across a small but influential error in our code base, where an int constant was initialised with a char representation of that same number. This results in an ASCII conversion of the char to an int. Something like this:
char a = 'a';
int z = a;
Console.WriteLine(z);
// Result: 97
I was confused why C# would allow something like this. After searching around I found the following SO question with an answer by Eric Lippert himself: Implicit Type cast in C#
An excerpt:
However, we can make educated guesses as to why implicit
char-to-ushort was considered a good idea. The key idea here is that
the conversion from number to character is a "possibly dodgy"
conversion. It's taking something that you do not KNOW is intended to
be a character, and choosing to treat it as one. That seems like the
sort of thing you want to call out that you are doing explicitly,
rather than accidentally allowing it. But the reverse is much less
dodgy. There is a long tradition in C programming of treating
characters as integers -- to obtain their underlying values, or to do
mathematics on them.
I can agree with the reasoning behind it, though an IDE hint would be awesome. However, I have another situation where the implicit conversion suddenly is not legal:
char a = 'a';
string z = a; // CS0029 Cannot implicitly convert type 'char' to 'string'
This conversion is in my humble opinion, very logical. It cannot lead to data loss and the intention of the writer is also very clear. Also after I read the rest of the answer on the char to int implicit conversion, I still don't see any reason why this should not be legal.
So that leads me to my actual question:
What reasons could the C# design team have, to not implement the implicit conversion from char to a string, while it appears so obvious by the looks of it (especially when comparing it to the char to int conversion).
First off, as I always say when someone asks "why not?" question about C#: the design team doesn't have to provide a reason to not do a feature. Features cost time, effort and money, and every feature you do takes time, effort and money away from better features.
But I don't want to just reject the premise out of hand; the question might be better phrased as "what are design pros and cons of this proposed feature?"
It's an entirely reasonable feature, and there are languages which allow you to treat single characters as strings. (Tim mentioned VB in a comment, and Python also treats chars and one-character strings as interchangeable IIRC. I'm sure there are others.) However, were I pitched the feature, I'd point out a few downsides:
This is a new form of boxing conversion. Chars are cheap value types. Strings are heap-allocated reference types. Boxing conversions can cause performance problems and produce collection pressure, and so there's an argument to be made that they should be more visible in the program, not less visible.
The feature will not be perceived as "chars are convertible to one-character strings". It will be perceived by users as "chars are one-character strings", and now it is perfectly reasonable to ask lots of knock-on questions, like: can call .Length on a char? If I can pass a char to a method that expects a string, and I can pass a string to a method that expects an IEnumerable<char>, can I pass a char to a method that expects an IEnumerable<char>? That seems... odd. I can call Select and Where on a string; can I on a char? That seems even more odd. All the proposed feature does is move your question; had it been implemented, you'd now be asking "why can't I call Select on a char?" or some such thing.
Now combine the previous two points together. If I think of chars as one-character strings, and I convert a char to an object, do I get a boxed char or a string?
We can also generalize the second point a bit further. A string is a collection of chars. If we're going to say that a char is convertible to a collection of chars, why stop with strings? Why not also say that a char can also be used as a List<char>? Why stop with char? Should we say that an int is convertible to IEnumerable<int>?
We can generalize even further: if there's an obvious conversion from char to sequence-of-chars-in-a-string, then there is also an obvious conversion from char to Task<char> -- just create a completed task that returns the char -- and to Func<char> -- just create a lambda that returns the char -- and to Lazy<char>, and to Nullable<char> -- oh, wait, we do allow a conversion to Nullable<char>. :-)
All of these problems are solvable, and some languages have solved them. That's not the issue. The issue is: all of these problems are problems that the language design team must identify, discuss and resolve. One of the fundamental problems in language design is how general should this feature be? In two minutes I've gone from "chars are convertible to single-character strings" to "any value of an underlying type is convertible to an equivalent value of a monadic type". There is an argument to be made for both features, and for various other points on the spectrum of generality. If you make your language features too specific, it becomes a mass of special cases that interact poorly with each other. If you make them too general, well, I guess you have Haskell. :-)
Suppose the design team comes to a conclusion about the feature: all of that has to be written up in the design documents and the specification, and the code, and tests have to be written, and, oh, did I mention that any time you make a change to convertibility rules, someone's overload resolution code breaks? Convertibility rules you really have to get right in the first version, because changing them later makes existing code more fragile. There are real design costs, and there are real costs to real users if you make this sort of change in version 8 instead of version 1.
Now compare these downsides -- and I'm sure there are more that I haven't listed -- to the upsides. The upsides are pretty tiny: you avoid a single call to ToString or + "" or whatever you do to convert a char to a string explicitly.
That's not even close to a good enough benefit to justify the design, implementation, testing, and backwards-compat-breaking costs.
Like I said, it's a reasonable feature, and had it been in version 1 of the language -- which did not have generics, or an installed base of billions of lines of code -- then it would have been a much easier sell. But now, there are a lot of features that have bigger bang for smaller buck.
Char is a value type. No, it is not numerical like int or double, but it still derives from System.ValueType. Then a char variable lives on the stack.
String is a reference type. So it lives on the heap. If you want to use a value type as a reference type, you need to box it first. The compiler needs to be sure that you know what you're doing via boxing. So, you can't have an implicit cast.
Cause when you do a cast of a char to int, you aren't just changing its' type, you are getting its' index in the ASCII table which is a really different thing, they are not letting you to "convert a char to an int" they are giving you a useful data for a possible need.
Why not letting the cast of a char to string?
Cause they actually represent much different things, and are stored in a much different way.
A char and a string aren't 2 ways of expressing the same thing.
I think I understand strong typing, but every time I look for examples for what is weak typing I end up finding examples of programming languages that simply coerce/convert types automatically.
For instance, in this article named Typing: Strong vs. Weak, Static vs. Dynamic says that Python is strongly typed because you get an exception if you try to:
Python
1 + "1"
Traceback (most recent call last):
File "", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
However, such thing is possible in Java and in C#, and we do not consider them weakly typed just for that.
Java
int a = 10;
String b = "b";
String result = a + b;
System.out.println(result);
C#
int a = 10;
string b = "b";
string c = a + b;
Console.WriteLine(c);
In this another article named Weakly Type Languages the author says that Perl is weakly typed simply because I can concatenate a string to a number and viceversa without any explicit conversion.
Perl
$a=10;
$b="a";
$c=$a.$b;
print $c; #10a
So the same example makes Perl weakly typed, but not Java and C#?.
Gee, this is confusing
The authors seem to imply that a language that prevents the application of certain operations on values of different types is strongly typed and the contrary means weakly typed.
Therefore, at some point I have felt prompted to believe that if a language provides a lot of automatic conversions or coercion between types (as perl) may end up being considered weakly typed, whereas other languages that provide only a few conversions may end up being considered strongly typed.
I am inclined to believe, though, that I must be wrong in this interepretation, I just do not know why or how to explain it.
So, my questions are:
What does it really mean for a language to be truly weakly typed?
Could you mention any good examples of weakly typing that are not related to automatic conversion/automatic coercion done by the language?
Can a language be weakly typed and strongly typed at the same time?
UPDATE: This question was the subject of my blog on the 15th of October, 2012. Thanks for the great question!
What does it really mean for a language to be "weakly typed"?
It means "this language uses a type system that I find distasteful". A "strongly typed" language by contrast is a language with a type system that I find pleasant.
The terms are essentially meaningless and you should avoid them. Wikipedia lists eleven different meanings for "strongly typed", several of which are contradictory. This indicates that the odds of confusion being created are high in any conversation involving the term "strongly typed" or "weakly typed".
All that you can really say with any certainty is that a "strongly typed" language under discussion has some additional restriction in the type system, either at runtime or compile time, that a "weakly typed" language under discussion lacks. What that restriction might be cannot be determined without further context.
Instead of using "strongly typed" and "weakly typed", you should describe in detail what kind of type safety you mean. For example, C# is a statically typed language and a type safe language and a memory safe language, for the most part. C# allows all three of those forms of "strong" typing to be violated. The cast operator violates static typing; it says to the compiler "I know more about the runtime type of this expression than you do". If the developer is wrong, then the runtime will throw an exception in order to protect type safety. If the developer wishes to break type safety or memory safety, they can do so by turning off the type safety system by making an "unsafe" block. In an unsafe block you can use pointer magic to treat an int as a float (violating type safety) or to write to memory you do not own. (Violating memory safety.)
C# imposes type restrictions that are checked at both compile-time and at runtime, thereby making it a "strongly typed" language compared to languages that do less compile-time checking or less runtime checking. C# also allows you to in special circumstances do an end-run around those restrictions, making it a "weakly typed" language compared with languages which do not allow you to do such an end-run.
Which is it really? It is impossible to say; it depends on the point of view of the speaker and their attitude towards the various language features.
As others have noted, the terms "strongly typed" and "weakly typed" have so many different meanings that there's no single answer to your question. However, since you specifically mentioned Perl in your question, let me try to explain in what sense Perl is weakly typed.
The point is that, in Perl, there is no such thing as an "integer variable", a "float variable", a "string variable" or a "boolean variable". In fact, as far as the user can (usually) tell, there aren't even integer, float, string or boolean values: all you have are "scalars", which are all of these things at the same time. So you can, for example, write:
$foo = "123" + "456"; # $foo = 579
$bar = substr($foo, 2, 1); # $bar = 9
$bar .= " lives"; # $bar = "9 lives"
$foo -= $bar; # $foo = 579 - 9 = 570
Of course, as you correctly note, all of this can be seen as just type coercion. But the point is that, in Perl, types are always coerced. In fact, it's quite hard for a user to tell what the internal "type" of a variable might be: at line 2 in my example above, asking whether the value of $bar is the string "9" or the number 9 is pretty much meaningless, since, as far as Perl is concerned, those are the same thing. Indeed, it's even possible for a Perl scalar to internally have both a string and a numeric value at the same time, as is e.g. the case for $foo after line 2 above.
The flip side of all this is that, since Perl variables are untyped (or, rather, don't expose their internal type to the user), operators cannot be overloaded to do different things for different types of arguments; you can't just say "this operator will do X for numbers and Y for strings", because the operator can't (won't) tell which kind of values its arguments are.
Thus, for example, Perl has and needs both a numeric addition operator (+) and a string concatenation operator (.): as you saw above, it's perfectly fine to add strings ("1" + "2" == "3") or to concatenate numbers (1 . 2 == 12). Similarly, the numeric comparison operators ==, !=, <, >, <=, >= and <=> compare the numeric values of their arguments, while the string comparison operators eq, ne, lt, gt, le, ge and cmp compare them lexicographically as strings. So 2 < 10, but 2 gt 10 (but "02" lt 10, while "02" == 2). (Mind you, certain other languages, like JavaScript, try to accommodate Perl-like weak typing while also doing operator overloading. This often leads to ugliness, like the loss of associativity for +.)
(The fly in the ointment here is that, for historical reasons, Perl 5 does have a few corner cases, like the bitwise logical operators, whose behavior depends on the internal representation of their arguments. Those are generally considered an annoying design flaw, since the internal representation can change for surprising reasons, and so predicting just what those operators do in a given situation can be tricky.)
All that said, one could argue that Perl does have strong types; they're just not the kind of types you might expect. Specifically, in addition to the "scalar" type discussed above, Perl also has two structured types: "array" and "hash". Those are very distinct from scalars, to the point where Perl variables have different sigils indicating their type ($ for scalars, # for arrays, % for hashes)1. There are coercion rules between these types, so you can write e.g. %foo = #bar, but many of them are quite lossy: for example, $foo = #bar assigns the length of the array #bar to $foo, not its contents. (Also, there are a few other strange types, like typeglobs and I/O handles, that you don't often see exposed.)
Also, a slight chink in this nice design is the existence of reference types, which are a special kind of scalars (and which can be distinguished from normal scalars, using the ref operator). It's possible to use references as normal scalars, but their string/numeric values are not particularly useful, and they tend to lose their special reference-ness if you modify them using normal scalar operations. Also, any Perl variable2 can be blessed to a class, turning it into an object of that class; the OO class system in Perl is somewhat orthogonal to the primitive type (or typelessness) system described above, although it's also "weak" in the sense of following the duck typing paradigm. The general opinion is that, if you find yourself checking the class of an object in Perl, you're doing something wrong.
1 Actually, the sigil denotes the type of the value being accessed, so that e.g. the first scalar in the array #foo is denoted $foo[0]. See perlfaq4 for more details.
2 Objects in Perl are (normally) accessed through references to them, but what actually gets blessed is the (possibly anonymous) variable the reference points to. However, the blessing is indeed a property of the variable, not of its value, so e.g. that assigning the actual blessed variable to another one just gives you a shallow, unblessed copy of it. See perlobj for more details.
In addition to what Eric has said, consider the following C code:
void f(void* x);
f(42);
f("hello");
In contrast to languages such as Python, C#, Java or whatnot, the above is weakly typed because we lose type information. Eric correctly pointed out that in C# we can circumvent the compiler by casting, effectively telling it “I know more about the type of this variable than you”.
But even then, the runtime will still check the type! If the cast is invalid, the runtime system will catch it and throw an exception.
With type erasure, this doesn’t happen – type information is thrown away. A cast to void* in C does exactly that. In this regard, the above is fundamentally different from a C# method declaration such as void f(Object x).
(Technically, C# also allows type erasure through unsafe code or marshalling.)
This is as weakly typed as it gets. Everything else is just a matter of static vs. dynamic type checking, i.e. of the time when a type is checked.
A perfect example comes from the wikipedia article of Strong Typing:
Generally strong typing implies that the programming language places severe restrictions on the intermixing that is permitted to occur.
Weak Typing
a = 2
b = "2"
concatenate(a, b) # returns "22"
add(a, b) # returns 4
Strong Typing
a = 2
b = "2"
concatenate(a, b) # Type Error
add(a, b) # Type Error
concatenate(str(a), b) #Returns "22"
add(a, int(b)) # Returns 4
Notice that a weak typing language can intermix different types without errors. A strong type language requires the input types to be the expected types. In a strong type language a type can be converted (str(a) converts an integer to a string) or cast (int(b)).
This all depends on the interpretation of typing.
I would like to contribute to the discussion with my own research on the subject, as others comment and contribute I have been reading their answers and following their references and I have found interesting information. As suggested, it is probable that most of this would be better discussed in the Programmers forum, since it appears to be more theoretical than practical.
From a theoretical standpoint, I think the article by Luca Cardelli and Peter Wegner named On Understanding Types, Data Abstraction and Polymorphism has one of the best arguments I have read.
A type may be viewed as a set of clothes (or a suit of armor) that
protects an underlying untyped representation from arbitrary or
unintended use. It provides a protective covering that hides the
underlying representation and constrains the way objects may interact
with other objects. In an untyped system untyped objects are naked
in that the underlying representation is exposed for all to see.
Violating the type system involves removing the protective set of
clothing and operating directly on the naked representation.
This statement seems to suggest that weakly typing would let us access the inner structure of a type and manipulate it as if it was something else (another type). Perhaps what we could do with unsafe code (mentioned by Eric) or with c type-erased pointers mentioned by Konrad.
The article continues...
Languages in which all expressions are type-consistent are called
strongly typed languages. If a language is strongly typed its compiler
can guarantee that the programs it accepts will execute without type
errors. In general, we should strive for strong typing, and adopt
static typing whenever possible. Note that every statically typed
language is strongly typed but the converse is not necessarily true.
As such, strong typing means the absence of type errors, I can only assume that weak typing means the contrary: the likely presence of type errors. At runtime or compile time? Seems irrelevant here.
Funny thing, as per this definition, a language with powerful type coercions like Perl would be considered strongly typed, because the system is not failing, but it is dealing with the types by coercing them into appropriate and well defined equivalences.
On the other hand, could I say than the allowance of ClassCastException and ArrayStoreException (in Java) and InvalidCastException, ArrayTypeMismatchException (in C#) would indicate a level of weakly typing, at least at compile time? Eric's answer seems to agree with this.
In a second article named Typeful Programming provided in one of the references provided in one of the answers in this question, Luca Cardelli delves into the concept of type violations:
Most system programming languages allow arbitrary type violations,
some indiscriminately, some only in restricted parts of a program.
Operations that involve type violations are called unsound. Type
violations fall in several classes [among which we can mention]:
Basic-value coercions: These include conversions between integers, booleans, characters, sets, etc. There is no need for type violations
here, because built-in interfaces can be provided to carry out the
coercions in a type-sound way.
As such, type coercions like those provided by operators could be considered type violations, but unless they break the consistency of the type system, we might say that they do not lead to a weakly typed system.
Based on this neither Python, Perl, Java or C# are weakly typed.
Cardelli mentions two type vilations that I very well consider cases of truly weak typing:
Address arithmetic. If necessary, there should be a built-in (unsound) interface, providing the adequate operations on addresses
and type conversions. Various situations involve pointers into the
heap (very dangerous with relocating collectors), pointers to the
stack, pointers to static areas, and pointers into other address
spaces. Sometimes array indexing can replace address arithmetic.
Memory mapping. This involves looking at an area of memory as an unstructured array, although it contains structured data. This is
typical of memory allocators and collectors.
This kind of things possible in languages like C (mentioned by Konrad) or through unsafe code in .Net (mentioned by Eric) would truly imply weakly typing.
I believe the best answer so far is Eric's, because the definition of this concepts is very theoretical, and when it comes to a particular language, the interpretations of all these concepts may lead to different debatable conclusions.
Weak typing does indeed mean that a high percentage of types can be implicitly coerced, attempting to guess what the coder intended.
Strong typing means that types are not coerced, or at least coerced less.
Static typing means your variables' types are determined at compile time.
Many people have recently been confusing "manifestly typed" with "strongly typed". "Manifestly typed" means that you declare your variables' types explicitly.
Python is mostly strongly typed, though you can use almost anything in a boolean context, and booleans can be used in an integer context, and you can use an integer in a float context. It is not manifestly typed, because you don't need to declare your types (except for Cython, which isn't entirely python, albeit interesting). It is also not statically typed.
C and C++ are manifestly typed, statically typed, and somewhat strongly typed, because you declare your types, types are determined at compile time, and you can mix integers and pointers, or integers and doubles, or even cast a pointer to one type into a pointer to another type.
Haskell is an interesting example, because it is not manifestly typed, but it's also statically and strongly typed.
The strong <=> weak typing is not only about the continuum on how much or how little of the values are coerced automatically by the language for one datatype to another, but how strongly or weakly the actual values are typed. In Python and Java, and mostly in C#, the values have their types set in stone. In Perl, not so much - there are really only a handful of different valuetypes to store in a variable.
Let's open the cases one by one.
Python
In Python example 1 + "1", + operator calls the __add__ for type int giving it the string "1" as an argument - however, this results in NotImplemented:
>>> (1).__add__('1')
NotImplemented
Next, the interpreter tries the __radd__ of str:
>>> '1'.__radd__(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__radd__'
As it fails, the + operator fails with the the result TypeError: unsupported operand type(s) for +: 'int' and 'str'. As such, the exception does not say much about strong typing, but the fact that the operator + does not coerce its arguments automatically to the same type, is a pointer to the fact that Python is not the most weakly typed language in the continuum.
On the other hand, in Python 'a' * 5 is implemented:
>>> 'a' * 5
'aaaaa'
That is,
>>> 'a'.__mul__(5)
'aaaaa'
The fact that the operation is different requires some strong typing - however the opposite of * coercing the values to numbers before multiplying still would not necessarily make the values weakly typed.
Java
The Java example, String result = "1" + 1; works only because as a fact of convenience, the operator + is overloaded for strings. The Java + operator replaces the sequence with creating a StringBuilder (see this):
String result = a + b;
// becomes something like
String result = new StringBuilder().append(a).append(b).toString()
This is rather an example of very static typing, without no actual coercion - StringBuilder has a method append(Object) that is specifically used here. The documentation says the following:
Appends the string representation of the Object argument.
The overall effect is exactly as if the argument were converted to a
string by the method String.valueOf(Object), and the characters of
that string were then appended to this character sequence.
Where String.valueOf then
Returns the string representation of the Object argument.
[Returns] if the argument is null, then a string equal to "null"; otherwise, the value of obj.toString() is returned.
Thus this is a case of absolutely no coercion by the language - delegating every concern to the objects itself.
C#
According to the Jon Skeet answer here, operator + is not even overloaded for the string class - akin to Java, this is just convenience generated by the compiler, thanks to both static and strong typing.
Perl
As the perldata explains,
Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". A scalar is a single string (of any size, limited only by the available memory), number, or a reference to something (which will be discussed in perlref). Normal arrays are ordered lists of scalars indexed by number, starting with 0. Hashes are unordered collections of scalar values indexed by their associated string key.
Perl however does not have a separate data type for numbers, booleans, strings, nulls, undefineds, references to other objects etc - it just has one type for these all, the scalar type; 0 is a scalar value as much as is "0". A scalar variable that was set as a string can really change into a number, and from there on behave differently from "just a string", if it is accessed in a numerical context. The scalar can hold anything in Perl, it is as much the object as it exists in the system. whereas in Python the names just refers to the objects, in Perl the scalar values in the names are changeable objects. Furthermore, the Object Oriented Type system is glued on top of this: there are just 3 datatypes in perl - scalars, lists and hashes. A user defined object in Perl is a reference (that is a pointer to any of the 3 previous) blessed to a package - you can take any such value and bless it to any class at any instant you want.
Perl even allows you to change the classes of values at whim - this is not possible in Python where to create a value of some class you need to explicitly construct the value belonging to that class with object.__new__ or similar. In Python you cannot really change the essence of the object after the creation, in Perl you can do much anything:
package Foo;
package Bar;
my $val = 42;
# $val is now a scalar value set from double
bless \$val, Foo;
# all references to $val now belong to class Foo
my $obj = \$val;
# now $obj refers to the SV stored in $val
# thus this prints: Foo=SCALAR(0x1c7d8c8)
print \$val, "\n";
# all references to $val now belong to class Bar
bless \$val, Bar;
# thus this prints Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# we change the value stored in $val from number to a string
$val = 'abc';
# yet still the SV is blessed: Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# and on the course, the $obj now refers to a "Bar" even though
# at the time of copying it did refer to a "Foo".
print $obj, "\n";
thus the type identity is weakly bound to the variable, and it can be changed through any reference on the fly. In fact, if you do
my $another = $val;
\$another does not have the class identity, even though \$val will still give the blessed reference.
TL;DR
There are much more about weak typing to Perl than just automatic coercions, and it is more about that the types of the values themselves are not set into stone, unlike the Python which is dynamically yet very strongly typed language. That python gives TypeError on 1 + "1" is an indication that the language is strongly typed, even though the contrary one of doing something useful, as in Java or C# does not preclude them being strongly typed languages.
As many others have expressed, the entire notion of "strong" vs "weak" typing is problematic.
As a archetype, Smalltalk is very strongly typed -- it will always raise an exception if an operation between two objects is incompatible. However, I suspect few on this list would call Smalltalk a strongly-typed language, because it is dynamically typed.
I find the notion of "static" versus "dynamic" typing more useful than "strong" versus "weak." A statically-typed language has all the types figured out at compile-time, and the programmer has to explicitly declare if otherwise.
Contrast with a dynamically-typed language, where typing is performed at run-time. This is typically a requirement for polymorphic languages, so that decisions about whether an operation between two objects is legal does not have to be decided by the programmer in advance.
In polymorphic, dynamically-typed languages (like Smalltalk and Ruby), it's more useful to think of a "type" as a "conformance to protocol." If an object obeys a protocol the same way another object does -- even if the two objects do not share any inheritance or mixins or other voodoo -- they are considered the same "type" by the run-time system. More correctly, an object in such systems is autonomous, and can decide if it makes sense to respond to any particular message referring to any particular argument.
Want an object that can make some meaningful response to the message "+" with an object argument that describes the colour blue? You can do that in dynamically-typed languages, but it is a pain in statically-typed languages.
I like #Eric Lippert's answer, but to address the question - strongly typed languages typically have explicit knowledge of the types of variables at each point of the program. Weakly typed languages do not, so they can attempt to perform an operation that may not be possible for a particular type.
It think the easiest way to see this is in a function.
C++:
void func(string a) {...}
The variable a is known to be of type string and any incompatible operation will be caught at compile time.
Python:
def func(a)
...
The variable a could be anything and we can have code that calls an invalid method, which will only get caught at runtime.
Section 6.1 Implicit conversions defines an identity conversion thusly:
An identity conversion converts from any type to the same type. This conversion exists such that an entity that already has a required type can be said to be convertible to that type.
Now, what is the purpose of sentences such as these?
(In §6.1.6 Implicit reference conversions)
The implicit reference conversions are:
[...]
From any reference-type to a reference-type T if it has an implicit identity or reference conversion to a reference-type T0 and T0 has an identity conversion to T.
and:
(In §6.1.7 Boxing conversions)
A value type has a boxing conversion to an interface type I if it has a boxing conversion to an interface type I0 and I0 has an identity conversion to I.
Initially they seem redundant (tautologous). But they must be there for a purpose, so why are they there?
Can you give an example of two types T1, T2 such that T1 would not be implicitly convertible to T2 if it weren’t for the above-quoted paragraphs?
Update on 22-Sep-2010:
I doubt anybody is going to read this besides Timwi. Even so, I wanted to make a few edits to this answer in light of the fact that a new answer has now been accepted and the debate still continues (at least in my perhaps imaginary world) on whether or not the quoted excerpts of the spec are technically redundant. I am not adding much, but it's too substantial to fit in a comment. The bulk of the update can be found under the heading "Conversion involving the dynamic type" below.
Update on 19-Sep-2010:
In your comment:
[T]his doesn’t make sense.
Damn, Timwi, you say that a lot. But all right, then; you've put me on the defensive, so here goes!
Disclaimer: I have definitely not examined the spec as closely as you have. Based on some of your recent questions it seems like you've been looking into it quite a bit lately. This is naturally going to make you more familiar with a lot of details than most users on SO; so this, like most answers you're likely to receive from anyone other than Eric Lippert, may not satisfy you.
Different premises
Firstly, the premise of your question is that if the statements highlighted are redundant, then they serve no purpose. My answer's premise is that redundant statements are not necessarily without purpose if they clarify something that isn't obvious to everyone. These are contradictory premises. And if we can't agree on premises, we can't have a straightforward logical argument. I was simply asking you to rethink your premise.
Your response, however, was to reiterate your premise: "If the sentences are truly redundant, then they only confuse the reader and don't clarify anything."
(I like how you set yourself up as the representative for all readers of the spec there, by the way.)
I can't blame you for holding this position, exactly. I mean, it does seem obvious. And I didn't give any concrete examples in my original answer. So below I will try to include some concrete examples. But first, let me take a step back and offer my take on why this weird identity conversion concept exists in the spec in the first place.
The purpose of the identity conversion definition
Upon first glance, this definition seems rather superfluous; isn't it just saying that an instance of any type T is convertible to ... well, to T? Yes, it is. But I hypothesize* that the purpose of this definition is to provide the spec with the proper vocabulary to utilize the concept of type identity in the context of discussing conversions.
This allows for statements about conversions which are essentially transitive in nature. The first point you quoted from the spec as an example of a tautological statement falls into this category. It says that if an implicit conversion is defined for some type (I'll call it K) to another type T0 and T0 has an identity conversion to T, then K is implicitly convertible to T. By the definition of identity conversion given above, "has an identity conversion to" really means "is the same type as." So the statement is redundant.
But again: the identity conversion definition exists in the first place to equip the spec with a formal language for describing conversions without having to say things like "if T0 and T are really the same type."
OK, time for concrete examples.
Where the existence of an implicit conversion might not be obvious to some developers
Note: A much better example has been provided by Eric Lippert in his answer to the question. I leave these first two examples as only minor reinforcements of my point. I have also added a third example that concretizes the identity conversion that exists between object and dynamic as pointed out in Eric's answer.
Transitive reference conversion
Let's say you have two types, M and N, and you've got an implicit conversion defined like this:
public static implicit operator M(N n);
Then you can write code like this:
N n = new N();
M m = n;
Now let's say you've got a file with this using statement up top:
using K = M;
And then you have, later in the file:
N n = new N();
K k = n;
OK, before I proceed, I realize that this is obvious to you and me. But my answer is, and has been from the beginning, that it might not be obvious to everyone, and therefore specifying it--while redundant--still has a purpose.
That purpose is: to make clear to anyone scratching his or her head, looking at that code, it is legal. An implicit conversion exists from N to M, and an identity conversion exists from M to K (i.e., M and K are the same type); so an implicit conversion exists from N to K. It isn't just logical (though it may be logical); it's right there in the spec. Otherwise one might mistakenly believe that something like the following would be necessary:
K k = (M)n;
Clearly, it isn't.
Transitive boxing conversion
Or take the type int. An int can be boxed as an IComparable<int>, right? So this is legal:
int i = 10;
IComparable<int> x = i;
Now consider this:
int i = 10;
IComparable<System.Int32> x = i;
Again, yes, it may be obvious to you, me, and 90% of all developers who might ever come across it. But for that slim minority who don't see it right away: a boxing conversion exists from int to IComparable<int>, and an identity conversion exists from IComparable<int> to IComparable<System.Int32> (i.e., IComparable<int> and IComparable<System.Int32> are the same type); so a boxing conversion exists from int to IComparable<System.Int32>.
Conversion involving the dynamic type
I'm going to borrow from my reference conversion example above and just tweak it slightly to illustrate the identity relation between object and dynamic in version 4.0 of the spec.
Let's say we have the types M<T> and N, and have defined somewhere the following implicit conversion:
public static implicit operator M<object>(N n);
Then the following is legal:
N n = new N();
M<dynamic> m = n;
Clearly, the above is far less obvious than the two previous examples. But here's the million-dollar question: would the above still be legal even if the excerpts from the spec quoted in the question did not exist? (I'm going to call these excerpts Q for brevity.) If the answer is yes, then Q is in fact redundant. If no, then it is not.
I believe the answer is yes.
Consider the definition of identity conversion, defined in section 6.1.1 (I am including the entire section here as it is quite short):
An identity conversion converts from any type to the same type. This conversion exists such that an entity that already has a required type can be said to be convertible to that type.
Because object and dynamic are considered equivalent there is an identity conversion between object and dynamic, and between constructed types that are the same when replacing all occurences of dynamic with object. [emphasis mine]
(This last part is also included in section 4.7, which defines the dynamic type.)
Now let's look at the code again. In particular I'm interested in this one line:
M<dynamic> m = n;
The legality of this statement (disregarding Q -- remember, the issue being discussed is the hypothetical legality of the above statement if Q did not exist), since M<T> and N are custom types, depends on the existence of a user-defined implicit conversion between N and M<dynamic>.
There exists an implicit conversion from N to M<object>. By the section of the spec quoted above, there is an identity conversion between M<object> and M<dynamic>. By the definition of identity conversion, M<object> and M<dynamic> are the same type.
So, just as in the first two (more obvious) examples, I believe it is true that an implicit conversion exists from N to M<dynamic> even without taking Q into account, just as it is true that an implicit conversion exists from N to K in the first example and that a boxing conversion exists from int to IComparable<System.Int32> in the second example.
Without Q, this is much less obvious (hence Q's existence); but that does not make it false (i.e., Q is not necessary for this behavior to be defined). It just makes it less obvious.
Conclusion
I said in my original answer that this is the "obvious" explanation, because it seemed to me you were barking up the wrong tree. You initially posed this challenge:
Can you give an example of two types T1, T2 such that T1 would not be implicitly convertible to T2 if it weren’t for the above-quoted paragraphs?
No one's going to meet this challenge, Timwi, because it's impossible. Take the first excerpt about reference conversions. It is saying that a type K is implicitly convertible to a type T if it is implicitly convertible to T0 and T0 is the same as T. Deconstruct this, put it back together, and you're left with an obvious tautology: K is implicitly convertible to T if it's implicitly convertible to T. Does this introduce any new implicit conversions? Of course not.
So maybe Ben Voigt's comment was correct; maybe these points that you're asking about would've been better placed in footnotes, rather than in the body of the text. In any case, it's clear to me that they are redundant, and so to start with the premise they cannot be redundant, or else they wouldn't be there is to embark on a fool's errand. Be willing to accept that a redundant statement may still shed some light on a concept that may not be obvious to everyone, and it will become easier to accept these statements for what they are.
Redundant? Yes. Tautologous? Yes. Pointless? In my opinion, no.
*Obviously, I did not have any part in writing the C# language specification. If I did, this answer would be a lot more authoritative. As it is, it simply represents one guy's feeble attempt to make sense of a rather complex document.
Original answer
I think you're (perhaps intentionally) overlooking the most obvious answer here.
Consider these two sentences in your question:
(1) Initially they seem redundant
(tautologous). (2) But they must be there
for a purpose, so why are they there?
To me, the implication of these two sentences together is that a tautologous statement serves no purpose. But just because a statement follows logically from established premises, that does not make it obvious to everyone. In other words even if (1) is true, the answer to (2) may simply be: to make the described behavior clear to anyone reading the spec.
Now you might argue that even if something is not obvious, it still does not belong in a specification if it is providing a redundant definition. To this potential objection, I can only say: be realistic. It's not really practical (in my opinion) to comb through a document stripping out all statements which are simply stating facts that could have been deduced from prior statements.
If this were a common practice, I think you'd find a lot of literature out there -- not just specs, but research papers, articles, textbooks, etc. -- would be a lot shorter, denser, and more difficult to understand.
So: yes, perhaps they are redundant. But that does not negate their purpose.
Section 4.7 of the specification notes that there is an identity conversion from Foo<dynamic> to Foo<object> and vice versa. The portion of the spec you quoted is written to ensure that this case is handled. That is, if there is an implicit reference conversion from T to C<object, object> then there is also an implicit reference conversion to C<object, dynamic>, C<dynamic, object> and C<dynamic, dynamic>.
One might reasonably point out that (1) the intention of these phrases is unobvious - hence your question - and confusing, and (2) that the section on identity conversions ought to cross-reference the section on dynamic conversions, and (3) phrases like this in the spec make it difficult for an implementor of the specification to clearly translate the spec language into an implementation. How is one to know if any such type exists? The spec need not specify exact algorithms, but it would be nice if it gave more guidance.
The spec is, sadly, not a perfect document.
An identity conversion converts from
any type to the same type. This
conversion exists such that an entity
that already has a required type can
be said to be convertible to that
type.
This says that in C#-land, 1==1; a Spade is a Spade. This is the basis of assigning an object reference to a variable of the same type; if a variable typed T1 and one typed T2 are in reality both Spades, it is possible to assign one to the other without having to explicitly cast one as a Spade. Imagine a C# variant where assignments had to look like this:
Spade mySpade = new Spade();
Spade mySpade2;
mySpade2 = (Spade)mySpade; //explicit cast required
Also, an "identity" in mathematics states that an expression that evaluates to a result given a set of inputs is equivalent to another expression that produces the same result given the same inputs. In programming, this means that an expression or function that evaluates to an instance of a type is equivalent to that type, without explicit conversion. If that didn't hold, the following code would be required:
public int myMethod() { /*Do something*/ }
...
int myInt = (int)myMethod(); //required even though myMethod() evals to an int.
...
int myInt = (int)(1 + 2); //required even though 1, 2, and 1+2 eval to an int.
The second rule basically says that a value type can be assigned to a member variable on a class if, in part, the member variable (a boxed type by definition, as its container is a reference type) is declared to be the same type. If this rule were not specified, theoretically, a version of C# could exist in which pure value types would have to be explicitly converted to their reference analog in order to be stored as a member variable on a class. Imagine, for example, a version of C# in which the blue keyword types (int, float, decimal) and the light blue class names (Int32, Float, Decimal) referred to two very different, only-explicitly-convertible types, and int, float, decimal etc. were not legal as member variable types because they were not reference types:
public class MyClass
{
Int32 MyBoxedValueType; //using "int" not legal
}
...
MyClass myClass = new MyClass();
int theInt = 2;
myClass.MyBoxedValueType = (Int32)theInt; //explicit cast required
I know it sounds silly, but at some level, these things must be known, and in computers, you have to specify EVERYTHING. Read the USA Hockey rulebook for ice hockey sometime; the very first rule in the book is that the game shall be played on an ice surface. It's one of the ultimate "well duhs", but also a fundamental truth of the game that must be understood in order for any other rule to make sense.
May it is such that the code guarantees pass-through when called like Convert.ChangeType(client, typeof(Client)) regardless if IConvertible is implemented.
Look into the source of ChangeType from mscorlib with Reflector and notice the conditions at which value is returned as-is.
Remember a = operator is not a conversion, just a reference set. So code like Client client_2 = client_1 does not perform any implicit conversions. If an identity implicit conversion is declared then error CS0555 is issued.
I guess the spec says let the C# compiler handle such cases, and thus dot not manually try to define identity conversions.
I am writing algorithms that work on series of numeric data, where sometimes, a value in the series needs to be null. However, because this application is performance critical, I have avoided the use of nullable types. I have perf tested the algorithms to specifically compare the performance of using nullable types vs non-nullable types, and in the best case scenario nullable types are 2x slower, but often far worse.
The data type most often used is double, and currently the chosen alternative to null is double.NaN. However I understand this is not the exact intended usage for the NaN value, so am unsure whether there are any issues with this I cannot foresee and what the best practise would be.
I am interested in finding out what the best null alternatives are for the following data types in particular: double/float, decimal, DateTime, int/long (although others are more than welcome)
Edit: I think I need to clarify my requirements about performance. Gigs of numerical data are processed through these algorithms at a time which takes several hours. Therefore, although the difference between eg 10ms or 20ms is usually insignificant, in this scenario it really does makes a significant impact to the time taken.
Well, if you've ruled out Nullable<T>, you are left with domain values - i.e. a magic number that you treat as null. While this isn't ideal, it isn't uncommon either - for example, a lot of the main framework code treats DateTime.MinValue the same as null. This at least moves the damage far away from common values...
edit to highlight only where no NaN
So where there is no NaN, maybe use .MinValue - but just remember what evils happen if you accidentally use that same value meaning the same number...
Obviously for unsigned data you'll need .MaxValue (avoid zero!!!).
Personally, I'd try to use Nullable<T> as expressing my intent more safely... there may be ways to optimise your Nullable<T> code, perhaps. And also - by the time you've checked for the magic number in all the places you need to, perhaps it won't be much faster than Nullable<T>?
I somewhat disagree with Gravell on this specific edge case: a Null-ed variable is considered 'not defined', it doesn't have a value. So whatever is used to signal that is OK: even magic numbers, but with magic numbers you have to take into account that a magic number will always haunt you in the future when it becomes a 'valid' value all of a sudden. With Double.NaN you don't have to be afraid for that: it's never going to become a valid double. Though, you have to consider that NaN in the sense of the sequence of doubles can only be used as a marker for 'not defined', you can't use it as an error code in the sequences as well, obviously.
So whatever is used to mark 'undefined': it has to be clear in the context of the set of values that that specific value is considered the value for 'undefined' AND that won't change in the future.
If Nullable give you too much trouble, use NaN, or whatever else, as long as you consider the consequences: the value chosen represents 'undefined' and that will stay.
I am working on a large project that uses NaN as a null value. I am not entirely comfortable with it - for similar reasons as yours: not knowing what can go wrong. We haven't encountered any real problems so far, but be aware of the following:
NaN arithmetics - While, most of the time, "NaN promotion" is a good thing, it might not always be what you expect.
Comparison - Comparison of values gets rather expensive, if you want NaN's to compare equal. Now, testing floats for equality isn't simple anyway, but ordering (a < b) can get really ugly, because nan's sometimes need to be smaller, sometimes larger than normal values.
Code Infection - I see lots of arithmetic code that requires specific handling of NaN's to be correct. So you end up with "functions that accept NaN's" and "functions that don't" for performance reasons.
Other non-finites NaN is nto the only non-finite value. Should be kept in mind...
Floating Point Exceptions are not a problem when disabled. Until someone enables them. True story: Static intialization of a NaN in an ActiveX control. Doesn't sound scary, until you change installation to use InnoSetup, which uses a Pascal/Delphi(?) core, which has FPU exceptions enabled by default. Took me a while to figure out.
So, all in all, nothing serious, though I'd prefer not to have to consider NaNs that often.
I'd use Nullable types as often as possible, unless they are (proven to be) performance / ressource constraints. One case could be large vectors / matrices with occasional NaNs, or large sets of named individual values where the default NaN behavior is correct.
Alternatively, you can use an index vector for vectors and matrices, standard "sparse matrix" implementations, or a separate bool/bit vector.
Partial answer:
Float and Double provide NaN (Not a Number). NaN is a little tricky since, per spec, NaN != NaN. If you want to know if a number is NaN, you'll need to use Double.IsNaN().
See also Binary floating point and .NET.
Maybe the significant performance decrease happens when calling one of Nullable's members or properties (boxing).
Try to use a struct with the double + a boolean telling whether the value is specified or not.
One can avoid some of the performance degradation associated with Nullable<T> by defining your own structure
struct MaybeValid<T>
{
public bool isValue;
public T Value;
}
If desired, one may define constructor, or a conversion operator from T to MaybeValid<T>, etc. but overuse of such things may yield sub-optimal performance. Exposed-field structs can be efficient if one avoids unnecessary data copying. Some people may frown upon the notion of exposed fields, but they can be massively more efficient that properties. If a function that will return a T would need to have a variable of type T to hold its return value, using a MaybeValid<Foo> simply increases by 4 the size of thing to be returned. By contrast, using a Nullable<Foo> would require that the function first compute the Foo and then pass a copy of it to the constructor for the Nullable<Foo>. Further, returning a Nullable<Foo> will require that any code that wants to use the returned value must make at least one extra copy to a storage location (variable or temporary) of type Foo before it can do anything useful with it. By contrast, code can use the Value field of a variable of type Foo about as efficiently as any other variable.
In Jesse Liberty's Learning C# book, he says "Objects of one type can be converted into objects of another type. This is called casting."
If you investigate the IL generated from the code below, you can clearly see that the casted assignment isn't doing the same thing as the converted assignment. In the former, you can see the boxing/unboxing occurring; in the latter you can see a call to a convert method.
I know in the end it may be just a silly semantic difference--but is casting just another word for converting. I don't mean to be snarky, but I'm not interested in anyone's gut feeling on this--opinions don't count here! Can anyone point to a definitive reference that confirms or denies if casting and converting are the same thing?
object x;
int y;
x = 4;
y = ( int )x;
y = Convert.ToInt32( x );
Thank you
rp
Note added after Matt's comment about explicit/implicit:
I don't think implicit/explicit is the difference. In the code I posted, the change is explicit in both cases. An implicit conversion is what occurs when you assign a short to an int.
Note to Sklivvz:
I wanted confirmation that my suspicion of the looseness of Jesse Liberty's (otherwise usually lucid and clear) language was correct. I thought that Jesse Liberty was being a little loose with his language. I understand that casting is routed in object hierarchy--i.e., you can't cast from an integer to a string but you could cast from custom exception derived from System.Exception to a System.Exception.
It's interesting, though, that when you do try to cast from an int to a string the compiler tells you that it couldn't "convert" the value. Maybe Jesse is more correct than I thought!
Absolutely not!
Convert tries to get you an Int32 via "any means possible". Cast does nothing of the sort. With cast you are telling the compiler to treat the object as Int, without conversion.
You should always use cast when you know (by design) that the object is an Int32 or another class that has an casting operator to Int32 (like float, for example).
Convert should be used with String, or with other classes.
Try this
static void Main(string[] args)
{
long l = long.MaxValue;
Console.WriteLine(l);
byte b = (byte) l;
Console.WriteLine(b);
b = Convert.ToByte(l);
Console.WriteLine(b);
}
Result:
9223372036854775807
255
Unhandled Exception:
System.OverflowException: Value is
greater than Byte.MaxValue or less
than Byte.MinValue at
System.Convert.ToByte (Int64 value)
[0x00000] at Test.Main
(System.String[] args) [0x00019] in
/home/marco/develop/test/Exceptions.cs:15
The simple answer is: it depends.
For value types, casting will involve genuinely converting it to a different type. For instance:
float f = 1.5f;
int i = (int) f; // Conversion
When the casting expression unboxes, the result (assuming it works) is usually just a copy of what was in the box, with the same type. There are exceptions, however - you can unbox from a boxed int to an enum (with an underlying type of int) and vice versa; likewise you can unbox from a boxed int to a Nullable<int>.
When the casting expression is from one reference type to another and no user-defined conversion is involved, there's no conversion as far as the object itself is concerned - only the type of the reference "changes" - and that's really only the way that the value is regarded, rather than the reference itself (which will be the same bits as before). For example:
object o = "hello";
string x = (string) o; // No data is "converted"; x and o refer to the same object
When user-defined conversions get involved, this usually entails returning a different object/value. For example, you could define a conversion to string for your own type - and
this would certainly not be the same data as your own object. (It might be an existing string referred to from your object already, of course.) In my experience user-defined conversions usually exist between value types rather than reference types, so this is rarely an issue.
All of these count as conversions in terms of the specification - but they don't all count as converting an object into an object of a different type. I suspect this is a case of Jesse Liberty being loose with terminology - I've noticed that in Programming C# 3.0, which I've just been reading.
Does that cover everything?
The best explanation that I've seen can be seen below, followed by a link to the source:
"... The truth is a bit more complex than that. .NET provides
three methods of getting from point A to point B, as it were.
First, there is the implicit cast. This is the cast that doesn't
require you to do anything more than an assignment:
int i = 5;
double d = i;
These are also called "widening conversions" and .NET allows you to
perform them without any cast operator because you could never lose any
information doing it: the possible range of valid values of a double
encompasses the range of valid values for an int and then some, so
you're never going to do this assignment and then discover to your
horror that the runtime dropped a few digits off your int value. For
reference types, the rule behind an implicit cast is that the cast
could never throw an InvalidCastException: it is clear to the compiler
that the cast is always valid.
You can make new implicit cast operators for your own types (which
means that you can make implicit casts that break all of the rules, if
you're stupid about it). The basic rule of thumb is that an implicit
cast can never include the possibility of losing information in the
transition.
Note that the underlying representation did change in this
conversion: a double is represented completely differently from an int.
The second kind of conversion is an explicit cast. An explicit cast is
required wherever there is the possibility of losing information, or
there is a possibility that the cast might not be valid and thus throw
an InvalidCastException:
double d = 1.5;
int i = (int)d;
Here you are obviously going to lose information: i will be 1 after the
cast, so the 0.5 gets lost. This is also known as a "narrowing"
conversion, and the compiler requires that you include an explicit cast
(int) to indicate that yes, you know that information may be lost, but
you don't care.
Similarly, with reference types the compiler requires explicit casts in
situations in which the cast may not be valid at run time, as a signal
that yes, you know there's a risk, but you know what you're doing.
The third kind of conversion is one that involves such a radical change
in representation that the designers didn't provide even an explicit
cast: they make you call a method in order to do the conversion:
string s = "15";
int i = Convert.ToInt32(s);
Note that there is nothing that absolutely requires a method call here.
Implicit and explicit casts are method calls too (that's how you make
your own). The designers could quite easily have created an explicit
cast operator that converted a string to an int. The requirement that
you call a method is a stylistic choice rather than a fundamental
requirement of the language.
The stylistic reasoning goes something like this: String-to-int is a
complicated conversion with lots of opportunity for things going
horribly wrong:
string s = "The quick brown fox";
int i = Convert.ToInt32(s);
As such, the method call gives you documentation to read, and a broad
hint that this is something more than just a quick cast.
When designing your own types (particularly your own value types), you
may decide to create cast operators and conversion functions. The lines
dividing "implicit cast", "explicit cast", and "conversion function"
territory are a bit blurry, so different people may make different
decisions as to what should be what. Just try to keep in mind
information loss, and potential for exceptions and invalid data, and
that should help you decide."
Bruce Wood, November 16th 2005
http://bytes.com/forum/post1068532-4.html
Casting involves References
List<int> myList = new List<int>();
//up-cast
IEnumerable<int> myEnumerable = (IEnumerable<int>) myList;
//down-cast
List<int> myOtherList = (List<int>) myEnumerable;
Notice that operations against myList, such as adding an element, are reflected in myEnumerable and myOtherList. This is because they are all references (of varying types) to the same instance.
Up-casting is safe. Down-casting can generate run-time errors if the programmer has made a mistake in the type. Safe down-casting is beyond the scope of this answer.
Converting involves Instances
List<int> myList = new List<int>();
int[] myArray = myList.ToArray();
myList is used to produce myArray. This is a non-destructive conversion (myList works perfectly fine after this operation). Also notice that operations against myList, such as adding an element, are not reflected in myArray. This is because they are completely seperate instances.
decimal w = 1.1m;
int x = (int)w;
There are operations using the cast syntax in C# that are actually conversions.
Semantics aside, a quick test shows they are NOT equivalent !
They do the task differently (or perhaps, they do different tasks).
x=-2.5 (int)x=-2 Convert.ToInt32(x)=-2
x=-1.5 (int)x=-1 Convert.ToInt32(x)=-2
x=-0.5 (int)x= 0 Convert.ToInt32(x)= 0
x= 0.5 (int)x= 0 Convert.ToInt32(x)= 0
x= 1.5 (int)x= 1 Convert.ToInt32(x)= 2
x= 2.5 (int)x= 2 Convert.ToInt32(x)= 2
Notice the x=-1.5 and x=1.5 cases.
A cast is telling the compiler/interperter that the object in fact is of that type (or has a basetype/interface of that type). It's a pretty fast thing to do compared to a convert where it's no longer the compiler/interperter doing the job but a function actualling parsing a string and doing math to convert to a number.
Casting always means changing the data type of an object. This can be done for instance by converting a float value into an integer value, or by reinterpreting the bits. It is usally a language-supported (read: compiler-supported) operation.
The term "converting" is sometimes used for casting, but it is usually done by some library or your own code and does not necessarily result in the same as casting. For example, if you have an imperial weight value and convert it to metric weight, it may stay the same data type (say, float), but become a different number. Another typical example is converting from degrees to radian.
In a language- / framework-agnostic manner of speaking, converting from one type or class to another is known as casting. This is true for .NET as well, as your first four lines show:
object x;
int y;
x = 4;
y = ( int )x;
C and C-like languages (such as C#) use the (newtype)somevar syntax for casting. In VB.NET, for example, there are explicit built-in functions for this. The last line would be written as:
y = CInt(x)
Or, for more complex types:
y = CType(x, newtype)
Where 'C' obviously is short for 'cast'.
.NET also has the Convert() function, however. This isn't a built-in language feature (unlike the above two), but rather one of the framework. This becomes clearer when you use a language that isn't necessarily used together with .NET: they still very likely have their own means of casting, but it's .NET that adds Convert().
As Matt says, the difference in behavior is that Convert() is more explicit. Rather than merely telling the compiler to treat y as an integer equivalent of x, you are specifically telling it to alter x in such a way that is suitable for the integer class, then assign the result to y.
In your particular case, the casting does what is called 'unboxing', whereas Convert() will actually get the integer value. The result will appear the same, but there are subtle differences better explained by Keith.
According to Table 1-7 titled "Methods for Explicit Conversion" on page 55 in Chapter 1, Lesson 4 of MCTS Self-Paced Training Kit (Exam 70-536): Microsoft® .NET Framework 2.0—Application Development Foundation, there is certainly a difference between them.
System.Convert is language-independent and converts "Between types that implement the System.IConvertible interface."
(type) cast operator is a C#-specific language feature that converts "Between types that define conversion operators."
Furthermore, when implementing custom conversions, advice differs between them.
Per the section titled How to Implement Conversion in Custom Types on pp. 56-57 in the lesson cited above, conversion operators (casting) are meant for simplifying conversions between numeric types, whereas Convert() enables culture-specific conversions.
Which technique you choose depends on the type of conversion you want to perform:
Define conversion operators to simplify narrowing and widening
conversions between numeric types.
Implement System.IConvertible to enable conversion through
System.Convert. Use this technique to enable culture-specific conversions.
...
It should be clearer now that since the cast conversion operator is implemented separately from the IConvertible interface, that Convert() is not necessarily merely another name for casting. (But I can envision where one implementation may refer to the other to ensure consistency).
Casting is essentially just telling the runtime to "pretend" the object is the new type. It doesn't actually convert or change the object in any way.
Convert, however, will perform operations to turn one type into another.
As an example:
char caster = '5';
Console.WriteLine((int)caster);
The output of those statements will be 53, because all the runtime did is look at the bit pattern and treat it as an int. What you end up getting is the ascii value of the character 5, rather than the number 5.
If you use Convert.ToInt32(caster) however, you will get 5 because it actually reads the string and modifies it correctly. (Essentially it knows that ASCII value 53 is really the integer value 5.)
The difference there is whether the conversion is implicit or explicit. The first one up there is a cast, the second one is a more explicit call to a function that converts. They probably go about doing the same thing in different ways.