Why was IEnumerable<T> made covariant in C# 4?

In earlier versions of C#, IEnumerable<T> was defined like this:
public interface IEnumerable<T> : IEnumerable
Since C# 4 the definition is:
public interface IEnumerable<out T> : IEnumerable
Is it just to make the annoying casts in LINQ expressions go away?
Won't this introduce the same problems as with string[] <: object[] (broken array variance) in C#?
How was the addition of the covariance done from a compatibility point of view? Will earlier code still work on later versions of .NET or is recompilation necessary here? What about the other way around?
Was previous code using this interface strictly invariant in all cases, or is it possible that certain use cases will behave differently now?

Marc's and CodeInChaos's answers are pretty good, but just to add a few more details:
First off, it sounds like you are interested in learning about the design process we went through to make this feature. If so, then I encourage you to read my lengthy series of articles that I wrote while designing and implementing the feature. Start from the bottom of the page:
Covariance and contravariance blog posts
Is it just to make the annoying casts in LINQ expressions go away?
No, it is not just to avoid Cast<T> expressions, but doing so was one of the motivators that encouraged us to do this feature. We realized that there would be an uptick in the number of "why can't I use a sequence of Giraffes in this method that takes a sequence of Animals?" questions, because LINQ encourages the use of sequence types. We knew that we wanted to add covariance to IEnumerable<T> first.
We actually considered making IEnumerable<T> covariant even in C# 3 but decided that it would be strange to do so without introducing the whole feature for anyone to use.
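To illustrate the motivating scenario (a minimal sketch; the Animal/Giraffe types and the FeedAll method are hypothetical, not from the original answer):

using System.Collections.Generic;
using System.Linq;

class Animal { }
class Giraffe : Animal { }

static class Zoo {
    static void FeedAll(IEnumerable<Animal> animals) { /* ... */ }

    static void Demo() {
        IEnumerable<Giraffe> giraffes = new List<Giraffe> { new Giraffe() };
        FeedAll(giraffes);                // C# 4: fine, IEnumerable<out T> is covariant
        FeedAll(giraffes.Cast<Animal>()); // the C# 3 workaround this replaces
    }
}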
Won't this introduce the same problems as with string[] <: object[] (broken array variance) in C#?
It does not directly introduce that problem because the compiler only allows variance when it is known to be typesafe. However, it does preserve the broken array variance problem. With covariance, IEnumerable<string[]> is implicitly convertible to IEnumerable<object[]>, so if you have a sequence of string arrays, you can treat that as a sequence of object arrays, and then you have the same problem as before: you can try to put a Giraffe into that string array and get an exception at runtime.
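A sketch of that preserved hole (hypothetical values; the point is the run-time failure):

IEnumerable<string[]> stringArrays = new[] { new[] { "hello" } };
IEnumerable<object[]> objectArrays = stringArrays; // legal and typesafe: covariance
foreach (object[] array in objectArrays)
    array[0] = new object(); // compiles, but throws ArrayTypeMismatchException at run time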
How was the addition of the covariance done from a compatibility point of view?
Carefully.
Will earlier code still work on later versions of .NET or is recompilation necessary here?
Only one way to find out. Try it and see what fails!
It's often a bad idea to try to force code compiled against .NET X to run against .NET Y if X != Y, regardless of changes to the type system.
What about the other way around?
Same answer.
Is it possible that certain use cases will behave differently now?
Absolutely. Making an interface covariant where it was invariant before is technically a "breaking change" because it can cause working code to break. For example:
if (x is IEnumerable<Animal>)
    ABC();
else if (x is IEnumerable<Turtle>)
    DEF();
When IE<T> is not covariant, this code chooses either ABC or DEF or neither. When it is covariant, it never chooses DEF anymore.
Or:
class B { public void M(IEnumerable<Turtle> turtles){} }
class D : B { public void M(IEnumerable<Animal> animals){} }
Before, if you called M on an instance of D with a sequence of turtles as the argument, overload resolution chooses B.M because that is the only applicable method. If IE is covariant, then overload resolution now chooses D.M because both methods are applicable, and an applicable method on a more-derived class always beats an applicable method on a less-derived class, regardless of whether the argument type match is exact or not.
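A call-site sketch of that change, assuming the B and D definitions above plus a hypothetical Turtle type:

var d = new D();
IEnumerable<Turtle> turtles = new List<Turtle>();
d.M(turtles); // invariant IE<T>: binds to B.M
              // covariant IE<T>: binds to D.M, the more-derived class wins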
Or:
class Weird : IEnumerable<Turtle>, IEnumerable<Banana> { ... }
class B
{
    public void M(IEnumerable<Banana> bananas) {}
}
class D : B
{
    public void M(IEnumerable<Animal> animals) {}
    public void M(IEnumerable<Fruit> fruits) {}
}
If IE is invariant then a call to d.M(weird) resolves to B.M. If IE suddenly becomes covariant then both methods D.M are applicable, both are better than the method on the base class, and neither is better than the other, so overload resolution becomes ambiguous and we report an error.
When we decided to make these breaking changes, we were hoping that (1) the situations would be rare, and (2) when situations like this arise, almost always it is because the author of the class is attempting to simulate covariance in a language that doesn't have it. By adding covariance directly, hopefully when the code "breaks" on recompilation, the author can simply remove the crazy gear trying to simulate a feature that now exists.

In order:
Is it just to make the annoying casts in LINQ expressions go away?
It makes things behave like people generally expect ;p
Won't this introduce the same problems as with string[] <: object[] (broken array variance) in C#?
No, since it doesn't expose any Add mechanism or similar (and can't: out and in are enforced by the compiler)
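For example, a minimal sketch of what the compiler enforces (IProducer is a hypothetical interface):

interface IProducer<out T>
{
    T Produce();             // OK: T appears only in an output position
    // void Consume(T item); // compile-time error: an 'out' type parameter
                             // may not appear in an input position
}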
How was the addition of the covariance done from a compatibility point of view?
The CLI already supported it, this merely makes C# (and some of the existing BCL methods) aware of it
Will earlier code still work on later versions of .NET or is recompilation necessary here?
It is entirely backwards compatible; note, however, that C# code which relies on C# 4.0 variance won't compile under a C# 2.0 or 3.0 compiler
What about the other way around?
That is not unreasonable
Was previous code using this interface strictly invariant in all cases or is it possible that certain use cases will behave different now?
Some BCL calls (such as Type.IsAssignableFrom) may return different results now
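For example (a sketch; the result depends on which runtime executes it):

bool result = typeof(IEnumerable<object>)
    .IsAssignableFrom(typeof(IEnumerable<string>));
// false on .NET 3.5 (IEnumerable<T> was invariant),
// true on .NET 4 and later (the 'out' annotation is honored)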

Is it just to make the annoying casts in LINQ expressions go away?
Not only when using LINQ. It's useful everywhere you have an IEnumerable<Derived> and the code expects an IEnumerable<Base>.
Won't this introduce the same problems as with string[] <: object[] (broken array variance) in C#?
No, because covariance is only allowed on interfaces that return values of that type, but don't accept them. So it's safe.
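The mirror case is contravariance: interfaces that only accept values of the type, such as IComparer<in T>, convert the other way. A small sketch (ObjectComparer is illustrative):

using System.Collections.Generic;

class ObjectComparer : IComparer<object>
{
    public int Compare(object x, object y)
    {
        return string.CompareOrdinal(x.ToString(), y.ToString());
    }
}

// Legal since C# 4: a comparer that can compare any objects can compare strings.
IComparer<string> stringComparer = new ObjectComparer();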
How was the addition of the covariance done from a compatibility point of view? Will earlier code still work on later versions of .NET or is recompilation necessary here? What about the other way around?
I think already compiled code will mostly work as is. Some runtime type-checks (is, IsAssignableFrom, ...) will return true where they returned false earlier.
Was previous code using this interface strictly invariant in all cases or is it possible that certain use cases will behave different now?
Not sure what you mean by that
The biggest problems are related to overload resolution. Since additional implicit conversions are now possible, a different overload might be chosen. For example:
void DoSomething(IEnumerable<Base> bla);
void DoSomething(object blub);

IEnumerable<Derived> values = ...;
DoSomething(values);
But of course, if these overloads behave differently, the API is already badly designed.

In C# and also Java, what's the relationship between Object[] and String[]?

I recently started to think of this problem and I can't find the answer.
The following code compiles and executes as expected
object[] test = new string[12];
However, I don't know why.
I mean, should we consider string[] to be a derived class of object[]?
I think in C#, every array is an instance of the Array class. If Array were generic, it would be Array<T>, and being able to assign an Array<string> to an Array<object> doesn't make sense. I remember that only interfaces can use the in/out keywords.
As for Java, I'm not sure, but it still feels weird. Why can references of different types be assigned to each other when they don't have a superclass-subclass relationship?
Can somebody explain a little?
Thanks a lot!
It's because reference type arrays support covariance in both Java and C#. It also means that every write into a reference type array has to be checked at execution time, to make sure you don't write the wrong type of element into it :(
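A short sketch of that execution-time check:

object[] test = new string[12];
test[0] = "fine";       // OK: the element really is a string
test[1] = new object(); // compiles, but throws ArrayTypeMismatchException at run time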
Don't forget that both Java and C# (and .NET in general) started off without generics. If they had had generics to start with, life could have been somewhat different.
Note that both Java and C# support generic variance now, but in rather different ways. So for example in C# 4 you can write:
IEnumerable<string> strings = // Get some string sequence here
IEnumerable<object> objects = strings;
but you can't write
IList<string> strings = // Get some string list here
// Compile-time error: IList<T> isn't covariant in T
IList<object> objects = strings;
This wouldn't be safe, because you can add to an IList<T> as well as take items from it.
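To see why, here is what the forbidden conversion would permit (a sketch; the dangerous lines are commented out):

IList<string> strings = new List<string>();
// IList<object> objects = strings; // rejected at compile time...
// objects.Add(new object());       // ...because this would put a non-string
// string s = strings[0];           // into a List<string> and blow up here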
This is a big topic - for more details, see Eric Lippert's blog series.
In C# there is (and always has been) covariance of arrays of reference types. It is still a string[], but you can legally cast it to an object[] (and access values as you would expect).
But try putting in an int (or any other non-string value) and you'll see that it still behaves appropriately (i.e. it won't let you; the write fails at execution time).
This works because object is the base class of all other classes. (Boxing/unboxing is a related but separate topic that applies to value types.)
Since all the really smart guys are talking about covariance and contravariance and I couldn't for the life of me explain (or understand) this stuff, listen to Eric Lippert:
Covariance and Contravariance FAQ

Covariance and Contravariance inference in C# 4.0

When we define our interfaces in C# 4.0, we are allowed to mark each of the generic parameters as in or out. If we try to mark a generic parameter as out where that would lead to a problem, the compiler raises an error and doesn't allow it.
Question:
If the compiler has ways of inferring what are valid uses for both covariance (out) and contravariance (in), why do we have to mark interfaces as such? Wouldn't it be enough to just let us define the interfaces as we always did, and when we tried to use them in our client code, raise an error if we tried to use them in an unsafe way?
Example:
interface MyInterface<out T> {
    T abracadabra();
} //works OK

interface MyInterface2<in T> {
    T abracadabra();
} //compiler raises an error.
//This makes me think that the compiler is capable
//of understanding what situations might generate
//run-time problems and then prohibits them.
Also,
isn't this what Java does in the same situation? From what I recall, you just write something like
IMyInterface<? extends whatever> myInterface; //covariance
IMyInterface<? super whatever> myInterface2; //contravariance
Or am I mixing things up?
Thanks
If the compiler has ways of inferring what are valid uses for both covariance (out) and contravariance(in), why do we have to mark interfaces as such?
I'm not quite sure I understand the question. I think you're asking two things.
1) Can the compiler deduce the variance annotations?
and
2) Why does C# not support call-site variance like Java does?
The answer to the first is:
interface IRezrov<V, W>
{
    IRezrov<V, W> Rezrov(IRezrov<W, V> x);
}
I invite you to attempt to deduce what all legal possible variance annotations are on V and W. You might get a surprise.
If you cannot figure out a unique best variance annotation for this method, why do you think the compiler can?
More reasons here:
http://blogs.msdn.com/ericlippert/archive/2007/10/29/covariance-and-contravariance-in-c-part-seven-why-do-we-need-a-syntax-at-all.aspx
More generally: your question indicates fallacious reasoning. The ability to cheaply check whether a solution is correct does not logically imply that there is a cheap way of finding a correct solution. For example, a computer can easily verify whether p * q == r is true or false for two thousand-digit prime numbers p and q. That does not imply that it is easy to take r and find p and q such that the equality is satisfied. The compiler can easily check whether a variance annotation is correct or incorrect; that does not mean that it can find a correct variance annotation amongst the potentially billions of possible annotations.
The answer to the second is: C# isn't Java.
OK, here is the answer to what I asked (from Eric's answer) : http://blogs.msdn.com/ericlippert/archive/2007/10/29/covariance-and-contravariance-in-c-part-seven-why-do-we-need-a-syntax-at-all.aspx
First, it seems to me that variance ought to be something that you deliberately design into your interface or delegate. Making it just start happening with no control by the user works against that goal, and also can introduce breaking changes. (More on those in a later post!)

Doing so automagically also means that as the development process goes on and methods are added to interfaces, the variance of the interface may change unexpectedly. This could introduce unexpected and far-reaching changes elsewhere in the program.
I decided to put it out explicitly here because although his link does have the answer to my question, the post itself does not.

Why does C# (4.0) not allow co- and contravariance in generic class types?

What is the real reason for that limitation? Is it just work that had to be done? Is it conceptually hard? Is it impossible?
Sure, one couldn't use the type parameters in fields, because they are always read-write. But that can't be the answer, can it?
The reason for this question is that I'm writing an article on variance support in C# 4, and I feel that I should explain why it is restricted to delegates and interfaces. Just to reverse the onus of proof.
Update:
Eric asked about an example.
What about this (I don't know if it makes sense yet :-))
public class Lookup<out T> where T : Animal {
    public T Find(string name) {
        Animal a = _cache.FindAnimalByName(name);
        return a as T;
    }
}
var findReptiles = new Lookup<Reptile>();
Lookup<Animal> findAnimals = findReptiles;
The reason for having that in one class could be the cache that is held in the class itself. And please don't give pets of different types the same name!
BTW, this brings me to optional type parameters in C# 5.0 :-)
Update 2: I'm not claiming the CLR and C# should allow this. I'm just trying to understand what led to it not being supported.
First off, as Tomas says, it is not supported in the CLR.
Second, how would that work? Suppose you have
class C<out T>
{
    ... how are you planning on using T in here? ...
}
T can only be used in output positions. As you note, the class cannot have any field of type T because the field could be written to. The class cannot have any methods that take a T, because those are logically writes. Suppose you had this feature -- how would you take advantage of it?
This would be useful for immutable classes if we could, say, make it legal to have a readonly field of type T; that way we'd massively cut down on the likelihood that it be improperly written to. But it's quite difficult to come up with other scenarios that permit variance in a typesafe manner.
If you have such a scenario, I'd love to see it. That would be points towards someday getting this implemented in the CLR.
UPDATE: See
Why isn't there generic variance for classes in C# 4.0?
for more on this question.
As far as I know, this feature isn't supported by the CLR, so adding it would require significant work on the CLR side as well. I believe that co- and contravariance for interfaces and delegates was actually supported by the CLR before version 4.0, so this was a relatively straightforward extension to implement.
(Supporting this feature for classes would definitely be useful, though!)
If they were permitted, useful, 100% type-safe (no internal typecasts) classes or structures could be defined that were covariant with regard to their type T, if their constructors accepted one or more T's or T suppliers. Useful, 100% type-safe classes or structures could be defined that were contravariant with respect to T if their constructors accepted one or more T consumers. I'm not sure there's much advantage of a class over an interface, beyond the ability to use "new" rather than a static factory method (most likely on a class whose name is similar to that of the interface), but I can certainly see use cases for having immutable structures support covariance.
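A hypothetical sketch of the shape that last paragraph describes; note this does not compile today, since C# rejects 'out' on classes:

// NOT legal C#: illustration only.
class ImmutableBox<out T>
{
    private readonly T _value;                 // T flows in only via the constructor...
    public ImmutableBox(T value) { _value = value; }
    public T Value { get { return _value; } }  // ...and out via a read-only property
}
// Were this legal, an ImmutableBox<Reptile> could safely be
// treated as an ImmutableBox<Animal>.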

Does the C# 4.0 "dynamic" keyword make Generics redundant?

I'm very excited about the dynamic features in C# (C# 4's dynamic keyword - why not?), especially because in certain library parts of my code I use a lot of reflection.
My question is twofold:
1. does "dynamic" replace Generics, as in the case below?
Generics method:
public static void Do_Something_If_Object_Not_Null<SomeType>(SomeType ObjToTest) {
    //test object is not null, regardless of its Type
    if (!EqualityComparer<SomeType>.Default.Equals(ObjToTest, default(SomeType))) {
        //do something
    }
}
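A usage sketch (hypothetical call sites); note the EqualityComparer<T>.Default test treats a value type's default value like null:

Do_Something_If_Object_Not_Null("hello");      // non-null string: body runs
Do_Something_If_Object_Not_Null((string)null); // null: skipped
Do_Something_If_Object_Not_Null(0);            // 0 == default(int): skipped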
dynamic method(??):
public static void Do_Something_If_Object_Not_Null(dynamic ObjToTest) {
    //test object is not null, regardless of its Type?? but how?
    if (ObjToTest != null) {
        //do something
    }
}
2. does "dynamic" now allow for methods to return Anonymous types, as in the case below?:
public static List<dynamic> ReturnAnonymousType() {
    return MyDataContext.SomeEntities.Entity
        .Select(e => new { e.Property1, e.Property2 })
        .Cast<dynamic>()
        .ToList();
}
cool, cheers
EDIT:
Having thought through my question a little more, and in light of the answers, I see I completely messed up the main generic/dynamic question. They are indeed completely different. So yeah, thanks for all the info.
What about point 2 though?
dynamic might simplify a limited number of reflection scenarios (where you know the member-name up front, but there is no interface) - in particular, it might help with generic operators (although other answers exist) - but other than the generic operators trick, there is little crossover with generics.
Generics allow you to know (at compile time) about the type you are working with - conversely, dynamic doesn't care about the type.
In particular - generics allow you to specify and prove a number of conditions about a type - i.e. it might implement some interface, or have a public parameterless constructor. dynamic doesn't help with either: it doesn't support interfaces, and worse than simply not caring about interfaces, it means that we can't even see explicit interface implementations with dynamic.
Additionally, dynamic is really a special case of object, so boxing comes into play, but with a vengeance.
In reality, you should limit your use of dynamic to a few cases:
COM interop
DLR interop
maybe some light duck typing
maybe some generic operators
For all other cases, generics and regular C# are the way to go.
To answer your question: no.
Generics give you "algorithm reuse" - you write code independent of a data type. The dynamic keyword doesn't do anything related to this. I define List<T> and then I can use it for a list of strings, ints, etc.
Type safety: the whole compile-time checking debate. Dynamic variables will not alert you with compile-time warnings/errors if you make a mistake; they will just blow up at runtime if the member you attempt to invoke is missing. (The static vs. dynamic typing debate.)
Performance: generics significantly improve the performance of algorithms/code that use value types, by avoiding the whole boxing-unboxing cycle that cost us pre-generics. Dynamic doesn't do anything for this either.
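For instance, a sketch of the boxing cost being referred to (pre-generics ArrayList vs. generic List<int>):

using System.Collections;
using System.Collections.Generic;

var untyped = new ArrayList();
untyped.Add(42);            // the int is boxed to object
int a = (int)untyped[0];    // unboxing cast required

var typed = new List<int>();
typed.Add(42);              // stored unboxed
int b = typed[0];           // no cast, no unboxing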
What the dynamic keyword would give you is
simpler code (when you are interoperating with Excel, say). You don't need to specify the names of the classes or the object model. If you invoke the right methods, the runtime will take care of invoking that method if it exists on the object at that time. The compiler lets you get away with it even if the method is not defined. However, this implies that the call will be slower than a compiler-verified/statically-typed method call, since the CLR has to perform checks before a dynamic field/method invocation.
A dynamic variable can hold different types of objects at different points in time - you're not bound to a specific family or type of objects.
To answer your first question: generics are resolved at compile time, dynamic types at runtime. So there is a definite difference in type safety and speed.
Dynamic classes and generics are completely different concepts. With generics you define types at compile time. They don't change; they are not dynamic. You just put a "placeholder" in some class or method and make the calling code define the type.
Dynamic methods are defined at runtime. You don't have compile-time type safety there. Using dynamic is similar to holding object references and calling methods by their string names via reflection.
Answer to the second question: you could already return anonymous types in C# 3.0. Cast the type to object, return it, and use reflection to access its members. The dynamic keyword is just syntactic sugar for that.
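A minimal sketch of that pattern (names are illustrative; note that anonymous-type members are internal, so the dynamic access works within the same assembly):

static object ReturnAnonymous()
{
    return new { Name = "example", Count = 1 }; // anonymous type, typed as object
}

static void Use()
{
    dynamic item = ReturnAnonymous();    // C# 4 sugar over the reflection approach
    System.Console.WriteLine(item.Name); // member lookup happens at run time
}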
