Correct me if I'm wrong, but doing a foreach over an IEnumerable<T> creates garbage no matter what T is. I'm wondering about a List<T> where T is Entity, and the list contains instances of a derived class like Entity2D. Will it have to create a new enumerator for each derived class, and therefore create garbage?
Also, does using an interface, say IEntity, as T create garbage?
List<T>'s GetEnumerator method actually is quite efficient.
When you loop through the elements of a List<T>, it calls GetEnumerator. This, in turn, generates an internal struct which holds a reference to the original list, an index, and a version ID to track for changes in the list.
However, since a struct is being used, it's really not creating "garbage" that the GC will ever deal with.
As for "create a new enumerator for each derived class" - .NET generics work differently than C++ templates. In .NET, the List<T> class (and its nested Enumerator struct) is defined one time, and usable for any T. When used, a constructed type for that specific T is required, but this is only the type information for that newly created type, and quite small in general. (At run time the JIT shares one copy of native code for all reference-type arguments, so List<Entity> and List<IEntity> share code; only value-type arguments get specialized code.) This differs from C++ templates, for example, where each type used is created at compile time, and "built in" to the executable.
In .NET, the executable specifies the definition for List<T>, not List<int>, List<Entity2D>, etc...
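A small check of the point above, written as a sketch that assumes C# 9+ top-level statements: the BCL's Exception / ArgumentException / InvalidOperationException stand in for the question's hypothetical Entity / Entity2D. The enumerator type is fixed by the list's T alone; the runtime types of the elements never come into it.

```csharp
using System;
using System.Collections.Generic;

// Exception and two subclasses stand in for the hypothetical Entity / Entity2D.
var entities = new List<Exception>
{
    new Exception("base"),
    new ArgumentException("derived"),
    new InvalidOperationException("another derived"),
};

// This compiles because GetEnumerator's static return type is the struct
// List<Exception>.Enumerator, whatever the elements turn out to be at run time.
List<Exception>.Enumerator enumerator = entities.GetEnumerator();
bool enumeratorIsStruct = typeof(List<Exception>.Enumerator).IsValueType;

int visited = 0;
foreach (Exception entity in entities) // one struct enumerator for the whole loop
{
    visited++;
    Console.WriteLine($"{entity.GetType().Name}: {entity.Message}");
}

Console.WriteLine(enumeratorIsStruct); // True
```

The same holds for a List<IEntity>: the interface only changes the static type of Current, not the enumerator.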
I think you may be interested in this article which explains why List(T) will not create "garbage", as opposed to Collection(T):
Now, here comes the tricky part. Rumor has it that many of the types in System.Collections.Generic will not allocate an enumerator when using foreach. List's GetEnumerator, for example, returns a struct, which will just sit on the stack. Look for yourself with .NET Reflector, if you don't believe me. To prove to myself that a foreach over a List doesn't cause any heap allocations, I changed entities to be a List, did the exact same foreach loop, and ran the profiler. No enumerator!
[...]
However, there is definitely a caveat to the above. Foreach loops over Lists can still generate garbage. [Casting List to IEnumerable] Even though we're still doing a foreach over a List, when the list is cast to an interface, the value type enumerator must be boxed, and placed on the heap.
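The boxing described in the quote can be observed directly. A minimal sketch, assuming C# 9+ top-level statements: the same List<int> is enumerated once through its own static type and once through IEnumerable<int>.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3 };

// Through the List<int> static type: an unboxed struct local, no heap object.
List<int>.Enumerator direct = numbers.GetEnumerator();

// Through the interface: IEnumerable<int>.GetEnumerator() must return an
// IEnumerator<int> reference, so the same struct type gets boxed on the heap.
IEnumerable<int> asInterface = numbers;
IEnumerator<int> boxed = asInterface.GetEnumerator();

// A reference whose runtime type is a value type can only be a box.
bool isBoxedStruct = boxed.GetType() == typeof(List<int>.Enumerator)
                     && typeof(List<int>.Enumerator).IsValueType;
Console.WriteLine(isBoxedStruct); // True

int total = 0;
while (boxed.MoveNext())   // interface calls on the boxed copy
    total += boxed.Current;
Console.WriteLine(total);  // 6
```

A foreach over `asInterface` goes through exactly this boxed path, which is why the cast reintroduces an allocation.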
An interesting note: as Reed Copsey pointed out, the List<T>.Enumerator type is actually a struct. This is both good and horrible.
It's good in the sense that calling foreach on a List<T> actually doesn't create garbage, as no new reference type objects are allocated for the garbage collector to worry about.
It's horrible in the sense that suddenly the return value of GetEnumerator is a value type, against almost every .NET developer's intuition (it is generally expected that GetEnumerator will return a non-descript IEnumerator<T>, as this is what is guaranteed by the IEnumerable<T> contract; List<T> gets around this by explicitly implementing IEnumerable<T>.GetEnumerator and publicly exposing a more specific implementation of IEnumerator<T> which happens to be a value type).
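So any code that, for example, passes a List<T>.Enumerator to a separate method which in turn calls MoveNext on that enumerator object faces the potential issue of an infinite loop, because each call receives (and advances) a new boxed copy of the original enumerator. Like this:
int CountListMembers<T>(List<T> list)
{
    using (var e = list.GetEnumerator()) // e is a List<T>.Enumerator struct
    {
        int count = 0;
        // Each call boxes a fresh copy of e, so e itself never advances and
        // MoveNext keeps returning true: an infinite loop for a non-empty list.
        while (IncrementEnumerator(e, ref count)) { }
        return count;
    }
}

bool IncrementEnumerator<T>(IEnumerator<T> enumerator, ref int count)
{
    if (enumerator.MoveNext()) // advances only the boxed copy
    {
        ++count;
        return true;
    }
    return false;
}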
The above code is very stupid; it's only meant as a trivial example of one scenario in which the return value of List<T>.GetEnumerator can cause highly unintuitive (and potentially disastrous) behavior.
But as I said, it's still kind of good in that it doesn't create garbage ;)
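One way to repair the example, sketched here under the assumption of C# 9+ top-level statements (this is not the original author's code): box the enumerator exactly once by declaring the variable as IEnumerator<T>, so every call shares the same heap object.

```csharp
using System;
using System.Collections.Generic;

int CountListMembers<T>(List<T> list)
{
    // Box the struct once; all calls below then share this one heap object,
    // so MoveNext advances the same enumerator each time.
    using IEnumerator<T> e = list.GetEnumerator();
    int count = 0;
    while (IncrementEnumerator(e, ref count)) { }
    return count;
}

bool IncrementEnumerator<T>(IEnumerator<T> enumerator, ref int count)
{
    if (enumerator.MoveNext())
    {
        ++count;
        return true;
    }
    return false;
}

int n = CountListMembers(new List<string> { "a", "b", "c" });
int zero = CountListMembers(new List<int>());
Console.WriteLine($"{n} {zero}"); // 3 0
```

Passing the struct by `ref List<T>.Enumerator` would also avoid the copies, but then the variable cannot be a `using` variable, since those are read-only.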
Regardless of whether it's a List<Entity>, List<Entity2D>, or List<IEntity>, GetEnumerator will be called once per foreach. Further, it is irrelevant whether e.g. List<Entity> contains instances of Entity2D. An IEnumerable<T>'s implementation of GetEnumerator may create reference objects which will be collected. As Reed noted, List<T> in MS .NET avoids this by using a value type for its enumerator.
The class List<T> implements IEnumerable<T>.GetEnumerator explicitly, so calling GetEnumerator on a variable of type List<T> returns a List<T>.Enumerator, which has value-type semantics, whereas calling it on a variable of type IEnumerable<T> that holds a reference to a List<T> returns a value of type IEnumerator<T> (a boxed List<T>.Enumerator), which has reference semantics.
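The difference between the two semantics shows up as soon as the enumerator is copied. A short sketch, assuming C# 9+ top-level statements:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 10, 20, 30 };

// Value semantics: copying a List<int>.Enumerator copies its position.
List<int>.Enumerator a = list.GetEnumerator();
a.MoveNext();                 // a is on 10
List<int>.Enumerator b = a;   // b is an independent copy, also on 10
b.MoveNext();                 // only b moves on, to 20
Console.WriteLine($"{a.Current} {b.Current}"); // 10 20

// Reference semantics: through IEnumerable<int> the enumerator is boxed,
// so copying the variable copies a reference to the same box.
IEnumerator<int> x = ((IEnumerable<int>)list).GetEnumerator();
x.MoveNext();                 // x is on 10
IEnumerator<int> y = x;       // y refers to the same object
y.MoveNext();                 // advances the shared enumerator to 20
Console.WriteLine($"{x.Current} {y.Current}"); // 20 20
```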
Every time I've had to return a collection, I've returned a List. I've just read that I should return IEnumerable or similar interface (IQueryable for instance).
The problem I see is that often I want to work with a List. To do that, I'd have to do a .ToList() on the returned result.
Example
//...
List<Guid> listOfGuids = MyMethod().ToList();
//...
public IEnumerable<Guid> MyMethod()
{
    using (var context = AccesDataRépart.GetNewContextRépart())
    {
        return context.MyTable.ToList();
    }
}
Is executing .ToList() twice the right practice?
If the caller actually needs a list, return a list (if that's what you have). Returning an IEnumerable when you already have a list, and when you know the caller is going to need a list, is just being wasteful, and for no real benefit.
If you feel that there is a chance that you'll be changing the underlying type of the object you are returning in future versions of the method it can, potentially, make it a bit easier on the library implementer to return an interface instead, but it's easier on the caller of the method when a more derived type is returned (they have the ability to do more with it than if they are just given an interface).
It is the reverse with input parameters. The more derived the parameter type, the more "power" the library implementer has to work with it, especially in future revisions, but a much less restrictive type makes life easier for the caller of your library, as they don't need to convert what they have into what your method accepts.
This makes these decisions something to think about a fair bit when writing a library's public API. You need to consider how much "power" you need right now, as well as how much you think you might need in the future. Once you know how restrictive/general the types need to be for you to do your job, you can then work to make your methods more convenient to use for callers. There is no one answer that will apply in every case. Saying that you should always return IEnumerable instead of List isn't proper, just the same as saying that you should always return List is also improper. You need to make a judgement call based on the specific situation you are in.
I would recommend just returning a List<T>, or perhaps an IList<T>. The reason that someone might recommend against returning List, is that it locks you in to that implementation. Depending on the usage of the API, that might not be a concern.
My general rule of thumb is to be more permissive in what you accept and more specific in what you return. So, IEnumerable<T> for method parameters, and IList<T>, List<T> or possibly even T[] for method return values.
You don't have to call ToList on the returned value; it is already a List. The reason you can't return a lazy IEnumerable (the query itself) is that the using statement around your DataContext disposes it before the caller enumerates the results. So change your method's return type to List<Guid>, and then don't call ToList on the returned value.
//...
List<Guid> listOfGuids = MyMethod(); //No ToList here
//...
public List<Guid> MyMethod()
{
    using (var context = AccesDataRépart.GetNewContextRépart())
    {
        return context.MyTable.ToList();
    }
}
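Why the lazy version fails can be reproduced without a database. A sketch assuming C# 9+ top-level statements, in which a `StringReader` stands in for the question's data context (an assumption made only so it runs anywhere); `ReadLines`, `GetLinesLazily` and `GetLinesEagerly` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// A lazy query over a disposable resource (the StringReader stands in for a DataContext).
IEnumerable<string> ReadLines(TextReader reader)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
        yield return line;
}

IEnumerable<string> GetLinesLazily(string text)
{
    using var reader = new StringReader(text);
    return ReadLines(reader);          // nothing has been read yet
}                                      // reader is disposed here

List<string> GetLinesEagerly(string text)
{
    using var reader = new StringReader(text);
    return ReadLines(reader).ToList(); // read everything while reader is open
}

List<string> eager = GetLinesEagerly("first\nsecond");
Console.WriteLine(eager.Count); // 2

bool threw = false;
try
{
    foreach (string item in GetLinesLazily("first\nsecond")) { }
}
catch (ObjectDisposedException)
{
    threw = true; // the reader was closed before enumeration started
}
Console.WriteLine(threw); // True
```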
I've just read that I should return IEnumerable or similar interface (IQueryable for instance).
Don't worry about that - return IList<> or List<> if you actually need a list object at the point the collection is consumed. The problem with returning IEnumerable can be that no-one knows what the cost of enumerating it is going to be - which is a down-side to the whole Linq concept that doesn't always get fair mention from the people who are encouraging everyone to return IEnumerable everywhere.
It really depends. Do you want to enumerate the collection before or after returning it?
Enumerate before: Every time you call ToList, ToArray, etc. you are enumerating the IEnumerable. If you are doing this many times after it is returned, this can be redundant and wasteful. Either returning it in an already enumerated form (e.g., IList, Array) or enumerating it once after it is returned and using that result for further processing is probably preferable.
Enumerate after: Returning an IEnumerable allows you to defer the enumeration of the collection until later (e.g., save processing up front). If it turns out that you never end up enumerating the collection, or you only enumerate a subset of it, then the IEnumerable approach can be very advantageous.
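A minimal sketch of the trade-off, assuming C# 9+ top-level statements; `Squares` and the `produced` counter are hypothetical, added only to make the amount of work visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// A lazily evaluated sequence that counts how many items it has produced.
IEnumerable<int> Squares(int upTo)
{
    for (int i = 1; i <= upTo; i++)
    {
        produced++;
        yield return i * i;
    }
}

// Enumerate after: only the three items actually requested are computed.
List<int> firstThree = Squares(1_000).Take(3).ToList();
Console.WriteLine($"{string.Join(",", firstThree)} after {produced} items"); // 1,4,9 after 3 items

// Enumerate before: materializing up front computes everything once.
produced = 0;
List<int> all = Squares(1_000).ToList();
Console.WriteLine(produced); // 1000
```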
We all know mutable structs are evil in general. I'm also pretty sure that because IEnumerable<T>.GetEnumerator() returns type IEnumerator<T>, the structs are immediately boxed into a reference type, costing more than if they were simply reference types to begin with.
So why, in the BCL generic collections, are all the enumerators mutable structs? Surely there had to have been a good reason. The only thing that occurs to me is that structs can be copied easily, thus preserving the enumerator state at an arbitrary point. But adding a Copy() method to the IEnumerator interface would have been less troublesome, so I don't see this as being a logical justification on its own.
Even if I don't agree with a design decision, I would like to be able to understand the reasoning behind it.
Indeed, it is for performance reasons. The BCL team did a lot of research on this point before deciding to go with what you rightly call out as a suspicious and dangerous practice: the use of a mutable value type.
You ask why this doesn't cause boxing. It's because the C# compiler does not generate code to box stuff to IEnumerable or IEnumerator in a foreach loop if it can avoid it!
When we see
foreach(X x in c)
the first thing we do is check to see if c has a method called GetEnumerator. If it does, then we check to see whether the type it returns has a method MoveNext and a property Current. If it does, then the foreach loop is generated entirely using direct calls to those methods and properties. Only if "the pattern" cannot be matched do we fall back to looking for the interfaces.
This has two desirable effects.
First, if the collection is, say, a collection of ints, but was written before generic types were invented, then it does not pay the penalty of boxing the value of Current to object and then unboxing it to int. If Current is a property that returns an int, we just use it.
Second, if the enumerator is a value type then it does not box the enumerator to IEnumerator.
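Both effects can be sketched in a few lines, assuming C# 9+ top-level statements. The hand-written loop is only an approximation of the compiler's expansion for a List<int>; the Span<int> part shows the pattern working on a type that implements no enumerable interface at all.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// Roughly what "foreach (int x in list) sum += x;" becomes: the compiler
// binds to List<int>.GetEnumerator() by pattern, so the enumerator stays an
// unboxed struct and MoveNext/Current are direct calls, not interface calls.
int sum = 0;
List<int>.Enumerator e = list.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        int x = e.Current;
        sum += x;
    }
}
finally
{
    e.Dispose();
}

// The pattern does not require IEnumerable at all: Span<int> implements no
// enumerable interface, yet foreach works because it has a suitable
// GetEnumerator() whose result has MoveNext() and Current.
Span<int> span = new[] { 4, 5, 6 };
int spanSum = 0;
foreach (int v in span)
    spanSum += v;

Console.WriteLine($"{sum} {spanSum}"); // 6 15
```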
Like I said, the BCL team did a lot of research on this and discovered that the vast majority of the time, the penalty of allocating and deallocating the enumerator was large enough that it was worth making it a value type, even though doing so can cause some crazy bugs.
For example, consider this:
struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h = somethingElse;
}
You would quite rightly expect the attempt to mutate h to fail, and indeed it does. The compiler detects that you are trying to change the value of something that has a pending disposal, and that doing so might cause the object that needs to be disposed to actually not be disposed.
Now suppose you had:
struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h.Mutate();
}
What happens here? You might reasonably expect that the compiler would do what it does if h were a readonly field: make a copy, and mutate the copy in order to ensure that the method does not throw away stuff in the value that needs to be disposed.
However, that conflicts with our intuition about what ought to happen here:
using (Enumerator enumtor = whatever)
{
    ...
    enumtor.MoveNext();
    ...
}
We expect that doing a MoveNext inside a using block will move the enumerator to the next one regardless of whether it is a struct or a ref type.
Unfortunately, the C# compiler today has a bug. If you are in this situation we choose which strategy to follow inconsistently. The behaviour today is:
if the value-typed variable being mutated via a method is a normal local then it is mutated normally
but if it is a hoisted local (because it's a closed-over variable of an anonymous function or in an iterator block) then the local is actually generated as a read-only field, and the gear that ensures that mutations happen on a copy takes over.
Unfortunately the spec provides little guidance on this matter. Clearly something is broken because we're doing it inconsistently, but what the right thing to do is not at all clear.
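The hoisted-local case is hard to reproduce portably, but the "make a copy, and mutate the copy" strategy itself is easy to watch. As a related sketch (assuming C# 7.2+ `in` parameters and C# 9+ top-level statements; `AdvanceReadOnly` is a hypothetical helper), a read-only `in` parameter gets the same treatment as a readonly field:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 10, 20, 30 };
List<int>.Enumerator e = list.GetEnumerator();

// 'in' makes the parameter a read-only reference. MoveNext is not a readonly
// member, so the compiler calls it on a hidden defensive copy: the copy's
// position advances, the caller's enumerator does not.
bool AdvanceReadOnly(in List<int>.Enumerator en) => en.MoveNext();

bool first = AdvanceReadOnly(in e);
bool second = AdvanceReadOnly(in e);

// Mutating the ordinary local works as expected.
e.MoveNext();
Console.WriteLine($"{first} {second} {e.Current}"); // True True 10
```

Both calls report success, yet the local only reaches its first element on its own MoveNext, which is exactly the kind of silent surprise the answer warns about.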
Methods of a struct can be called directly (and are candidates for inlining) when the struct's type is known at compile time, whereas calling a method through an interface requires virtual dispatch, which is slower; so the answer is: for performance reasons.
I work on applications developed in C#/.NET with Visual Studio. Very often ReSharper, in my method signatures, advises me to replace the type of my input parameters with more generic ones. For instance, List<> with IEnumerable<> if I only use the list with a foreach in the body of my method. I can understand why it looks smarter to write that, but I'm quite concerned with performance. I fear that the performance of my apps will decrease if I listen to ReSharper...
Can someone explain to me precisely (more or less) what's happening behind the scenes (i.e. in the CLR) when I write:
public static void myMethod(IEnumerable<string> list)
{
    foreach (string s in list)
    {
        Console.WriteLine(s);
    }
}

static void Main()
{
    List<string> list = new List<string>(new string[] {"a", "b", "c"});
    myMethod(list);
}
and what is the difference with:
public static void myMethod(List<string> list)
{
    foreach (string s in list)
    {
        Console.WriteLine(s);
    }
}

static void Main()
{
    List<string> list = new List<string>(new string[] {"a", "b", "c"});
    myMethod(list);
}
You're worried about performance - but do you have any grounds for that concern? My guess is that you haven't benchmarked the code at all. Always benchmark before replacing readable, clean code with more performant code.
In this case the call to Console.WriteLine will utterly dominate the performance anyway.
While I suspect there may be a theoretical difference in performance between using List<T> and IEnumerable<T> here, I suspect the number of cases where it's significant in real world apps is vanishingly small.
It's not even as if the sequence type is being used for many operations - there's a single call to GetEnumerator(). With a List<string> parameter the compiler binds to the struct List<string>.Enumerator, so the per-element MoveNext()/Current calls are direct calls; with IEnumerable<string> they are interface calls on every iteration. Either way, that per-element overhead is tiny next to the work done inside the loop.
Ignoring the analysis though, the thing to take out of this is to measure performance before you base coding decisions on it.
As for what happens behind the scenes - you'd have to dig into the deep details of exactly what's in the metadata in each case. I suspect that in the case of an interface there's one extra level of redirection, at least in theory - the CLR would have to work out where in the target object's type the vtable for IEnumerable<T> was, and then call into the appropriate method's code. In the case of List<T>, the JIT would know the right offset into the vtable to start with, without the extra lookup. This is just based on my somewhat hazy understanding of JITting, thunking, vtables and how they apply to interfaces. It may well be slightly wrong, but more importantly it's an implementation detail.
You'd have to look at the generated code to be certain, but in this case, I doubt there's much difference. With an IEnumerable<T> parameter, foreach goes through IEnumerable<T>.GetEnumerator() and IEnumerator<T>; with a List<T> parameter, the compiler binds directly to List<T>.GetEnumerator() and its struct enumerator. Either way the same elements are visited; only the way MoveNext and Current are dispatched differs.
In general, I'd say that if you replace a non-generic interface with its generic equivalent (say IList --> IList<T>), you are bound to get better or equivalent performance.
One unique selling point is that, unlike Java, .NET does not use type erasure and supports true value types (structs). One of the main differences is therefore how a List<int> stores its elements internally: as unboxed ints in an int[], whereas an ArrayList boxes each int into an object. This could quickly become a big difference depending on how intensively the list is being used.
A braindead synthetic benchmark showed:
for (int j=0; j<1000; j++)
{
    List<int> list = new List<int>();
    for (int i = 1<<12; i>0; i--)
        list.Add(i);
    list.Sort();
}
to be faster by a factor of 3.2x than the semi-equivalent non-generic:
for (int j=0; j<1000; j++)
{
    ArrayList list = new ArrayList();
    for (int i = 1<<12; i>0; i--)
        list.Add(i);
    list.Sort();
}
Disclaimer: I realize this benchmark is synthetic and doesn't actually focus on the use of interfaces (rather, it calls methods directly on concrete types), etc. However, it illustrates the point I'm making. Don't fear generics (at least not for performance reasons).
In general, the increased flexibility will be worth what minor performance difference it would incur.
The first version (IEnumerable) is more generic: you are saying the method accepts any argument that implements this interface.
In the second version you restrict the method to accept a specific class type, and this is not recommended at all. And the performance is mostly the same.
The basic reason for recommending a method that works on IEnumerable rather than List is future flexibility. If in the future you need to create a MySpecialStringsCollection, you could have it implement IEnumerable<string> and still use the same method.
Essentially, I think it comes down to this: unless you're noticing a significant, meaningful performance hit (and I'd be shocked if you noticed any), prefer a more tolerant interface that will accept more than what you're expecting today.
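To make the flexibility argument concrete, here is a sketch assuming C# 9+ top-level statements, where `TotalLength` and `Generated` are hypothetical: a single method typed on IEnumerable<string> serves a list, an array, a set and an iterator without any conversion by the caller.

```csharp
using System;
using System.Collections.Generic;

// Accepting IEnumerable<string> lets one method serve every source below.
int TotalLength(IEnumerable<string> items)
{
    int total = 0;
    foreach (string s in items)
        total += s.Length;
    return total;
}

IEnumerable<string> Generated()
{
    yield return "x";
    yield return "yz";
}

int fromList = TotalLength(new List<string> { "a", "bc" });
int fromArray = TotalLength(new[] { "abc" });
int fromSet = TotalLength(new HashSet<string> { "ab", "c" });
int fromIterator = TotalLength(Generated());
Console.WriteLine($"{fromList} {fromArray} {fromSet} {fromIterator}"); // 3 3 3 3
```

Had the parameter been List<string>, the last three callers would each need a ToList() copy first.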
The definition for List<T> is:
[SerializableAttribute]
public class List<T> : IList<T>, ICollection<T>,
    IEnumerable<T>, IList, ICollection, IEnumerable
So List<T> implements IList, ICollection, IList<T>, and ICollection<T>, in addition to IEnumerable and IEnumerable<T>.
The IEnumerable interface exposes the GetEnumerator method, which returns an IEnumerator; IEnumerator in turn exposes a MoveNext method and a Current property. These are the members foreach relies on to iterate through a list.
It follows that, if IList, ICollection, IList<T>, and ICollection<T> are not required to do the job, then it's sensible to use IEnumerable or IEnumerable<T> instead, so the method does not depend on the additional plumbing.
An interface simply defines the presence and signature of public methods and properties implemented by the class. Since the interface does not "stand on its own", there should be no performance difference for the method itself, and any "casting" penalty - if any - should be almost too small to measure.
There is no performance penalty for a static-upcast. It's a logical construct in program text.
As other people have said, premature optimization is the root of all evil. Write your code, run it through a hotspot analysis before you worry about performance tuning things.
Taking an IEnumerable<> as a parameter might create some trouble, as you could receive a LINQ expression with deferred execution, or a yield return iterator. In both cases you won't have a collection, only something you can iterate over.
So when you would like to set some boundaries, you could request an array. It's not a problem to call collection.ToArray() before passing the parameter, and then you'll be sure there are no hidden deferred-execution caveats.
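The hidden cost of deferred execution is that the query runs again on every enumeration. A sketch assuming C# 9+ top-level statements, with a hypothetical `evaluations` counter inside the projection:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;
var source = new List<int> { 1, 2, 3 };

// A deferred LINQ query: the Select lambda runs on every enumeration.
IEnumerable<int> query = source.Select(n => { evaluations++; return n * 10; });

int sumA = query.Sum();
int max = query.Max();
int firstPass = evaluations;
Console.WriteLine(firstPass); // 6: the query ran twice

// Materializing once with ToArray fixes both the contents and the cost.
evaluations = 0;
int[] snapshot = query.ToArray();
int sumB = snapshot.Sum();
int maxB = snapshot.Max();
Console.WriteLine(evaluations); // 3: evaluated exactly once
```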
When I'm writing my DAL or other code that returns a set of items, should I always make my return statement:
public IEnumerable<FooBar> GetRecentItems()
or
public IList<FooBar> GetRecentItems()
Currently, in my code I have been trying to use IEnumerable as much as possible but I'm not sure if this is best practice? It seemed right because I was returning the most generic datatype while still being descriptive of what it does, but perhaps this isn't correct to do.
The Framework Design Guidelines recommend using the class Collection<T> when you need to return a collection that is modifiable by the caller, or ReadOnlyCollection<T> for read-only collections.
The reason this is preferred to a simple IList<T> is that the IList<T> type does not tell the caller whether it is read-only or not; they can only find out at run time through the IsReadOnly property.
If you return an IEnumerable<T> instead, certain operations may be a little trickier for the caller to perform. Also, you no longer give the caller the flexibility to modify the collection, something that you may or may not want.
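The read-only problem in a few lines, assuming C# 9+ top-level statements: both values below have the static type IList<int>, and only a run-time check (or a failed Add) tells them apart.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var items = new List<int> { 1, 2, 3 };

// Both values have static type IList<int>; only IsReadOnly tells them apart.
IList<int> writable = items;
IList<int> readOnly = items.AsReadOnly(); // ReadOnlyCollection<int> wrapper

Console.WriteLine($"{writable.IsReadOnly} {readOnly.IsReadOnly}"); // False True

bool addFailed = false;
try
{
    readOnly.Add(4); // compiles fine, fails only at run time
}
catch (NotSupportedException)
{
    addFailed = true;
}

// Exposing ReadOnlyCollection<int> instead states the intent in the type.
ReadOnlyCollection<int> explicitType = items.AsReadOnly();
Console.WriteLine($"{addFailed} {explicitType.Count}"); // True 3
```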
Keep in mind that LINQ contains a few tricks up its sleeve and will optimize certain calls based on the type they are performed on. So, for example, if you perform a Count and the underlying collection is a List it will NOT walk through all the elements.
Personally, for an ORM I would probably stick with Collection<T> as my return value.
It really depends on why you are using that specific interface.
For example, IList<T> has several methods that aren't present in IEnumerable<T>:
IndexOf(T item)
Insert(int index, T item)
RemoveAt(int index)
and Properties:
T this[int index] { get; set; }
If you need these methods in any way, then by all means return IList<T>.
Also, if the code that consumes your result is expecting an IList<T>, returning IList<T> saves the caller from converting the result (e.g. with ToList(), which copies every element).
In general, you should require the most generic and return the most specific thing that you can. So if you have a method that takes a parameter, and you only really need what's available in IEnumerable, then that should be your parameter type. If your method could return either an IList or an IEnumerable, prefer returning IList. This ensures that it is usable by the widest range of consumers.
Be loose in what you require, and explicit in what you provide.
That depends...
Returning the least derived type (IEnumerable) will leave you the most leeway to change the underlying implementation down the track.
Returning a more derived type (IList) provides the users of your API with more operations on the result.
I would always suggest returning the least derived type that has all the operations your users are going to need... so basically, you first have to determine what operations on the result make sense in the context of the API you are defining.
One thing to consider is that if you're using a deferred-execution LINQ statement to generate your IEnumerable<T>, calling .ToList() before you return from your method means that your items may be iterated twice - once to create the List, and once when the caller loops through, filters, or transforms your return value. When practical, I like to avoid converting the results of LINQ-to-Objects to a concrete List or Dictionary until I have to. If my caller needs a List, that's a single easy method call away - I don't need to make that decision for them, and that makes my code slightly more efficient in the cases where the caller is just doing a foreach.
List<T> offers the calling code many more features, such as modifying the returned object and access by index. So the question boils down to: in your application's specific use case, do you WANT to support such uses (presumably by returning a freshly constructed collection!), for the caller's convenience -- or do you want speed for the simple case when all the caller needs is to loop through the collection and you can safely return a reference to a real underlying collection without fearing this will get it erroneously changed, etc?
Only you can answer this question, and only by understanding well what your callers will want to do with the return value, and how important performance is here (how big are the collections you would be copying, how likely is this to be a bottleneck, etc).
I think you can use either, but each has a use. Basically List is IEnumerable, but you have count functionality, add element, remove element.
IEnumerable is not efficient for counting elements
If the collection is intended to be readonly, or the modification of the collection is controlled by the Parent then returning an IList just for Count is not a good idea.
In LINQ, the Count() extension method on IEnumerable<T> shortcuts to the collection's own Count property if the underlying type implements ICollection<T> (as IList<T> and List<T> do), so the performance difference is negligible.
Generally I feel (opinion) it is better practice to return IEnumerable where possible; if you need to do additions, then add these methods to the parent class. Otherwise the consumer is managing the collection within Model, which violates design principles; e.g. manufacturer.Models.Add(model) violates the Law of Demeter. Of course these are just guidelines and not hard and fast rules, but until you have a full grasp of their applicability, following them blindly is better than not following them at all.
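Assuming .NET 6 or later (and C# 9+ top-level statements), `Enumerable.TryGetNonEnumeratedCount` makes this shortcut visible: it succeeds for sources whose count is known without walking them, such as any ICollection<T>, and fails for a plain iterator. `Lazy` is a hypothetical iterator used for contrast.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };

IEnumerable<int> Lazy()
{
    yield return 1;
    yield return 2;
}

// .NET 6+: reports whether Count() can be answered without enumerating.
bool listIsCheap = list.TryGetNonEnumeratedCount(out int listCount);
bool lazyIsCheap = Lazy().TryGetNonEnumeratedCount(out int lazyCount);
Console.WriteLine($"{listIsCheap} {listCount}"); // True 3
Console.WriteLine($"{lazyIsCheap} {lazyCount}"); // False 0

// Count() works on both, but on the iterator it has to walk every item.
Console.WriteLine($"{list.Count()} {Lazy().Count()}"); // 3 2
```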
public interface IManufacturer
{
    IEnumerable<Model> Models {get;}
    void AddModel(Model model);
}
(Note: If using NHibernate you might need to map to a private IList using different accessors.)
It's not so simple when you are talking about return values instead of input parameters. When it's an input parameter, you know exactly what you need to do. So, if you need to be able to iterate over the collection, you take an IEnumerable, whereas if you need to add or remove, you take an IList.
In the case of a return value, it's tougher. What does your caller expect? If you return an IEnumerable, then he will not know a priori that he can make an IList out of it. But, if you return an IList, he will know that he can iterate over it. So, you have to take into account what your caller is going to do with the data. The functionality that your caller needs/expects is what should govern when making the decision on what to return.
TL;DR – summary
If you develop in-house software, use the specific type (like List) for return values and the most generic type for input parameters, even in the case of collections.
If a method is part of a redistributable library's public API, use interfaces instead of concrete collection types for both return values and input parameters.
If a method returns a read-only collection, show that by using IReadOnlyList or IReadOnlyCollection as the return value type.
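One caveat worth knowing when following the last rule, sketched under the assumption of C# 9+ top-level statements (`GetLeaky` and `GetGuarded` are hypothetical): returning the list itself typed as IReadOnlyList<T> is read-only only by contract, while a ReadOnlyCollection<T> wrapper cannot be cast back.

```csharp
using System;
using System.Collections.Generic;

var storage = new List<string> { "a", "b" };

// Returning the list itself as IReadOnlyList<T>: read-only by contract,
// but a caller can still cast it back and mutate the original.
IReadOnlyList<string> GetLeaky() => storage;

// Returning a ReadOnlyCollection<T> wrapper: the cast back to List<T> fails.
IReadOnlyList<string> GetGuarded() => storage.AsReadOnly();

if (GetLeaky() is List<string> leaked)
    leaked.Add("c"); // succeeds: storage now has three items

bool guardedIsList = GetGuarded() is List<string>;
Console.WriteLine($"{storage.Count} {guardedIsList}"); // 3 False
Console.WriteLine(GetGuarded()[2]); // c: the wrapper is a live view, not a copy
```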
As everyone has said, it depends.
If you don't want Add/Remove functionality at the calling layer, then I vote for IEnumerable, as it provides only iteration and basic functionality, which I like from a design perspective.
As for returning IList, my vote is always against it, but it's mainly a matter of what you like and what not.
In performance terms, I think they are more or less the same.
If your external code does not need a count, it is always better to return IEnumerable, because later you can change your implementation (without impacting external code), for example to yield iterator logic that conserves memory (a very good language feature, by the way).
However if you need items count, don't forget that there is another layer between IEnumerable and IList - ICollection.
I might be a bit off here, seeing that no one else suggested it so far, but why don't you return an (I)Collection<T>?
From what I remember, Collection<T> was the preferred return type over List<T> because it abstracts away the implementation. They all implement IEnumerable, but that sounds to me a bit too low-level for the job.
I think you can use either, but each has a use. Basically List is IEnumerable, but you have count functionality, add element, remove element.
IEnumerable is not efficient for counting elements, or getting a specific element in the collection.
List is a collection which is ideally suited to finding specific elements, easy to add elements, or remove them.
Generally I try to use List where possible as this gives me more flexibility.
Use
List<FooBar> GetRecentItems()
rather than
IList<FooBar> GetRecentItems()
I think the general rule is to use the more specific class to return, to avoid doing unneeded work and give your caller more options.
That said, I think it's more important to consider the code in front of you which you are writing than the code the next guy will write (within reason.) This is because you can make assumptions about the code that already exists.
Remember that moving a return type UP from IEnumerable to a collection type in an interface will work, whereas moving it down from a collection to IEnumerable will break existing code.
If these opinions all seem conflicted, it's because the decision is subjective.
IEnumerable<T> contains a small subset of what is inside List<T>, which contains the same stuff as IEnumerable<T> but more! You only use IEnumerable<T> if you want a smaller set of features. Use List<T> if you plan to use a larger, richer set of features.
The Pizza Explanation
Here is a much more comprehensive explanation of why you would use an Interface like IEnumerable<T> versus List<T>, or vice versa, when instantiating objects in languages like Microsoft C#.
Think of Interfaces like IEnumerable<T> and IList<T> as the individual ingredients in a pizza (pepperoni, mushrooms, black olives...) and concrete classes like List<T> as the pizza. List<T> is in fact a Supreme Pizza that always contains all the Interface ingredients combined (ICollection, IEnumerable, IList, etc).
What you get as far as a pizza and its toppings is determined by how you "type" your list when you create its object reference in memory. You have to declare the type of pizza you are cooking as follows:
// Pepperoni Pizza: This gives you a single Interface's members,
// or a pizza with one topping because List<T> is limited to
// acting like an IEnumerable<T> type.
IEnumerable<string> pepperoniPizza = new List<string>();
// Nearly Supreme Pizza: IList<T> gives you the members of IList<T>
// plus those of the Interfaces it inherits (ICollection<T>,
// IEnumerable<T>, IEnumerable). Only List<T> itself has them ALL!!
IList<string> supremePizza = new List<string>();
Note you cannot instantiate an Interface as itself (or eat raw pepperoni). When you instantiate List<T> as one Interface type like IEnumerable<T>, you only have access to that Interface's members and get the pepperoni pizza with one topping. You can only access IEnumerable<T> members and cannot see all the other Interface members in List<T>.
When List<T> is instantiated as IList<T>, you get the members of IList<T> and of the Interfaces it inherits (ICollection<T>, IEnumerable<T>, IEnumerable), but still not everything List<T> offers; only a variable of type List<T> itself reaches all of it (the Supreme Pizza toppings)!
Here is the List<T> class, showing you WHY that is. Notice the List<T> in the .NET Library has implemented all the other Interfaces, including IList!! But IEnumerable<T> exposes just a small subset of those List<T> members.
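A quick check of what the declared type does and does not change, assuming C# 9+ top-level statements: all the variables below point at the same List<string> object; the declared type only limits which members the compiler lets you call.

```csharp
using System;
using System.Collections.Generic;

var supreme = new List<string> { "cheese" };

// Same object, different "views": the variable's static type only limits
// which members the compiler lets you call, not what is in memory.
IEnumerable<string> pepperoni = supreme;
IList<string> viaIList = supreme;

bool sameObject = ReferenceEquals(pepperoni, supreme) && ReferenceEquals(viaIList, supreme);
Console.WriteLine(sameObject);               // True
Console.WriteLine(pepperoni.GetType().Name); // List`1

// The full object is still there and can be recovered with a cast.
var recovered = (List<string>)pepperoni;
recovered.Add("pepperoni");
Console.WriteLine(viaIList.Count);           // 2
```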
public class List<T> :
    ICollection<T>,
    IEnumerable<T>,
    IEnumerable,
    IList<T>,
    IReadOnlyCollection<T>,
    IReadOnlyList<T>,
    ICollection,
    IList
{
    // List<T> types implement all these goodies and more!
    public List();
    public List(IEnumerable<T> collection);
    public List(int capacity);
    public T this[int index] { get; set; }
    public int Count { get; }
    public int Capacity { get; set; }
    public void Add(T item);
    public void AddRange(IEnumerable<T> collection);
    public ReadOnlyCollection<T> AsReadOnly();
    public bool Exists(Predicate<T> match);
    public T Find(Predicate<T> match);
    public void ForEach(Action<T> action);
    public void RemoveAt(int index);
    public void Sort(Comparison<T> comparison);
    // ......and much more....
}
So why NOT instantiate List<T> as List<T> ALL THE TIME?
Instantiating a List<T> as List<T> gives you access to all Interface members! But you might not need everything. Note that the variable's type does not shrink the object: the same full List<T> sits in memory either way. Choosing one Interface type simply limits what your code can do with it, which keeps your application's dependencies tight. Who needs Supreme Pizza every time?
But there is a second reason for using Interface types: Flexibility. Because other types in .NET, including your own custom ones, might implement the same "popular" Interface, you can later substitute your List<T> with any other type that implements, say, IEnumerable<T>. If your variable is an Interface type, you can switch out the object created for something other than List<T>. Dependency Injection is a good example of this type of flexibility using Interfaces rather than concrete types, and of why you might want to declare your variables and parameters as Interfaces.
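To illustrate the flexibility point, here is a sketch (FlexibilityDemo and Sum are illustrative names) of a method that depends only on IEnumerable<int>, so a List<T>, an array, a HashSet<T> or a Queue<T> can all be passed in unchanged.

```csharp
using System.Collections.Generic;

public static class FlexibilityDemo
{
    // Depends only on IEnumerable<int>, so any implementation can be passed in.
    public static int Sum(IEnumerable<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers)
            total += n;
        return total;
    }

    // Calls Sum with four different collection types holding 1, 2, 3.
    public static int[] SumEach() => new[]
    {
        Sum(new List<int> { 1, 2, 3 }),
        Sum(new[] { 1, 2, 3 }),
        Sum(new HashSet<int> { 1, 2, 3 }),
        Sum(new Queue<int>(new[] { 1, 2, 3 })),
    };
}
```

Every call returns 6; Sum never needs to know which concrete type it received.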
An interview question for a .NET 3.5 job is "What is the difference between an iterator and an enumerator"?
This is a core distinction to make, what with LINQ, etc.
Anyway, what is the difference? I can't seem to find a solid definition on the net. Make no mistake, I can find the meaning of the two terms but I get slightly different answers. What would be the best answer for an interview?
IMO an iterator "iterates" over a collection, and an enumerator provides the functionality to iterate, but this has to be called.
Also, using the yield keyword is said to save state. What exactly is this state? Is there an example of this benefit occurring?
Iterating means repeating some steps, while enumerating means going through all values in a collection of values. So enumerating usually requires some form of iteration.
In that way, enumerating is a special case of iterating where the step is getting a value from a collection.
Note the "usually" – enumerating may also be performed recursively, but recursion and iteration are so closely related that I would not care about this small difference.
You may also enumerate values you do not explicitly store in a collection. For example, you can enumerate the natural numbers, primes, or whatever, but you would calculate these values during the enumeration and not retrieve them from a physical collection. You can think of this case as enumerating a virtual collection with its values defined by some logic.
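As a sketch of enumerating a virtual collection, here is a C# iterator over the primes; VirtualCollections and Primes are hypothetical names. It also hints at the "state" asked about in the question: the local variables candidate and found survive between calls to MoveNext, because the compiler moves them into the generated enumerator object.

```csharp
using System.Collections.Generic;

public static class VirtualCollections
{
    // An endless "virtual collection": nothing is stored up front, each prime
    // is computed on demand when the consumer asks for the next item.
    public static IEnumerable<int> Primes()
    {
        var found = new List<int>();                 // state kept between MoveNext calls
        for (int candidate = 2; ; candidate++)
        {
            bool isPrime = true;
            foreach (int p in found)
            {
                if (p * p > candidate) break;        // no divisor up to sqrt(candidate)
                if (candidate % p == 0) { isPrime = false; break; }
            }
            if (isPrime)
            {
                found.Add(candidate);
                yield return candidate;              // pause here until the next MoveNext
            }
        }
    }
}
```

With using System.Linq; in scope, Primes().Take(5) produces 2, 3, 5, 7, 11. The sequence is endless, so always bound it (Take, TakeWhile, or a break in foreach).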
I assume Reed Copsey got the point. In C# there are two major ways to enumerate something.
Implement IEnumerable and a class implementing IEnumerator
Implement an iterator with the yield statement
The first way is harder to implement and uses objects for enumerating. The second way is easier to implement and uses continuations: the compiler turns the iterator into a state machine that resumes where the last yield left off.
In C# 2+, iterators are a way for the compiler to automatically generate implementations of IEnumerable/IEnumerable<T> and IEnumerator/IEnumerator<T> for you.
Without iterators, you would need to create a class implementing IEnumerator, including Current, MoveNext, and Reset. This requires a fair amount of work. Normally, you would create a private class that implemented IEnumerator<T> for your type, then yourClass.GetEnumerator() would construct that private class and return it.
Iterators are a way for the compiler to automatically generate this for you, using a simple syntax (yield). This lets you implement GetEnumerator() directly in your class, without a second class (The IEnumerator) being specified by you. The construction of that class, with all of its members, is done for you.
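To make the comparison concrete, here is the same countdown written both ways. CountdownManual and CountdownIterator are made-up names for illustration; the first hand-writes the private enumerator class, the second lets yield generate it.

```csharp
using System.Collections;
using System.Collections.Generic;

// Way 1: a hand-written enumerator class with Current, MoveNext, Reset and Dispose.
public class CountdownManual : IEnumerable<int>
{
    private readonly int _start;
    public CountdownManual(int start) => _start = start;

    public IEnumerator<int> GetEnumerator() => new Enumerator(_start);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _start;
        private int _current;

        public Enumerator(int start)
        {
            _start = start;
            _current = start + 1;               // positioned "before" the first item
        }

        public int Current => _current;
        object IEnumerator.Current => _current;

        public bool MoveNext()
        {
            if (_current <= 0) return false;    // nothing left to count down
            _current--;
            return true;
        }

        public void Reset() => _current = _start + 1;
        public void Dispose() { }
    }
}

// Way 2: an iterator. The compiler generates the enumerator class for us.
public class CountdownIterator : IEnumerable<int>
{
    private readonly int _start;
    public CountdownIterator(int start) => _start = start;

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i >= 0; i--)
            yield return i;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

In a foreach, both produce 3, 2, 1, 0 for a start of 3. One behavioral difference: the hand-written Reset works, while the compiler-generated enumerator throws from Reset.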
Iterators are very developer friendly - things are done in a very efficient way, with much less effort.
When you use foreach, the two will behave identically (provided you write your custom IEnumerator correctly). Iterators just make life much simpler.
What C# calls an iterator is more commonly (outside of the C# world) called a generator or generator function (e.g. in Python). A generator function is a specialized case of coroutine. A C# iterator (generator) is a special form of an enumerator (a data type implementing the IEnumerator interface).
I dislike this usage of the term iterator for a C# generator because it is just as much an enumerator as it is an iterator. Too late for Microsoft to change its mind though.
For contrast, consider that in C++ an iterator is a value which is used primarily to access sequential elements in a collection. It can be advanced, dereferenced to retrieve a value, and tested to see whether the end of the collection has been reached.
"Whereas a foreach statement is the consumer of the enumerator, an iterator is the producer of the enumerator."
The above is how "C# 5.0 in a Nutshell" explains it, and it has been helpful for me.
In other words, the foreach statement uses MoveNext() and the Current property of the IEnumerator to iterate through a sequence, while the iterator is used to produce the implementation of the IEnumerator that will be used by the foreach statement. In C#, when you write an iterator method containing a yield statement, the compiler will generate a private enumerator for you. And when you iterate through the items in the sequence, foreach will call MoveNext() and the Current property of that private enumerator. These methods/properties are implemented by your code in the iterator method, which is called repeatedly to yield values until there are no values left to yield.
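Here is a sketch of that producer/consumer split; ProducerConsumer and its methods are hypothetical. ConsumeByHand spells out roughly what foreach does with MoveNext and Current; the compiler's real lowering differs in details.

```csharp
using System.Collections.Generic;

public static class ProducerConsumer
{
    // Producer: an iterator method. The compiler generates the private
    // enumerator whose MoveNext runs this body up to the next yield.
    public static IEnumerable<string> Greetings()
    {
        yield return "hello";
        yield return "world";
    }

    // Consumer: roughly what a foreach over Greetings() expands to.
    public static List<string> ConsumeByHand()
    {
        var seen = new List<string>();
        using (IEnumerator<string> e = Greetings().GetEnumerator())
        {
            while (e.MoveNext())        // runs the iterator body to the next yield
            {
                seen.Add(e.Current);    // reads the value that was just yielded
            }
        }
        return seen;
    }
}
```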
This is my understanding of how C# define enumerators, and iterators.
To understand iterators we first need to understand enumerators.
Enumerators are specialist objects which provide one with the means to move through an ordered list of items one at a time (the same kind of thing is sometimes called a ‘cursor’). The .NET framework provides two important interfaces relating to enumerators: IEnumerator and IEnumerable. Objects which implement IEnumerator are themselves enumerators; they support the following members:
the property Current, which returns the item at the current position in the list
the method MoveNext, which advances to the next item and returns false once the end of the list has been passed
the method Reset, which moves back to the initial position (which is before the first item).
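The three members can be exercised by hand on an array's non-generic enumerator, which does support Reset. CursorDemo is an illustrative name.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CursorDemo
{
    // Drives an array's non-generic enumerator by hand to show
    // MoveNext, Current and Reset.
    public static List<int> Walk()
    {
        int[] items = { 10, 20, 30 };
        IEnumerator cursor = items.GetEnumerator();   // starts before the first item
        var seen = new List<int>();

        cursor.MoveNext();                 // on 10
        seen.Add((int)cursor.Current);
        cursor.MoveNext();                 // on 20
        seen.Add((int)cursor.Current);

        cursor.Reset();                    // back to before the first item
        cursor.MoveNext();                 // on 10 again
        seen.Add((int)cursor.Current);

        return seen;
    }
}
```

Walk() collects 10, 20 and then 10 again after the Reset.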
On the other hand, iterators implement the enumerator pattern. .NET 2.0 introduced the iterator, which is a compiler-generated enumerator. When GetEnumerator is called on the enumerable object, either directly or indirectly (for example by foreach), the compiler-generated code creates and returns an appropriate iterator object. Optionally, the iterator can be a combined enumerable and enumerator object.
The essential ingredient of an iterator block is the yield statement. There is one big difference between iterators and enumerators: iterators do not implement the Reset method. Calling Reset on an iterator throws a NotSupportedException.
The point of iterators is to allow the easy implementation of enumerators. Where a method needs to return either an enumerator or an enumerable class for an ordered list of items, it is written so as to return each item in its correct order using the ‘yield’ statement.
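A sketch of both return shapes, an enumerable and an enumerator, plus the Reset behavior; IteratorShapes and its methods are hypothetical names. Both methods are iterators because they use yield return.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorShapes
{
    // Iterator returning an enumerable: callers can foreach it directly.
    public static IEnumerable<int> AsEnumerable()
    {
        yield return 1;
        yield return 2;
    }

    // Iterator returning an enumerator: handy as the body of GetEnumerator().
    public static IEnumerator<int> AsEnumerator()
    {
        yield return 1;
        yield return 2;
    }

    // Calls Reset on a compiler-generated enumerator and reports what happens.
    public static string TryReset()
    {
        IEnumerator<int> e = AsEnumerator();
        e.MoveNext();
        try
        {
            e.Reset();
            return "reset ok";
        }
        catch (NotSupportedException)
        {
            return "NotSupportedException";
        }
        finally
        {
            e.Dispose();
        }
    }
}
```

TryReset() returns "NotSupportedException", which is what the compiler-generated enumerator throws from Reset.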
Since no examples were given, here is one that was helpful to me.
An enumerator is an object that you get when you call .GetEnumerator() on a class or type that implements the IEnumerable interface. When this interface is implemented, you have provided everything the compiler needs to let you use foreach to "iterate" over your collection.
Don't confuse the word "iterate" with iterator, though. Both the enumerator and the iterator allow you to "iterate". Enumerating and iterating are basically the same process, but are implemented differently. Enumerating means you've implemented the IEnumerator interface. Iterating means you've created the iterator construct in your class (demonstrated below), and you are calling foreach on your class, at which time the compiler automatically creates the enumerator functionality for you.
Also note that you don't have to do squat with your enumerator. You can call GetEnumerator() on an instance of your class all day long and do nothing with it, for example:
IEnumerator myEnumeratorThatIWillDoNothingWith = myObject.GetEnumerator();
Note too that the iterator code in your class only really runs when you actually use it, i.e. when you call foreach on your class (more precisely, when MoveNext is called on the enumerator).
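The "only runs when used" point can be seen by logging. This sketch (LazyDemo and its log are made up for illustration) shows that getting the enumerator runs none of the iterator body; the first MoveNext runs it up to the first yield return.

```csharp
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order in which things happen (illustrative helper only).
    private static readonly List<string> Log = new List<string>();

    private static IEnumerable<int> Numbers()
    {
        Log.Add("body started");
        yield return 1;
        Log.Add("after first yield");
        yield return 2;
    }

    public static List<string> Run()
    {
        Log.Clear();
        IEnumerator<int> e = Numbers().GetEnumerator();
        Log.Add("got enumerator");           // none of the iterator body has run yet
        e.MoveNext();                        // runs the body up to the first yield return
        Log.Add("current = " + e.Current);
        e.Dispose();                         // we stop early; "after first yield" never runs
        return new List<string>(Log);
    }
}
```

Run() returns the entries in the order "got enumerator", "body started", "current = 1".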
Here is an iterator example from msdn:
public class DaysOfTheWeek : System.Collections.IEnumerable
{
string[] days = { "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", "Sat" };
//This is the iterator!!!
public System.Collections.IEnumerator GetEnumerator()
{
for (int i = 0; i < days.Length; i++)
{
yield return days[i];
}
}
}
class TestDaysOfTheWeek
{
static void Main()
{
// Create an instance of the collection class
DaysOfTheWeek week = new DaysOfTheWeek();
// Iterate with foreach - this is using the iterator!!! When the compiler
//detects your iterator, it will automatically generate the Current,
//MoveNext and Dispose methods of the IEnumerator or IEnumerator<T> interface
foreach (string day in week)
{
System.Console.Write(day + " ");
}
}
}
// Output: Sun Mon Tue Wed Thr Fri Sat
"Iterators are a new feature in C# 2.0. An iterator is a method, get accessor or operator that enables you to support foreach iteration in a class or struct without having to implement the entire IEnumerable interface. Instead, you provide just an iterator, which simply traverses the data structures in your class. When the compiler detects your iterator, it will automatically generate the Current, MoveNext and Dispose methods of the IEnumerable or IEnumerable<T> interface." - msdn
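The MSDN sample above implements the non-generic IEnumerable. A generic variant, sketched below under the made-up name WeekDays, gives foreach strongly typed strings, and also shows the "get accessor" form of iterator that the quote mentions.

```csharp
using System.Collections;
using System.Collections.Generic;

public class WeekDays : IEnumerable<string>
{
    private readonly string[] _days = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

    // Iterator method: the compiler builds the IEnumerator<string> for us.
    public IEnumerator<string> GetEnumerator()
    {
        foreach (string day in _days)
            yield return day;
    }

    // The non-generic interface simply forwards to the generic iterator.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // An iterator can also be a get accessor.
    public IEnumerable<string> Weekend
    {
        get
        {
            yield return _days[6];   // "Sat"
            yield return _days[0];   // "Sun"
        }
    }
}
```

A foreach over new WeekDays() yields the seven day names as string, with no hidden cast from object, and Weekend yields "Sat" then "Sun".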
Enumeration deals with objects while iteration deals with values only. Enumeration is used when we use a vector, hashtable, etc., while iteration is used in while loops, for loops, etc. I've never used the yield keyword, so I couldn't tell you.
Iteration deals with arrays and strings, while enumerating deals with objects.
In JavaScript, you can iterate over an array or a string with:
forEach Loop
for loop
for of Loop
do while Loop
while Loop
And you can enumerate an object with:
for in Loop
Object.keys() Method
Object.values() Method
Object.entries() Method