Linq Slowness on Single Call - c#

Background: my game is using a component system. I have an Entity class which has a list of IComponent instances in a List<IComponent>. My current implementation of Entity.GetComponent<T>() is:
return (T)this.components.Single(c => c is T);
After adding collision detection, I noticed my game dropped to 1FPS. Profiling revealed the culprit to be this very call (which is called 3000+ times per frame).
3000x aside, I noticed that calling this 300k times takes about 2 seconds. I optimized it to a simple iterative loop:
foreach (IComponent c in this.components) {
if (c is T) {
return (T)c;
}
}
return default(T);
This code now runs in about 0.4s, which is an order of magnitude better.
I thought Single would be much more efficient than a single foreach loop. What's going on here?

The doc for Single says:
Returns the only element of a sequence, and throws an exception if
there is not exactly one element in the sequence.
On the other hand First:
The first element in the sequence that passes the test in the
specified predicate function.
So, with Single you traverse the whole sequence without short circuiting, which is what the foreach loop above does. So, use First or FirstOrDefault instead of Single.

Single iterates through the entire collection and makes sure only one item is found. So your best performance is always O(N)
Your iterative search is also subject to O(N) performance, but that is a worst case scenario.
Source:
List<T>.Single Method

Related

Using LINQ Where result in foreach: hidden if statement, double foreach?

foreach (Person criminal in people.Where(person => person.isCriminal)
{
// do something
}
I have this piece of code and want to know how does it actually work. Is it equivalent to an if statement nested inside the foreach iteration or does it first loop through the list of people and repeats the loop with selected values? I care to know more about this from the perspective of efficiency.
foreach (Person criminal in people)
{
if (criminal.isCriminal)
{
// do something
}
}
Where uses deferred execution.
This means that the filtering does not occur immediately when you call Where. Instead, each time you call GetEnumerator().MoveNext() on the return value of Where, it checks if the next element in the sequence satisfies the condition. If it does not, it skips over this element and checks the next one. When there is an element that satisfies the condition, it stops advancing and you can get the value using Current.
Basically, it is like having an if statement inside a foreach loop.
To understand what happens, you must know how IEnumerables<T> work (because LINQ to Objects always work on IEnumerables<T>. IEnumerables<T> return an IEnumerator<T> which implements an iterator. This iterator is lazy, i.e. it always only yields one element of the sequence at once. There is no looping done in advance, unless you have an OrderBy or another command which requires it.
So if you have ...
foreach (string name in source.Where(x => x.IsChecked).Select(x => x.Name)) {
Console.WriteLine(name);
}
... this will happen: The foreach-statement requires the first item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The first name is printed to the console.
Then the foreach-statement requires the second item which is requested from the Select, which in turn requires one item from Where, which in turn retrieves one item from the source. The second name is printed to the console.
and so on.
This means that both of your code snipptes are logically equivalent.
It depends on what people is.
If people is an IEnumerable object (like a collection, or the result of a method using yield) then the two pieces of code in your question are indeed equivalent.
A naïve Where could be implemented as:
public static IEnumerable<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// Error handling left out for simplicity.
foreach (TSource item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
The actual code in Enumerable is a bit different to make sure that errors from passing a null source or predicate happen immediately rather than on the deferred execution, and to optimise for a few cases (e.g. source.Where(x => x.IsCriminal).Where(x => x.IsOnParole) is turned into the equivalent of source.Where(x => x.IsCriminal && x.IsOnParole) so that there's one fewer step in the chains of iterations), but that's the basic principle.
If however people is an IQueryable then things are different, and depend on the details of the query provider in question.
The simplest possibility is that the query provider can't do anything special with the Where and so it ends up just doing pretty much the above, because that will still work.
But often the query provider can do something else. Let's say people is a DbSet<Person> in Entity Framework assocated with a table in a database called people. If you do:
foreach(var person in people)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
And then create a Person object for each row returned. We could do the same filtering in about to implement Where but we can also do better.
If you do:
foreach (Person criminal in people.Where(person => person.isCriminal)
{
DoSomething(person);
}
Then Entity Framework will run SQL similar to:
SELECT *
FROM people
WHERE isCriminal = 1
This means that the logic of deciding which elements to return is done in the database before it comes back to .NET. It allows for indices to be used in computing the WHERE which can be much more efficient, but even in the worse case of there being no useful indices and the database having to do a full scan it will still mean that those records we don't care about are never reported back from the database and there is no object created for them just to be thrown away again, so the difference in performance can be immense.
I care to know more about this from the perspective of efficiency
You are hopefully satisfied that there's no double pass as you suggested might happen, and happy to learn that it's even more efficient than the foreach … if you suggested when possible.
A bare foreach and if will still beat .Where() against an IEnumerable (but not against a database source) as there are a few overheads to Where that foreach and if don't have, but it's to a degree that is only worth caring about in very hot paths. Generally Where can be used with reasonable confidence in its efficiency.

Will the compiler optimise a comparison against IEnumerable<T>.Count()?

As a naive tip, you often hear to use IEnumerable.Any() because then the entire enumerable does not necessarily need to be traversed.
I just wrote a little segment of code that tries to see if the Enumerable contains a single item or multiple.
if (reportInfo.OrebodyAndPits.SelectMany(ob => ob.Pits).Count() > 1)
{
ws.Cells[row, col++].Value = "Pits";
}
else
{
ws.Cells[row, col++].Value = "Pit";
}
That made me wonder, will the comparison be compiled into a form that is smart enough to return false as soon as it enumerates past the first item?
If not, is there a way to write a linq extension method that would do that?
(Please note, I'm not terribly interested in the performance impact of this piece of code. I'm mainly curious.)
No, it will not. Your code will count all the items in the sequence. This is because LINQ statements are not optimized by the compiler, what you write is what you get.
An equivelent, more efficient way of checking whether a sequence contains more than 1 item is:
reportInfo.OrebodyAndPits.SelectMany(ob => ob.Pits).Skip(1).Any();
This will check, after skipping the first item, whether there are any items left.
If you want to know how something works why no look at the source code?
Here's the Any() method: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/AnyAll.cs#L20
Here is the Count() method: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Count.cs#L12
The compiler cannot make an optimisation like you describe. It asks for the count and gets a number then it compares that number with what's in your conditional statement.
It does however try and make some sort of optimisation. As you can see from the Count() method it attempts to see if the IEnumerable already supports a Count property and uses that because it is faster than counting all the elements again. If not available it has to move through the entire thing and count each individually.
If you want to write a LINQ method (which is just an extension method on IEnumerable<T>) that determines if there are at least two in an IEnumerable then that should be easy enough. Something like this:
e.g.
public static bool AtLeastTwo<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull(nameof(source));
}
using (IEnumerator<TSource> e = source.GetEnumerator())
{
e.MoveNext(); // Move past the first one
return e.MoveNext(); // true if there is at least a second element.
}
}

Does foreach loop work more slowly when used with a not stored list or array?

I am wondered at if foreach loop works slowly if an unstored list or array is used as an in array or List.
I mean like that:
foreach (int number in list.OrderBy(x => x.Value)
{
// DoSomething();
}
Does the loop in this code calculates the sorting every iteration or not?
The loop using stored value:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>;
foreach (int number in list)
{
// DoSomething();
}
And if it does, which code shows the better performance, storing the value or not?
This is often counter-intuitive, but generally speaking, the option that is best for performance is to wait as long as possible to materialize results into a concrete structure like a list or array. Please keep in mind that this is a generalization, and so there are plenty of cases where it doesn't hold. Nevertheless, the first instinct is better when you avoid creating the list for as long as possible.
To demonstrate with your sample, we have these two options:
var list = tours.OrderBy(x => x.Value).ToList();
foreach (int number in list)
{
// DoSomething();
}
vs this option:
foreach (int number in list.OrderBy(x => x.Value))
{
// DoSomething();
}
To understand what is going on here, you need to look at the .OrderBy() extension method. Reading the linked documentation, you'll see it returns a IOrderedEnumerable<TSource> object. With an IOrderedEnumerable, all of the sorting needed for the foreach loop is already finished when you first start iterating over the object (and that, I believe, is the crux of your question: No, it does not re-sort on each iteration). Also note that both samples use the same OrderBy() call. Therefore, both samples have the same problem to solve for ordering the results, and they accomplish it the same way, meaning they take exactly the same amount of time to reach that point in the code.
The difference in the code samples, then, is entirely in using the foreach loop directly vs first calling .ToList(), because in both cases we start from an IOrderedEnumerable. Let's look closely at those differences.
When you call .ToList(), what do you think happens? This method is not magic. There is still code here which must execute in order to produce the list. This code still effectively uses it's own foreach loop that you can't see. Additionally, where once you only needed to worry about enough RAM to handle one object at a time, you are now forcing your program to allocate a new block of RAM large enough to hold references for the entire collection. Moving beyond references, you may also potentially need to create new memory allocations for the full objects, if you were reading a from a stream or database reader before that really only needed one object in RAM at a time. This is an especially big deal on systems where memory is the primary constraint, which is often the case with web servers, where you may be serving and maintaining session RAM for many many sessions, but each session only occasionally uses any CPU time to request a new page.
Now I am making one assumption here, that you are working with something that is not already a list. What I mean by this, is the previous paragraphs talked about needing to convert an IOrderedEnumerable into a List, but not about converting a List into some form of IEnumerable. I need to admit that there is some small overhead in creating and operating the state machine that .Net uses to implement those objects. However, I think this is a good assumption. It turns out to be true far more often than we realize. Even in the samples for this question, we're paying this cost regardless, by the simple virtual of calling the OrderBy() function.
In summary, there can be some additional overhead in using a raw IEnumerable vs converting to a List, but there probably isn't. Additionally, you are almost certainly saving yourself some RAM by avoiding the conversions to List whenever possible... potentially a lot of RAM.
Yes and no.
Yes the foreach statement will seem to work slower.
No your program has the same total amount of work to do so you will not be able to measure a difference from the outside.
What you need to focus on is not using a lazy operation (in this case OrderBy) multiple times without a .ToList or ToArray. In this case you are only using it once(foreach) but it is an easy thing to miss.
Edit: Just to be clear. The as statement in the question will not work as intended but my answer assumes no .ToList() after OrderBy .
This line won't run:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>; // Returns null.
Instead, you want to store the results this way:
List<Tour> list = tours.OrderBy(x => x.Value).ToList();
And yes, the second option (storing the results) will enumerate much faster as it will skip the sorting operation.

Why Single(IEnumerable<T>,Predicate<T>) is so inefficient [duplicate]

I came across this implementation in Enumerable.cs by reflector.
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
//check parameters
TSource local = default(TSource);
long num = 0L;
foreach (TSource local2 in source)
{
if (predicate(local2))
{
local = local2;
num += 1L;
//I think they should do something here like:
//if (num >= 2L) throw Error.MoreThanOneMatch();
//no necessary to continue
}
}
//return different results by num's value
}
I think they should break the loop if there are more than 2 items meets the condition, why they always loop through the whole collection? In case of that reflector disassembles the dll incorrectly, I write a simple test:
class DataItem
{
private int _num;
public DataItem(int num)
{
_num = num;
}
public int Num
{
get{ Console.WriteLine("getting "+_num); return _num;}
}
}
var source = Enumerable.Range(1,10).Select( x => new DataItem(x));
var result = source.Single(x => x.Num < 5);
For this test case, I think it will print "getting 0, getting 1" and then throw an exception. But the truth is, it keeps "getting 0... getting 10" and throws an exception.
Is there any algorithmic reason they implement this method like this?
EDIT Some of you thought it's because of side effects of the predicate expression, after a deep thought and some test cases, I have a conclusion that side effects doesn't matter in this case. Please provide an example if you disagree with this conclusion.
Yes, I do find it slightly strange especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.
In the BCL's defence however, I would say that the InvalidOperation exception that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.
Code that uses Single where zero or multiple matches is a perfectly valid possibility, such as:
try
{
var item = source.Single(predicate);
DoSomething(item);
}
catch(InvalidOperationException)
{
DoSomethingElseUnexceptional();
}
should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):
var firstTwo = source.Where(predicate).Take(2).ToArray();
if(firstTwo.Length == 1)
{
// Note that this won't fail. If it does, this code has a bug.
DoSomething(firstTwo.Single());
}
else
{
DoSomethingElseUnexceptional();
}
In other words, we should leave the use of Single to cases when we expect the sequence to contain only one match. It should behave identically to Firstbut with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. cases when Single throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).
This leaves us with two cases:
The assertion holds: There is a single match. In this case, we want Single to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.
The assertion fails: There are zero or multiple matches. In this case, we do throw later than we could, but this isn't such a big deal since the exception is boneheaded: it is indicative of a bug that must be fixed.
To sum up, if the 'poor implementation' is biting you performance-wise in production, either:
You are using Single incorrectly.
You have a bug in your program. Once the bug is fixed, this particular performance problem will go away.
EDIT: Clarified my point.
EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):
public static User GetUserById(this IEnumerable<User> users, string id)
{
if(users == null)
throw new ArgumentNullException("users");
// Perfectly fine if documented that a failure in the query
// is treated as an exceptional circumstance. Caller's job
// to guarantee pre-condition.
return users.Single(user => user.Id == id);
}
Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.
New answer:
Let's assume that Single should throw an exception when it's pre-condition is not met; that is, when Single detects than either none, or more than one item in the collection matches the predicate.
Single can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.
This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single's pre-condition cannot be met and when it will throw an exception.
As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.
Thus, it is actually correct that Single could throw an exception early; but it doesn't have to, because there's practically no added benefit.
Old answer:
I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:
My understanding of Single:
What is the purpose of Single, and how is it different from e.g. First or Last?
Using the Single operator basically expresses one's assumption that exactly one item must be returned from the collection:
If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.
If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single().)
This is what makes Single different from other operators such as First, Last, or Take(1). None of those operators have the requirement that there should be exactly one (matching) item.
When should Single throw an exception?
Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.
When should Single be used?
The use of Single is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.
If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single. Single, together with an exception catch block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single, you should already have made sure that there'll be only one matching item.
Conclusion:
The above states my understanding of the Single LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single is violated as soon as it detects a second (matching) item in the collection.
When considering this implementation we must remember that this is the BCL: general code that is supposed to work good enough in all sorts of scenarios.
First, take these scenarios:
Iterate over 10 numbers, where the first and second elements are equal
Iterate over 1.000.000 numbers, where the first and third elements are equal
The original algorithm will work well enough for 10 items, but 1M will have a severe waste of cycles. So in these cases where we know that there are two or more early in the sequences, the proposed optimization would have a nice effect.
Then, look at these scenarios:
Iterate over 10 numbers, where the first and last elements are equal
Iterate over 1.000.000 numbers, where the first and last elements are equal
In these scenarios the algorithm is still required to inspect every item in the lists. There is no shortcut. The original algorithm will perform good enough, it fulfills the contract. Changing the algorithm, introducing an if on each iteration will actually decrease performance. For 10 items it will be negligible, but 1M it will be a big hit.
IMO, the original implementation is the correct one, since it is good enough for most scenarios. Knowing the implementation of Single is good though, because it enables us to make smart decisions based on what we know about the sequences we use it on. If performance measurements in one particular scenario shows that Single is causing a bottleneck, well: then we can implement our own variant that works better in that particular scenario.
Update: as CodeInChaos and Eamon have correctly pointed out, the if test introduced in the optimization is indeed not performed on each item, only within the predicate match block. I have in my example completely overlooked the fact that the proposed changes will not affect the overall performance of the implementation.
I agree that introducing the optimization would probably benefit all scenarios. It is good to see though that eventually, the decision to implement the optimization is made on the basis of performance measurements.
I think it's a premature optimization "bug".
Why this is NOT reasonable behavior due to side effects
Some have argued that due to side effects, it should be expected that the entire list is evaluated. After all, in the correct case (the sequence indeed has just 1 element) it is completely enumerated, and for consistency with this normal case it's nicer to enumerate the entire sequence in all cases.
Although that's a reasonable argument, it flies in the face of the general practice throughout the LINQ libraries: they use lazy evaluation everywhere. It's not general practice to fully enumerate sequences except where absolutely necessary; indeed, several methods prefer using IList.Count when available over any iteration at all - even when that iteration may have side effects.
Further, .Single() without predicate does not exhibit this behavior: that terminates as soon as possible. If the argument were that .Single() should respect side-effects of enumeration, you'd expect all overloads to do so equivalently.
Why the case for speed doesn't hold
Peter Lillevold made the interesting observation that it may be faster to do...
foreach(var elem in elems)
if(pred(elem)) {
retval=elem;
count++;
}
if(count!=1)...
than
foreach(var elem in elems)
if(pred(elem)) {
retval=elem;
count++;
if(count>1) ...
}
if(count==0)...
After all, the second version, which would exit the iteration as soon as the first conflict is detected, would require an extra test in the loop - a test which in the "correct" is purely ballast. Neat theory, right?
Except, that's not bourne out by the numbers; for example on my machine (YMMV) Enumerable.Range(0,100000000).Where(x=>x==123).Single() is actually faster than Enumerable.Range(0,100000000).Single(x=>x==123)!
It's possibly a JITter quirk of this precise expression on this machine - I'm not claiming that Where followed by predicateless Single is always faster.
But whatever the case, the fail-fast solution is very unlikely to be significantly slower. After all, even in the normal case, we're dealing with a cheap branch: a branch that is never taken and thus easy on the branch predictor. And of course; the branch is further only ever encountered when pred holds - that's once per call in the normal case. That cost is simply negligible compared to the cost of the delegate call pred and its implementation, plus the cost of the interface methods .MoveNext() and .get_Current() and their implementations.
It's simply extremely unlikely that you'll notice the performance degradation caused by one predictable branch in comparison to all that other abstraction penalty - not to mention the fact that most sequences and predicates actually do something themselves.
It seems very clear to me.
Single is intended for the case where the caller knows that the enumeration contains exactly one match, since in any other case an expensive exception is thrown.
For this use case, the overload that takes a predicate must iterate over the whole enumeration. It is slightly faster to do so without an additional condition on every loop.
In my view the current implementation is correct: it is optimized for the expected use case of an enumeration that contains exactly one matching element.
That does appear to be a bad implementation, in my opinion.
Just to illustrate the potential severity of the problem:
var oneMillion = Enumerable.Range(1, 1000000)
.Select(x => { Console.WriteLine(x); return x; });
int firstEven = oneMillion.Single(x => x % 2 == 0);
The above will output all the integers from 1 to 1000000 before throwing an exception.
It's a head-scratcher for sure.
I only found this question after filing a report at https://connect.microsoft.com/VisualStudio/feedback/details/810457/public-static-tsource-single-tsource-this-ienumerable-tsource-source-func-tsource-bool-predicate-doesnt-throw-immediately-on-second-matching-result#
The side-effect argument doesn't hold water, because:
Having side-effects isn't really functional, and they're called Func for a reason.
If you do want side-effects, it makes no more sense to claim the version that has the side-effects throughout the whole sequence is desirable than it does to claim so for the version that throws immediately.
It does not match the behaviour of First or the other overload of Single.
It does not match at least some other implementations of Single, e.g. Linq2SQL uses TOP 2 to ensure that only the two matching cases needed to test for more than one match are returned.
We can construct cases where we should expect a program to halt, but it does not halt.
We can construct cases where OverflowException is thrown, which is not documented behaviour, and hence clearly a bug.
Most importantly of all, if we're in a condition where we expected the sequence to have only one matching element, and yet we're not, then something has clearly gone wrong. Apart from the general principle that the only thing you should do upon detecting an error state is clean-up (and this implementation delays that) before throwing, the case of an sequence having more than one matching element is going to overlap with the case of a sequence having more elements in total than expected - perhaps because the sequence has a bug that is causing it to loop unexpectedly. So it's precisely in one possible set of bugs that should trigger the exception, that the exception is most delayed.
Edit:
Peter Lillevold's mention of a repeated test may be a reason why the author chose to take the approach they did, as an optimisation for the non-exceptional case. If so it was needless though, even aside from Eamon Nerbonne showing it wouldn't improve much. There's no need to have a repeated test in the initial loop, as we can just change what we're testing for upon the first match:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if(source == null)
throw new ArgumentNullException("source");
if(predicate == null)
throw new ArgumentNullException("predicate");
using(IEnumerator<TSource> en = source.GetEnumerator())
{
while(en.MoveNext())
{
TSource cur = en.Current;
if(predicate(cur))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("Sequence contains more than one matching element");
return cur;
}
}
}
throw new InvalidOperationException("Sequence contains no matching element");
}

Is it OK to reuse IEnumerable collections more than once?

Basically I am wondering if it's ok to use an enumeration more than once in code subsequently. Whether you break early or not, would the enumeration always reset in every foreach case below, so giving us consistent enumeration from start to end of the effects collection?
var effects = this.EffectsRecursive;
foreach (Effect effect in effects)
{
...
}
foreach (Effect effect in effects)
{
if(effect.Name = "Pixelate")
break;
}
foreach (Effect effect in effects)
{
...
}
EDIT: The implementation of EffectsRecursive is this:
public IEnumerable<Effect> Effects
{
get
{
for ( int i = 0; i < this.IEffect.NumberOfChildren; ++i )
{
IEffect effect = this.IEffect.GetChildEffect ( i );
if ( effect != null )
yield return new Effect ( effect );
}
}
}
public IEnumerable<Effect> EffectsRecursive
{
get
{
foreach ( Effect effect in this.Effects )
{
yield return effect;
foreach ( Effect seffect in effect.ChildrenRecursive )
yield return seffect;
}
}
}
Yes this is legal to do. The IEnumerable<T> pattern is meant to support multiple enumerations on a single source. Collections which can only be enumerated once should expose IEnumerator instead.
The code that consumes the sequence is fine. As spender points out, the code that produces the enumeration might have performance problems if the tree is deep.
Suppose at the deepest point your tree is four deep; think about what happens on the nodes that are four deep. To get that node, you iterate the root, which calls an iterator, which calls an iterator, which calls an iterator, which passes the node back to code that passes the node back to code that passes the node back... Instead of just handing the node to the caller, you've made a little bucket brigade with four guys in it, and they're tramping the data around from object to object before it finally gets to the loop that wanted it.
If the tree is only four deep, no big deal probably. But suppose the tree is ten thousand elements, and has a thousand nodes forming a linked list at the top and the remaining nine thousand nodes on the bottom. Now when you iterate those nine thousand nodes each one has to pass through a thousand iterators, for a total of nine million copies to fetch nine thousand nodes. (Of course, you've probably gotten a stack overflow error and crashed the process as well.)
The way to deal with this problem if you have it is to manage the stack yourself rather than pushing new iterators on the stack.
public IEnumerable<Effect> EffectsNotRecursive()
{
var stack = new Stack<Effect>();
stack.Push(this);
while(stack.Count != 0)
{
var current = stack.Pop();
yield return current;
foreach(var child in current.Effects)
stack.Push(child);
}
}
The original implementation has a time complexity of O(nd) where n is the number of nodes and d is the average depth of the tree; since d can in the worst case be O(n), and in the best case be O(lg n), that means that the algorithm is between O(n lg n) and O(n^2) in time. It is O(d) in heap space (for all the iterators) and O(d) in stack space (for all the recursive calls.)
The new implementation has a time complexity of O(n), and is O(d) in heap space, and O(1) in stack space.
One down side of this is that the order is different; the tree is traversed from top to bottom and right to left in the new algorithm, instead of top to bottom and left to right. If that bothers you then you can just say
foreach(var child in current.Effects.Reverse())
instead.
For more analysis of this problem, see my colleague Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx
Legal, yes. Whether it will function as you expect depends on:
The implementation of the IEnumerable returned by EffectsRecursive and whether it always returns the same set;
Whether you want to enumerate the same set both times
If it returns an IEnumerable that requires some intensive work, and it doesn't cache the results internally, then you may need to .ToList() it yourself. If it does cache the results, then ToList() would be slightly redundant but probably no harm.
Also, if GetEnumerator() is implemented in a typical/proper (*) way, then you can safely enumerate any number of times - each foreach will be a new call to GetEnumerator() which returns a new instance of the IEnumerator. But it could be that in some situations it returns the same IEnumerator instance that's already been partially or fully enumerated, so it all really just depends on the specific intended usage of that particular IEnumerable.
*I'm pretty sure returning the same enumerator multiple times is actually a violation of the implied contract for the pattern, but I have seen some implementations that do it anyway.
Most probably, yes. Most implementations of IEnumerable return a fresh IEnumerator which starts at the beginning of the list.
It all depends on the implementation of the type of EffectsRecursive.

Categories