Is it OK to reuse IEnumerable collections more than once?

Basically I am wondering whether it is OK to enumerate the same IEnumerable more than once in subsequent code. Whether you break early or not, will the enumeration always reset in every foreach case below, giving a consistent enumeration from start to end of the effects collection?
var effects = this.EffectsRecursive;
foreach (Effect effect in effects)
{
    ...
}
foreach (Effect effect in effects)
{
    if (effect.Name == "Pixelate")
        break;
}
foreach (Effect effect in effects)
{
    ...
}
EDIT: The implementation of EffectsRecursive is this:
public IEnumerable<Effect> Effects
{
    get
    {
        for (int i = 0; i < this.IEffect.NumberOfChildren; ++i)
        {
            IEffect effect = this.IEffect.GetChildEffect(i);
            if (effect != null)
                yield return new Effect(effect);
        }
    }
}
public IEnumerable<Effect> EffectsRecursive
{
    get
    {
        foreach (Effect effect in this.Effects)
        {
            yield return effect;
            foreach (Effect seffect in effect.ChildrenRecursive)
                yield return seffect;
        }
    }
}

Yes this is legal to do. The IEnumerable<T> pattern is meant to support multiple enumerations on a single source. Collections which can only be enumerated once should expose IEnumerator instead.
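As a quick illustration, each foreach asks the source for a fresh enumerator, so an iterator-block source restarts from the beginning every time (GetNumbers here is a minimal hypothetical iterator, not code from the question):

static IEnumerable<int> GetNumbers() // minimal iterator for illustration
{
    yield return 1;
    yield return 2;
}

static void Demo()
{
    var numbers = GetNumbers();
    foreach (int n in numbers) Console.WriteLine(n); // prints 1, 2
    foreach (int n in numbers) Console.WriteLine(n); // prints 1, 2 again: each foreach got a fresh enumerator
}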

The code that consumes the sequence is fine. As spender points out, the code that produces the enumeration might have performance problems if the tree is deep.
Suppose at the deepest point your tree is four deep; think about what happens on the nodes that are four deep. To get that node, you iterate the root, which calls an iterator, which calls an iterator, which calls an iterator, which passes the node back to code that passes the node back to code that passes the node back... Instead of just handing the node to the caller, you've made a little bucket brigade with four guys in it, and they're tramping the data around from object to object before it finally gets to the loop that wanted it.
If the tree is only four deep, no big deal probably. But suppose the tree is ten thousand elements, and has a thousand nodes forming a linked list at the top and the remaining nine thousand nodes on the bottom. Now when you iterate those nine thousand nodes each one has to pass through a thousand iterators, for a total of nine million copies to fetch nine thousand nodes. (Of course, you've probably gotten a stack overflow error and crashed the process as well.)
The way to deal with this problem if you have it is to manage the stack yourself rather than pushing new iterators on the stack.
public IEnumerable<Effect> EffectsNotRecursive()
{
    var stack = new Stack<Effect>();
    stack.Push(this);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        yield return current;
        foreach (var child in current.Effects)
            stack.Push(child);
    }
}
The original implementation has a time complexity of O(nd) where n is the number of nodes and d is the average depth of the tree; since d can in the worst case be O(n), and in the best case be O(lg n), that means that the algorithm is between O(n lg n) and O(n^2) in time. It is O(d) in heap space (for all the iterators) and O(d) in stack space (for all the recursive calls.)
The new implementation has a time complexity of O(n), and is O(d) in heap space, and O(1) in stack space.
One downside of this is that the order is different: the new algorithm traverses the tree from top to bottom and right to left, instead of top to bottom and left to right. If that bothers you then you can just say
    foreach (var child in current.Effects.Reverse())
instead (Reverse requires System.Linq).
For more analysis of this problem, see my colleague Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

Legal, yes. Whether it will function as you expect depends on:
- the implementation of the IEnumerable returned by EffectsRecursive, and whether it always returns the same set;
- whether you want to enumerate the same set both times.
If it returns an IEnumerable that requires some intensive work, and it doesn't cache the results internally, then you may need to .ToList() it yourself. If it does cache the results, then ToList() would be slightly redundant but probably no harm.
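For example, a minimal sketch of snapshotting the sequence yourself (this assumes the expensive part is producing the elements):

List<Effect> snapshot = this.EffectsRecursive.ToList(); // runs the iterators exactly once; requires System.Linq
foreach (Effect effect in snapshot) { ... } // first pass
foreach (Effect effect in snapshot) { ... } // second pass, no recomputation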
Also, if GetEnumerator() is implemented in a typical/proper (*) way, then you can safely enumerate any number of times - each foreach will be a new call to GetEnumerator() which returns a new instance of the IEnumerator. But it could be that in some situations it returns the same IEnumerator instance that's already been partially or fully enumerated, so it all really just depends on the specific intended usage of that particular IEnumerable.
*I'm pretty sure returning the same enumerator multiple times is actually a violation of the implied contract for the pattern, but I have seen some implementations that do it anyway.

Most probably, yes. Most implementations of IEnumerable return a fresh IEnumerator which starts at the beginning of the list.

It all depends on the implementation of the type of EffectsRecursive.

Related

How to remove from List<T> efficiently (C#)?

If I understood correctly (and please correct me if I'm wrong), List<T> is backed by an array in .NET, which means that every deletion of an item from the list causes the subsequent elements to be shifted down (which in turn means O(n)).
I'm developing a game. In the game I have many bullets flying in the air at any given moment, let's say 100 bullets. Each frame I move them by a few pixels and check for collision with objects in the game, and I need to remove from the list every bullet that collided.
So I collect the collided bullet in another temporary list and then do the following:
foreach (Bullet bullet in bulletsForDeletion)
    mBullets.Remove(bullet);
Because the loop is O(n) and the remove is O(n), I spend O(n^2) time to remove.
Is there a better way to remove it, or more suitable collection to use?
Sets and linked lists both have constant time removal. Can you use either of those data structures?
There's no way to avoid the O(N) cost of removal from List<T>. You'll need to use a different data structure if this is a problem for you. It may make the code which calculates bulletsToRemove feel nicer too.
ISet<T> has nice methods for calculating differences and intersections between sets of objects.
You lose ordering by using sets, but given you are removing bullets, I'm guessing that is not an issue. You can still enumerate the whole set in linear time.
In your case, you might write:
mBullets.ExceptWith(bulletsForDeletion);
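ExceptWith is defined on ISet<T>, so this sketch assumes mBullets is declared as a set rather than a list:

HashSet<Bullet> mBullets = new HashSet<Bullet>();
// ... each frame, after collecting bulletsForDeletion ...
mBullets.ExceptWith(bulletsForDeletion); // removes all collided bullets in one pass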
Create a new list:
var newList = oldList.Except(deleteItems).ToList();
Try to use functional idioms wherever possible. Don't modify existing data structures, create new ones.
This algorithm is O(N) thanks to hashing.
Can't you just switch from List<T> (the equivalent of Java's ArrayList, to highlight the similarity) to LinkedList<T>? LinkedList<T> takes O(1) to delete a node you already hold a reference to, but O(n) to delete by index.
How a list is implemented internally is not something you should be thinking about. You should be interacting with the list in that abstraction level.
If you do have performance problems and you pinpoint them to the list, then it is time to look at how it is implemented.
As for removing items from a list - you can use mBullets.RemoveAll(predicate), where predicate is an expression that identifies items that have collided.
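For example (HasCollided is an assumed flag on your Bullet class):

mBullets.RemoveAll(b => b.HasCollided); // one O(n) pass instead of O(n^2)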
RemoveAt(int index) is faster than Remove(T item) because the latter calls the former internally after first searching for the item; this is the code inside each function, as seen with a decompiler.
Remove(T) calls IndexOf, which runs its own loop to find the index of the item.
public bool Remove(T item)
{
    int index = this.IndexOf(item);
    if (index < 0)
        return false;
    this.RemoveAt(index);
    return true;
}

public void RemoveAt(int index)
{
    if ((uint)index >= (uint)this._size)
        ThrowHelper.ThrowArgumentOutOfRangeException();
    --this._size;
    if (index < this._size)
        Array.Copy((Array)this._items, index + 1, (Array)this._items, index, this._size - index);
    this._items[this._size] = default(T);
    ++this._version;
}
I would do a loop like this:
for (int i = MyList.Count - 1; i >= 0; i--)
{
    // add some code here
    if (needtodelete)
        MyList.RemoveAt(i);
}
In the spirit of functional idiom suggested by "usr", you might consider not removing from your list at all. Instead, have your update routine take a list and return a list. The list returned contains only the "still-alive" objects. You can then swap lists at the end of the game loop (or immediately, if appropriate). I've done this myself.
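A sketch of that pattern (the names are illustrative, not from the question):

List<Bullet> stillAlive = new List<Bullet>(mBullets.Count);
foreach (Bullet b in mBullets)
{
    b.Update();                // move the bullet, detect collisions, etc.
    if (!b.HasCollided)
        stillAlive.Add(b);     // only survivors go into the new list
}
mBullets = stillAlive;         // swap lists at the end of the frame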
Ideally, it shouldn't be important to think about how C# implements its List methods, and it's somewhat unfortunate that RemoveAt(i) shifts all the array elements after index i. Sometimes optimizations around standard library implementations are necessary, though.
If you find your execution is expending unacceptable effort shuffling list elements around when removing them with RemoveAt, one approach that might work for you is to swap the element with the last one and then RemoveAt off the end of the list:
if (i < mylist.Count - 1)
{
    mylist[i] = mylist[mylist.Count - 1];
}
mylist.RemoveAt(mylist.Count - 1);
This is of course an O(1) operation.

Linq Slowness on Single Call

Background: my game is using a component system. I have an Entity class which has a list of IComponent instances in a List<IComponent>. My current implementation of Entity.GetComponent<T>() is:
return (T)this.components.Single(c => c is T);
After adding collision detection, I noticed my game dropped to 1FPS. Profiling revealed the culprit to be this very call (which is called 3000+ times per frame).
The 3000 calls per frame aside, I noticed that calling this 300k times takes about 2 seconds. I optimized it to a simple iterative loop:
foreach (IComponent c in this.components)
{
    if (c is T)
    {
        return (T)c;
    }
}
return default(T);
This code now runs in about 0.4s, roughly five times faster.
I thought Single would be much more efficient than a single foreach loop. What's going on here?
The doc for Single says:
    Returns the only element of a sequence, and throws an exception if there is not exactly one element in the sequence.
On the other hand, First returns:
    The first element in the sequence that passes the test in the specified predicate function.
So with Single you traverse the whole sequence with no short-circuiting, whereas the foreach loop above returns as soon as it finds a match. Use First or FirstOrDefault instead of Single.
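Applied to the original call, a short-circuiting rewrite might look like this (it returns default(T) when no component matches, just like the hand-written loop; requires System.Linq):

return this.components.OfType<T>().FirstOrDefault();

Note the hand-written loop still avoids LINQ's per-call overhead entirely, which matters at 3000+ calls per frame.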
Single iterates through the entire collection and makes sure only one matching item is found, so its cost is always O(N).
Your iterative search is also O(N), but only in the worst case: it stops at the first match.
Source:
Enumerable.Single Method

When NOT to use yield (return) [duplicate]

This question already has an answer here:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?
There are several useful questions here on SO about the benefits of yield return. For example,
Can someone demystify the yield keyword
Interesting use of the c# yield keyword
What is the yield keyword
I'm looking for thoughts on when NOT to use yield return. For example, if I expect to need to return all items in a collection, it doesn't seem like yield would be useful, right?
What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:
public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    if (root == null) yield break;
    yield return root.Value;
    foreach (T item in PreorderTraversal(root.Left))
        yield return item;
    foreach (T item in PreorderTraversal(root.Right))
        yield return item;
}
Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then at the deepest point there are O(h) nested iterators built. Calling "MoveNext" on the outer iterator then makes O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worst case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)
But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:
public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    var stack = new Stack<Tree<T>>();
    stack.Push(root);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        if (current == null) continue;
        yield return current.Value;
        stack.Push(current.Right); // push right first so that left pops first,
        stack.Push(current.Left);  // preserving the left-to-right preorder
    }
}
which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.
Further reading: see Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx
What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
I can think of a couple of cases, for example:
Avoid using yield return when you return an existing iterator. Example:
// Don't do this, it creates overhead for no reason
// (a new state machine needs to be generated):
public IEnumerable<string> GetKeys()
{
    foreach (string key in _someDictionary.Keys)
        yield return key;
}

// DO this:
public IEnumerable<string> GetKeys()
{
    return _someDictionary.Keys;
}
Avoid using yield return when you don't want to defer execution of the method's code. Example:
// Don't do this: the exception won't be thrown until the iterator is
// iterated, which can be very far away from this method invocation.
public IEnumerable<string> Foo(Bar baz)
{
    if (baz == null)
        throw new ArgumentNullException();
    yield ...
}

// DO this:
public IEnumerable<string> Foo(Bar baz)
{
    if (baz == null)
        throw new ArgumentNullException();
    return new BazIterator(baz);
}
The key thing to realize is what yield is useful for, then you can decide which cases do not benefit from it.
In other words, when you do not need a sequence to be lazily evaluated you can skip the use of yield. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use yield to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.
Notice how most LINQ statements return an IEnumerable<T>. This allows us to string different LINQ operations together without negatively impacting performance at each step (aka deferred execution). The alternative picture would be putting a ToList() call in between each LINQ statement; this would cause each preceding statement to be executed immediately before performing the next (chained) one, forgoing the benefit of lazy evaluation, which defers work on the IEnumerable<T> until the results are actually needed.
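A small illustration of such a deferred pipeline (illustrative names; requires System.Linq):

var query = orders.Where(o => o.IsOpen)
                  .Select(o => o.CustomerName); // nothing has executed yet
foreach (string name in query)                  // execution happens here,
    Console.WriteLine(name);                    // one element at a time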
There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:
IEnumerable<UserRight> GetSuperUserRights()
{
    if (SuperUsersAllowed)
    {
        yield return UserRight.Add;
        yield return UserRight.Edit;
        yield return UserRight.Remove;
    }
}
In these cases the creation of the Enumerator object is more expensive, and more verbose, than just generating a data structure.
IEnumerable<UserRight> GetSuperUserRights()
{
    return SuperUsersAllowed
        ? new[] { UserRight.Add, UserRight.Edit, UserRight.Remove }
        : Enumerable.Empty<UserRight>();
}
Update
Here are the results of my benchmark, showing how long it took (in milliseconds) to perform the operation 1,000,000 times; smaller numbers are better.
In revisiting this, the performance difference isn't significant enough to worry about, so you should go with whatever is the easiest to read and maintain.
Update 2
I'm pretty sure the above results were achieved with compiler optimization disabled. Running in Release mode with a modern compiler, it appears performance is practically indistinguishable between the two. Go with whatever is most readable to you.
Eric Lippert raises a good point (too bad C# doesn't have stream flattening like Cω). I would add that sometimes the enumeration process is expensive for other reasons, and therefore you should use a list if you intend to iterate over the IEnumerable more than once.
For example, LINQ-to-objects is built on "yield return". If you've written a slow LINQ query (e.g. that filters a large list into a small list, or that does sorting and grouping), it may be wise to call ToList() on the result of the query in order to avoid enumerating multiple times (which actually executes the query multiple times).
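For example, a sketch of materializing a slow query once (ExpensiveCheck and Key are illustrative names; requires System.Linq):

var query = items.Where(x => ExpensiveCheck(x)).OrderBy(x => x.Key); // deferred, nothing runs yet
var results = query.ToList();   // the query executes exactly once, here
// later passes re-read the list instead of re-running the query:
foreach (var r in results) { ... }
foreach (var r in results) { ... }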
If you are choosing between "yield return" and List<T> when writing a method, consider: is each single element expensive to compute, and will the caller need to enumerate the results more than once? If you know the answers are yes and yes, you shouldn't use yield return (unless, for example, the List produced is very large and you can't afford the memory it would use. Remember, another benefit of yield is that the result list doesn't have to be entirely in memory at once).
Another reason not to use "yield return" is if interleaving operations is dangerous. For example, if your method looks something like this,
IEnumerable<T> GetMyStuff()
{
    foreach (var x in MyCollection)
        if (...)
            yield return (...);
}
this is dangerous if there is a chance that MyCollection will change because of something the caller does:
foreach (T x in GetMyStuff())
{
    if (...)
        MyCollection.Add(...);
    // Oops, now GetMyStuff() will throw an exception
    // because MyCollection was modified.
}
yield return can cause trouble whenever the caller changes something that the yielding function assumes does not change.
I would avoid using yield return if the method has a side effect that you expect to happen when the method is called. This is due to the deferred execution that Pop Catalin mentions.
One side effect could be modifying the system, which could happen in a method like IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos(), which breaks the single responsibility principle. That's pretty obvious (now...), but a not-so-obvious side effect could be setting a cached result or similar as an optimisation (a sketch of the pitfall follows the list below).
My rules of thumb (again, now...) are:
- only use yield if the object being returned requires a bit of processing;
- no side effects in the method if I need to use yield;
- if I have to have side effects (and limiting that to caching etc.), don't use yield and make sure the benefits of expanding the iteration outweigh the costs.
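A tiny illustration of the deferred side effect (the names are hypothetical):

public IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos()
{
    foreach (var foo in _foos)
    {
        foo.Complete = true; // deferred! runs only when the caller iterates
        yield return foo;
    }
}

// Caller:
var foos = SetAllFoosToCompleteAndGetAllFoos(); // nothing has been marked complete yet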
Yield would be limiting/unnecessary when you need random access. If you need to access element 0 then element 99, you've pretty much eliminated the usefulness of lazy evaluation.
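For instance, with a lazy sequence each random access re-enumerates from the start (GetNumbers is a hypothetical iterator method; ElementAt requires System.Linq):

var seq = GetNumbers();        // iterator block, nothing is cached
int first = seq.ElementAt(0);  // walks 1 element
int later = seq.ElementAt(99); // walks 100 elements from scratch: O(n) per access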
One that might catch you out is if you are serialising the results of an enumeration and sending them over the wire. Because the execution is deferred until the results are needed, you will serialise an empty enumeration and send that back instead of the results you want.
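If that bites you, the usual fix is to materialize the results before handing them to the serializer (GetResults and SendOverTheWire are placeholders):

var payload = GetResults().ToList(); // forces the iterator to run now
SendOverTheWire(payload);            // the serializer sees real data, not a lazy iterator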
I have to maintain a pile of code from a guy who was absolutely obsessed with yield return and IEnumerable. The problem is that a lot of third party APIs we use, as well as a lot of our own code, depend on Lists or Arrays. So I end up having to do:
IEnumerable<foo> myFoos = getSomeFoos();
List<foo> fooList = new List<foo>(myFoos);
thirdPartyApi.DoStuffWithArray(fooList.ToArray());
Not necessarily bad, but kind of annoying to deal with, and on a few occasions it's led to creating duplicate Lists in memory to avoid refactoring everything.
When you don't want a code block to return an iterator for sequential access to an underlying collection, you don't need yield return; you simply return the collection instead.
If you're defining a Linq-y extension method where you're wrapping actual Linq members, those members will more often than not return an iterator. Yielding through that iterator yourself is unnecessary.
Beyond that, you can't really get into much trouble using yield to define a "streaming" enumerable that is evaluated lazily, on demand.
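A classic example of such a streaming enumerable (a sketch; File.ReadLines in later framework versions behaves much like this):

static IEnumerable<string> ReadLines(string path)
{
    using (var reader = new System.IO.StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line; // only one line is held in memory at a time
    }
}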

Another Quicksort stackoverflow

So I've been trying to implement a quicksort myself, just to learn something from it, but it generates a StackOverflowException and I can't seem to find the cause. Can someone give me a clue?
public void Partition(List<int> valuelist, out List<int> greater, out List<int> lesser)
{
    lesser = new List<int>(); // <-- StackOverflowException here!
    greater = new List<int>();
    if (valuelist.Count <= 1)
        return;
    pivot = valuelist.First();
    foreach (int Element in valuelist)
    {
        if (Element <= pivot)
            lesser.Add(Element);
        else
            greater.Add(Element);
    }
}
public List<int> DoQuickSort(List<int> list)
{
    List<int> great;
    List<int> less;

    Partition(list, out great, out less);
    DoQuickSort(great);
    DoQuickSort(less);

    list.Clear();
    list = (List<int>)less.Concat(great);
    return list;
}
You are creating infinite recursion right there:
    DoQuickSort(great);
You need a way to get out of that recursion with a flag or a condition.
Edit
I will add that in debug mode, with default settings, you can only make between 10,000 and 16,000 recursive calls before an exception is thrown, and between 50,000 and 80,000 in release mode, all depending on the actual code executed.
If you work with a huge number of values, you might need to manage the recursion yourself by using a Stack object.
Sample code to see how many calls happen before it crashes (debug: 14,210 calls; release: 80,071 calls):
static int s = 1;

static void Main(string[] args)
{
    o();
}

static void o()
{
    s++;
    Console.WriteLine(s.ToString());
    o();
}
You're not putting any conditions on your recursive calls to DoQuicksort, so it'll never stop recursing, leading to a stack overflow. You should only be calling DoQuicksort on a list if it contains more than one element.
Edit: As Will said in his comment, this is a very slow approach to "Quicksort". You should look at in-place partitioning algorithms, as mentioned on Wikipedia's Quicksort article.
I think one of the problems in your code is that you keep the pivot value in the list when partitioning. This means you will run into a situation where all values partition into either greater or lesser, and the partitioning stops making progress; that effectively prevents you from splitting one of the lists any further, so the exit condition in the Partition method is never satisfied.
You should select a pivot value, remove the pivot element from the list (this part is missing in your code), partition the rest into greater and lesser lists, sort those recursively, and then concatenate the lesser list, the pivot element (this is also, naturally, missing in your code) and the greater list.
I can post an updated, working, code sample, but since you are in "learning mode", I will keep it to myself until you ask for it :)

C#: Avoid infinite recursion when traversing object graph

I have an object graph wherein each child object contains a property that refers back to its parent. Are there any good strategies for ignoring the parent references in order to avoid infinite recursion? I have considered adding a special [Parent] attribute to these properties or using a special naming convention, but perhaps there is a better way.
If the loops can be generalised (you can have any number of elements making up the loop), you can keep track of objects you've seen already in a HashSet and stop if the object is already in the set when you visit it. Or add a flag to the objects which you set when you visit it (but you then have to go back & unset all the flags when you're done, and the graph can only be traversed by a single thread at a time).
Alternatively, if the loops will only be back to the parent, you can keep a reference to the parent and not loop on properties that refer back to it.
For simplicity, if you know the parent reference will have a certain name, you could just not loop on that property :)
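A rough reflection-based sketch of that shortcut (Visit and the "Parent" property name are assumptions, not from the question):

foreach (System.Reflection.PropertyInfo prop in node.GetType().GetProperties())
{
    if (prop.Name == "Parent")
        continue;                          // skip the back-reference by convention
    if (prop.GetIndexParameters().Length > 0)
        continue;                          // skip indexers
    Visit(prop.GetValue(node, null));      // recurse into everything else
}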
What a coincidence; this is the topic of my blog this coming Monday. See it for more details. Until then, here's some code to give you an idea of how to do this:
static IEnumerable<T> Traversal<T>(
    T item,
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    seen.Add(item);
    stack.Push(item);
    yield return item;
    while (stack.Count > 0)
    {
        T current = stack.Pop();
        foreach (T newItem in children(current))
        {
            if (!seen.Contains(newItem))
            {
                seen.Add(newItem);
                stack.Push(newItem);
                yield return newItem;
            }
        }
    }
}
The method takes two things: an item, and a relation that produces the set of everything that is adjacent to the item. It produces a depth-first traversal of the transitive and reflexive closure of the adjacency relation on the item. Let the number of items in the graph be n, and the maximum depth be 1 <= d <= n, assuming the branching factor is not bounded. This algorithm uses an explicit stack rather than recursion because (1) recursion in this case turns what should be an O(n) algorithm into O(nd), which is then something between O(n) and O(n^2), and (2) excessive recursion can blow the stack if the d is more than a few hundred nodes.
Note that the peak memory usage of this algorithm is of course O(n + d) = O(n).
So, for example:
foreach (Node node in Traversal(myGraph.Root, n => n.Children))
    Console.WriteLine(node.Name);
Make sense?
If you're doing a graph traversal, you can have a "visited" flag on each node. This ensures that you don't revisit a node and possibly get stuck in an infinite loop. I believe this is the standard way of performing a graph traversal.
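A minimal sketch of the visited-flag idea (Visited and Children are assumed members of Node):

void Visit(Node node)
{
    if (node.Visited)
        return;          // already seen: stop here, no infinite loop
    node.Visited = true;
    foreach (Node child in node.Children)
        Visit(child);
}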
This is a common problem, but the best approach depends on the scenario. An additional problem is that in many cases it isn't a problem visiting the same object twice - that doesn't imply recursion - for example, consider the tree:
A
  => B
    => C
  => D
    => C
This may be valid (think XmlSerializer, which would simply write the C instance out twice), so it is often necessary to push/pop objects on a stack to check for true recursion. The last time I implemented a "visitor", I kept a "depth" counter, and only enabled the stack checking beyond a certain threshold - that means that most trees simply end up doing some ++/--, but nothing more expensive. You can see the approach I took here.
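A bare-bones sketch of the push/pop check, which distinguishes "seen twice on different branches" (fine) from "seen twice on the current path" (true recursion); a real visitor tracks more state than this:

private readonly Stack<object> path = new Stack<object>();

void Visit(object node)
{
    if (path.Contains(node))   // node is on the current path: a genuine cycle
        throw new InvalidOperationException("Recursive reference detected.");
    path.Push(node);
    // ... visit the node's members/children here ...
    path.Pop();                // node may legitimately appear again on another branch
}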
I'm not exactly sure what you are trying to do here, but you could just maintain a hashtable of all previously visited nodes when you are doing your breadth-first or depth-first search.
I published a post explaining in detail, with code examples, how to do object traversal by recursive reflection and also how to detect and avoid recursive references to prevent a StackOverflowException: https://doguarslan.wordpress.com/2016/10/03/object-graph-traversal-by-recursive-reflection/
In that example I did a depth-first traversal using recursive reflection, and I maintained a HashSet of visited nodes for reference types. One thing to be careful about is to initialize your HashSet with a custom equality comparer that uses the object reference for hash calculation: essentially the GetHashCode() implemented by the base object class itself, not any overridden version. If the types of the properties you traverse override GetHashCode, you may detect false hash collisions and think you have found a recursive reference, when in reality the overridden GetHashCode merely produced the same hash value for two distinct objects and confused the HashSet. All you need to detect is whether any parent and child anywhere in the object tree point to the same location in memory.
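A sketch of such a comparer (RuntimeHelpers.GetHashCode is the identity hash the base object class uses; the class name here is made up for illustration):

sealed class ObjectReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y); // identity, never an overridden Equals
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj); // identity hash
    }
}

var visited = new HashSet<object>(new ObjectReferenceComparer());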
