C#: Avoid infinite recursion when traversing object graph

I have an object graph wherein each child object contains a property that refers back to its parent. Are there any good strategies for ignoring the parent references in order to avoid infinite recursion? I have considered adding a special [Parent] attribute to these properties or using a special naming convention, but perhaps there is a better way.

If the loops can be generalised (a cycle can involve any number of objects), you can keep track of the objects you've already seen in a HashSet and stop when you visit an object that is already in the set. Or add a flag to the objects which you set when you visit them (but you then have to go back and unset all the flags when you're done, and the graph can only be traversed by a single thread at a time).
Alternatively, if the loops will only be back to the parent, you can keep a reference to the parent and not loop on properties that refer back to it.
For simplicity, if you know the parent reference will have a certain name, you could just not loop on that property :)
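For illustration, here is a minimal sketch of the "keep a reference to the parent" idea using reflection. None of these names come from a real API, and the filtering you'd want in practice (value types, strings, collections, visited tracking) is deliberately left out:
using System.Reflection;

// Sketch: recurse through readable properties, but skip any property
// whose value is the object we came from (the parent back-reference).
static void Traverse(object node, object parent)
{
    foreach (PropertyInfo prop in node.GetType().GetProperties())
    {
        if (prop.GetIndexParameters().Length != 0)
            continue; // skip indexers
        object value = prop.GetValue(node, null);
        if (value == null || ReferenceEquals(value, parent))
            continue; // null, or the back-reference to the parent
        // ... process value here, then recurse with node as the new parent
        Traverse(value, node);
    }
}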

What a coincidence; this is the topic of my blog this coming Monday. See it for more details. Until then, here's some code to give you an idea of how to do this:
static IEnumerable<T> Traversal<T>(
    T item,
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    seen.Add(item);
    stack.Push(item);
    yield return item;
    while (stack.Count > 0)
    {
        T current = stack.Pop();
        foreach (T newItem in children(current))
        {
            if (!seen.Contains(newItem))
            {
                seen.Add(newItem);
                stack.Push(newItem);
                yield return newItem;
            }
        }
    }
}
The method takes two things: an item, and a relation that produces the set of everything that is adjacent to the item. It produces a depth-first traversal of the transitive and reflexive closure of the adjacency relation on the item. Let the number of items in the graph be n, and the maximum depth be 1 <= d <= n, assuming the branching factor is not bounded. This algorithm uses an explicit stack rather than recursion because (1) recursion in this case turns what should be an O(n) algorithm into O(nd), which is then something between O(n) and O(n^2), and (2) excessive recursion can blow the stack if d is more than a few hundred nodes.
Note that the peak memory usage of this algorithm is of course O(n + d) = O(n).
So, for example:
foreach (Node node in Traversal(myGraph.Root, n => n.Children))
    Console.WriteLine(node.Name);
Make sense?

If you're doing a graph traversal, you can have a "visited" flag on each node. This ensures that you don't revisit a node and possibly get stuck in an infinite loop. I believe this is the standard way of performing a graph traversal.

This is a common problem, but the best approach depends on the scenario. An additional problem is that in many cases it isn't a problem visiting the same object twice - that doesn't imply recursion - for example, consider the tree:
A
=> B
   => C
=> D
   => C
This may be valid (think XmlSerializer, which would simply write the C instance out twice), so it is often necessary to push/pop objects on a stack to check for true recursion. The last time I implemented a "visitor", I kept a "depth" counter, and only enabled the stack checking beyond a certain threshold - that means that most trees simply end up doing some ++/--, but nothing more expensive. You can see the approach I took here.
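As a rough sketch of that depth-threshold idea (the class name and threshold value here are invented for illustration; the actual implementation is in the linked code):
using System;
using System.Collections.Generic;
using System.Linq;

// Cheap depth counter; the stack-based recursion check is only enabled
// past a threshold, so shallow trees pay almost nothing.
class Visitor
{
    private const int CheckThreshold = 20; // arbitrary; tune for your trees
    private int depth;
    private readonly Stack<object> path = new Stack<object>();

    public void Visit(object node)
    {
        depth++;
        bool check = depth > CheckThreshold;
        if (check)
        {
            // Reference comparison: we want identity, not Equals.
            if (path.Any(n => ReferenceEquals(n, node)))
                throw new InvalidOperationException("True recursion detected.");
            path.Push(node);
        }
        try
        {
            // ... visit the children of node here, calling Visit(child) ...
        }
        finally
        {
            if (check) path.Pop();
            depth--;
        }
    }
}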

I'm not exactly sure what you are trying to do here, but you could just maintain a hashtable of all previously visited nodes while doing your breadth-first or depth-first search.

I published a post explaining in detail, with code examples, how to do object traversal by recursive reflection, and also how to detect and avoid recursive references to prevent a stack overflow exception: https://doguarslan.wordpress.com/2016/10/03/object-graph-traversal-by-recursive-reflection/
In that example I did a depth-first traversal using recursive reflection, and I maintained a HashSet of visited nodes for reference types. One thing to be careful about is to initialize your HashSet with a custom equality comparer that uses the object reference for hash calculation: essentially the GetHashCode() implemented by the base object class itself, not any overridden version. If the types of the properties you traverse override GetHashCode, you may detect false hash collisions and think you have found a recursive reference, when in reality the overridden GetHashCode is merely producing the same hash value for distinct objects and confusing the HashSet. All you need to detect is whether any parent and child anywhere in the object tree point to the same location in memory.
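A comparer along those lines might look like this (a sketch; RuntimeHelpers.GetHashCode is the identity-based hash from System.Object, so overridden GetHashCode implementations are bypassed):
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares by reference identity only, ignoring any Equals/GetHashCode
// overrides on the traversed types.
sealed class ReferenceComparer : IEqualityComparer<object>
{
    public new bool Equals(object x, object y) => ReferenceEquals(x, y);
    public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// usage:
// var visited = new HashSet<object>(new ReferenceComparer());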

Related

Why do we need two interfaces to enumerate a collection?

For quite a while I have been trying to understand the idea behind IEnumerable and IEnumerator. I have read all the questions and answers I could find on the net, on StackOverflow in particular, but I am not satisfied. I have got to the point where I understand how those interfaces should be used, but not why they are used this way.
I think that the essence of my misunderstanding is that we need two interfaces for one operation. I realized that if both are needed, one was probably not enough. So I took the "hard coded" equivalent of foreach (as I found here):
while (enumerator.MoveNext())
{
    object item = enumerator.Current;
    // logic
}
and tried to get it to work with one interface, thinking something would go wrong which would make me understand why another interface is needed.
So I created a collection class, and implemented IForeachable:
class Collection : IForeachable
{
    private int[] array = { 1, 2, 3, 4, 5 };
    private int index = -1;

    public int Current => array[index];

    public bool MoveNext()
    {
        if (index < array.Length - 1)
        {
            index++;
            return true;
        }
        index = -1;
        return false;
    }
}
and used the foreach equivalent to enumerate the collection:
var collection = new Collection();
while (collection.MoveNext())
{
    object item = collection.Current;
    Console.WriteLine(item);
}
And it works! So what is missing here that makes another interface required?
Thanks.
Edit:
My question is not a duplicate of the questions listed in the comments:
This question is about why interfaces are needed for enumerating in the first place.
This question and this question are about what those interfaces are and how they should be used.
My question is why they are designed the way they are; not what they are, how they work, or why we need them in the first place.
What are the two interfaces and what do they do?
The IEnumerable interface is placed on the collection object and defines the GetEnumerator() method; this returns a (normally new) object that implements the IEnumerator interface. The foreach statement in C# and the For Each statement in VB.NET use IEnumerable to obtain the enumerator in order to loop over the elements in the collection.
The IEnumerator interface is essentially the contract placed on the object that actually does the iteration. It stores the state of the iteration and updates it as the code moves through the collection.
Why not just have the collection be the enumerator too? Why have two separate interfaces?
There is nothing to stop IEnumerator and IEnumerable being implemented on the same class. However, there is a penalty for doing this: it won't be possible to have two or more loops on the same collection at the same time. If it can be absolutely guaranteed that there won't ever be a need to loop over the collection twice at the same time, then that's fine. But in the majority of circumstances that isn't possible.
When would someone iterate over a collection more than once at a time?
Here are two examples.
The first example is when there are two loops nested inside each other on the same collection. If the collection was also the enumerator then it wouldn’t be possible to support nested loops on the same collection, when the code gets to the inner loop it is going to collide with the outer loop.
The second example is when there are two, or more, threads accessing the same collection. Again, if the collection was also the enumerator then it wouldn’t be possible to support safe multithreaded iteration over the same collection. When the second thread attempts to loop over the elements in the collection the state of the two enumerations will collide.
Also, because the iteration model used in .NET does not permit alterations to a collection during enumeration, these operations are otherwise completely safe.
-- This was from a blog post I wrote many years ago: https://colinmackay.scot/2007/06/24/iteration-in-net-with-ienumerable-and-ienumerator/
Your IForeachable cannot even be iterated from two different threads (you cannot have multiple active iterations at all, even from the same thread), because the current enumeration state is stored in the IForeachable itself. You also have to reset the current position each time you finish enumerating, and if you forget to do that, the next caller will think your collection is empty. I can only imagine the kinds of hard-to-track bugs this might lead to.
On the other hand, because IEnumerable returns a new IEnumerator for each caller, you can have multiple enumerations in progress simultaneously; each caller has its own enumeration state. I think this reason alone is enough to justify two interfaces. Enumeration is essentially a read operation, and it would be very confusing if you could not read the same thing simultaneously in multiple places.
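To make this concrete, here is the OP's Collection reworked against the real interfaces (a sketch): because GetEnumerator() hands each caller a fresh compiler-generated state machine, nested loops over the same instance don't collide.
using System.Collections;
using System.Collections.Generic;

class Collection : IEnumerable<int>
{
    private readonly int[] array = { 1, 2, 3, 4, 5 };

    // The compiler builds a new enumerator object per call,
    // so every foreach gets independent iteration state.
    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in array)
            yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// usage: nested loops over the same instance work fine
// var c = new Collection();
// foreach (int x in c)
//     foreach (int y in c)
//         Console.WriteLine($"{x},{y}"); // prints all 25 pairs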

Why Single(IEnumerable<T>,Predicate<T>) is so inefficient [duplicate]

I came across this implementation in Enumerable.cs using Reflector.
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    // check parameters
    TSource local = default(TSource);
    long num = 0L;
    foreach (TSource local2 in source)
    {
        if (predicate(local2))
        {
            local = local2;
            num += 1L;
            // I think they should do something here like:
            // if (num >= 2L) throw Error.MoreThanOneMatch();
            // no need to continue
        }
    }
    // return different results based on num's value
}
I think they should break the loop as soon as two items meet the condition; why does it always loop through the whole collection? In case Reflector disassembled the DLL incorrectly, I wrote a simple test:
class DataItem
{
    private int _num;

    public DataItem(int num)
    {
        _num = num;
    }

    public int Num
    {
        get { Console.WriteLine("getting " + _num); return _num; }
    }
}

var source = Enumerable.Range(1, 10).Select(x => new DataItem(x));
var result = source.Single(x => x.Num < 5);
For this test case, I expected it to print "getting 1, getting 2" and then throw an exception. But in fact it keeps going ("getting 1" ... "getting 10") and then throws an exception.
Is there any algorithmic reason they implement this method like this?
EDIT: Some of you thought it's because of side effects of the predicate expression. After deeper thought and some test cases, I have come to the conclusion that side effects don't matter in this case. Please provide an example if you disagree with this conclusion.
Yes, I do find it slightly strange, especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.
In the BCL's defence however, I would say that the InvalidOperation exception that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.
Code that uses Single where zero or multiple matches is a perfectly valid possibility, such as:
try
{
    var item = source.Single(predicate);
    DoSomething(item);
}
catch (InvalidOperationException)
{
    DoSomethingElseUnexceptional();
}
should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):
var firstTwo = source.Where(predicate).Take(2).ToArray();
if (firstTwo.Length == 1)
{
    // Note that this won't fail. If it does, this code has a bug.
    DoSomething(firstTwo.Single());
}
else
{
    DoSomethingElseUnexceptional();
}
In other words, we should leave the use of Single to cases where we expect the sequence to contain only one match. It should behave identically to First, but with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. the cases when Single throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).
This leaves us with two cases:
The assertion holds: There is a single match. In this case, we want Single to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.
The assertion fails: There are zero or multiple matches. In this case, we do throw later than we could, but this isn't such a big deal since the exception is boneheaded: it is indicative of a bug that must be fixed.
To sum up, if the 'poor implementation' is biting you performance-wise in production, either:
You are using Single incorrectly.
You have a bug in your program. Once the bug is fixed, this particular performance problem will go away.
EDIT: Clarified my point.
EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):
public static User GetUserById(this IEnumerable<User> users, string id)
{
    if (users == null)
        throw new ArgumentNullException("users");

    // Perfectly fine if documented that a failure in the query
    // is treated as an exceptional circumstance. Caller's job
    // to guarantee the pre-condition.
    return users.Single(user => user.Id == id);
}
Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.
New answer:
Let's assume that Single should throw an exception when its pre-condition is not met; that is, when Single detects that either none, or more than one, item in the collection matches the predicate.
Single can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.
This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single's pre-condition cannot be met and when it will throw an exception.
As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.
Thus, it is actually correct that Single could throw an exception early; but it doesn't have to, because there's practically no added benefit.
Old answer:
I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:
My understanding of Single:
What is the purpose of Single, and how is it different from e.g. First or Last?
Using the Single operator basically expresses one's assumption that exactly one item must be returned from the collection:
If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.
If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single().)
This is what makes Single different from other operators such as First, Last, or Take(1). None of those operators have the requirement that there should be exactly one (matching) item.
When should Single throw an exception?
Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.
When should Single be used?
The use of Single is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.
If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single. Single, together with an exception catch block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single, you should already have made sure that there'll be only one matching item.
Conclusion:
The above states my understanding of the Single LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single is violated as soon as it detects a second (matching) item in the collection.
When considering this implementation we must remember that this is the BCL: general code that is supposed to work well enough in all sorts of scenarios.
First, take these scenarios:
Iterate over 10 numbers, where the first and second elements are equal
Iterate over 1,000,000 numbers, where the first and third elements are equal
The original algorithm will work well enough for 10 items, but 1M will be a severe waste of cycles. So in cases like these, where we know there are two or more matches early in the sequence, the proposed optimization would have a nice effect.
Then, look at these scenarios:
Iterate over 10 numbers, where the first and last elements are equal
Iterate over 1,000,000 numbers, where the first and last elements are equal
In these scenarios the algorithm is still required to inspect every item in the list. There is no shortcut. The original algorithm performs well enough; it fulfills the contract. Changing the algorithm by introducing an if on each iteration would actually decrease performance. For 10 items it would be negligible, but for 1M it would be a big hit.
IMO, the original implementation is the correct one, since it is good enough for most scenarios. Knowing the implementation of Single is good though, because it enables us to make smart decisions based on what we know about the sequences we use it on. If performance measurements in one particular scenario shows that Single is causing a bottleneck, well: then we can implement our own variant that works better in that particular scenario.
Update: as CodeInChaos and Eamon have correctly pointed out, the if test introduced in the optimization is indeed not performed on each item, only within the predicate match block. I have in my example completely overlooked the fact that the proposed changes will not affect the overall performance of the implementation.
I agree that introducing the optimization would probably benefit all scenarios. It is good to see though that eventually, the decision to implement the optimization is made on the basis of performance measurements.
I think it's a premature optimization "bug".
Why this is NOT reasonable behavior due to side effects
Some have argued that due to side effects, it should be expected that the entire list is evaluated. After all, in the correct case (the sequence indeed has just 1 element) it is completely enumerated, and for consistency with this normal case it's nicer to enumerate the entire sequence in all cases.
Although that's a reasonable argument, it flies in the face of the general practice throughout the LINQ libraries: they use lazy evaluation everywhere. It's not general practice to fully enumerate sequences except where absolutely necessary; indeed, several methods prefer using IList.Count when available over any iteration at all - even when that iteration may have side effects.
Further, .Single() without a predicate does not exhibit this behavior: it terminates as soon as possible. If the argument were that .Single() should respect side-effects of enumeration, you'd expect all overloads to do so equivalently.
Why the case for speed doesn't hold
Peter Lillevold made the interesting observation that it may be faster to do...
foreach (var elem in elems)
    if (pred(elem))
    {
        retval = elem;
        count++;
    }
if (count != 1) ...
than
foreach (var elem in elems)
    if (pred(elem))
    {
        retval = elem;
        count++;
        if (count > 1) ...
    }
if (count == 0) ...
After all, the second version, which would exit the iteration as soon as the first conflict is detected, requires an extra test in the loop - a test which in the "correct" case is purely ballast. Neat theory, right?
Except that's not borne out by the numbers; for example, on my machine (YMMV) Enumerable.Range(0,100000000).Where(x=>x==123).Single() is actually faster than Enumerable.Range(0,100000000).Single(x=>x==123)!
It's possibly a JITter quirk of this precise expression on this machine - I'm not claiming that Where followed by predicateless Single is always faster.
But whatever the case, the fail-fast solution is very unlikely to be significantly slower. After all, even in the normal case, we're dealing with a cheap branch: a branch that is never taken and thus easy on the branch predictor. And of course, the branch is only ever encountered when pred holds - that's once per call in the normal case. That cost is simply negligible compared to the cost of the delegate call pred and its implementation, plus the cost of the interface methods .MoveNext() and .get_Current() and their implementations.
It's simply extremely unlikely that you'll notice the performance degradation caused by one predictable branch in comparison to all that other abstraction penalty - not to mention the fact that most sequences and predicates actually do something themselves.
It seems very clear to me.
Single is intended for the case where the caller knows that the enumeration contains exactly one match, since in any other case an expensive exception is thrown.
For this use case, the overload that takes a predicate must iterate over the whole enumeration. It is slightly faster to do so without an additional condition on every loop.
In my view the current implementation is correct: it is optimized for the expected use case of an enumeration that contains exactly one matching element.
That does appear to be a bad implementation, in my opinion.
Just to illustrate the potential severity of the problem:
var oneMillion = Enumerable.Range(1, 1000000)
.Select(x => { Console.WriteLine(x); return x; });
int firstEven = oneMillion.Single(x => x % 2 == 0);
The above will output all the integers from 1 to 1000000 before throwing an exception.
It's a head-scratcher for sure.
I only found this question after filing a report at https://connect.microsoft.com/VisualStudio/feedback/details/810457/public-static-tsource-single-tsource-this-ienumerable-tsource-source-func-tsource-bool-predicate-doesnt-throw-immediately-on-second-matching-result#
The side-effect argument doesn't hold water, because:
Having side-effects isn't really functional, and they're called Func for a reason.
If you do want side-effects, it makes no more sense to claim the version that has the side-effects throughout the whole sequence is desirable than it does to claim so for the version that throws immediately.
It does not match the behaviour of First or the other overload of Single.
It does not match at least some other implementations of Single, e.g. Linq2SQL uses TOP 2 to ensure that only the two matching cases needed to test for more than one match are returned.
We can construct cases where we should expect a program to halt, but it does not halt.
We can construct cases where OverflowException is thrown, which is not documented behaviour, and hence clearly a bug.
Most importantly of all, if we're in a condition where we expected the sequence to have only one matching element and yet it doesn't, then something has clearly gone wrong. Apart from the general principle that the only thing you should do upon detecting an error state is clean up before throwing (and this implementation delays the throw), the case of a sequence having more than one matching element is going to overlap with the case of a sequence having more elements in total than expected - perhaps because the sequence has a bug that is causing it to loop unexpectedly. So it's precisely in one possible set of bugs that should trigger the exception that the exception is most delayed.
Edit:
Peter Lillevold's mention of a repeated test may be a reason why the author chose to take the approach they did, as an optimisation for the non-exceptional case. If so it was needless though, even aside from Eamon Nerbonne showing it wouldn't improve much. There's no need to have a repeated test in the initial loop, as we can just change what we're testing for upon the first match:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (predicate == null)
        throw new ArgumentNullException("predicate");
    using (IEnumerator<TSource> en = source.GetEnumerator())
    {
        while (en.MoveNext())
        {
            TSource cur = en.Current;
            if (predicate(cur))
            {
                while (en.MoveNext())
                    if (predicate(en.Current))
                        throw new InvalidOperationException("Sequence contains more than one matching element");
                return cur;
            }
        }
    }
    throw new InvalidOperationException("Sequence contains no matching element");
}

How to remove from List<T> efficiently (C#)?

If I understood correctly (and please correct me if I'm wrong), List<T> is implemented as an array in .NET, which means that every deletion of an item causes the elements after it to be shifted down (which in turn means O(n)).
I'm developing a game; at any given moment there are many bullets in the air, let's say 100. Each frame I move them by a few pixels and check for collision with objects in the game, and I need to remove from the list every bullet that has collided.
So I collect the collided bullet in another temporary list and then do the following:
foreach (Bullet bullet in bulletsForDeletion)
    mBullets.Remove(bullet);
Because the loop is O(n) and each remove is O(n), I spend O(n^2) time on removal.
Is there a better way to remove it, or more suitable collection to use?
Sets and linked lists both have constant time removal. Can you use either of those data structures?
There's no way to avoid the O(N) cost of removal from List<T>. You'll need to use a different data structure if this is a problem for you. It may make the code which calculates bulletsToRemove feel nicer too.
ISet<T> has nice methods for calculating differences and intersections between sets of objects.
You lose ordering by using sets, but given you are dealing with bullets, I'm guessing that is not an issue. You can still enumerate the set just as you would a list.
In your case, you might write:
mBullets.ExceptWith(bulletsForDeletion);
Create a new list:
var newList = oldList.Except(deleteItems).ToList();
Try to use functional idioms wherever possible. Don't modify existing data structures; create new ones.
This algorithm is O(N) thanks to hashing.
Can't you just switch from List (the equivalent of Java's ArrayList, to highlight the point) to LinkedList? LinkedList takes O(1) to delete a specific node you already hold a reference to, but O(n) to delete by index.
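Note that the O(1) deletion requires holding on to the node, not just the value; for example (assuming the question's Bullet class has a parameterless constructor, for illustration):
var bullets = new LinkedList<Bullet>();
LinkedListNode<Bullet> node = bullets.AddLast(new Bullet());
// ... later, when this bullet collides:
bullets.Remove(node); // O(1): the node is unlinked directly, no search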
How a list is implemented internally is not something you should be thinking about. You should be interacting with the list in that abstraction level.
If you do have performance problems and you pinpoint them to the list, then it is time to look at how it is implemented.
As for removing items from a list - you can use mBullets.RemoveAll(predicate), where predicate is an expression that identifies items that have collided.
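For example (HasCollided here is a hypothetical flag on the question's Bullet class):
// One O(n) pass over the list; no temporary deletion list needed.
mBullets.RemoveAll(bullet => bullet.HasCollided);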
RemoveAt(int index) is faster than Remove(T item), because the latter calls the former internally after first searching for the item. Remove(T) calls IndexOf, which loops over the list to find the index of the item. This is the decompiled code inside each method:
public bool Remove(T item)
{
    int index = this.IndexOf(item);
    if (index < 0)
        return false;
    this.RemoveAt(index);
    return true;
}

public void RemoveAt(int index)
{
    if ((uint) index >= (uint) this._size)
        ThrowHelper.ThrowArgumentOutOfRangeException();
    --this._size;
    if (index < this._size)
        Array.Copy((Array) this._items, index + 1, (Array) this._items, index, this._size - index);
    this._items[this._size] = default (T);
    ++this._version;
}
I would do a loop like this:
for (int i = MyList.Count - 1; i >= 0; i--)
{
    // add some code here
    if (needToDelete)
        MyList.RemoveAt(i);
}
In the spirit of functional idiom suggested by "usr", you might consider not removing from your list at all. Instead, have your update routine take a list and return a list. The list returned contains only the "still-alive" objects. You can then swap lists at the end of the game loop (or immediately, if appropriate). I've done this myself.
Ideally, it shouldn't be important to think about how C# implements its List methods, and it's somewhat unfortunate that it shifts all the array elements after index i when doing somelist.RemoveAt(i). Sometimes, though, optimizations around standard library implementations are necessary.
If you find your execution is expending unacceptable effort shuffling list elements around when removing them with RemoveAt, one approach that might work for you is to swap the element to be removed with the last element, and then RemoveAt off the end of the list:
if (i < mylist.Count - 1)
{
    mylist[i] = mylist[mylist.Count - 1];
}
mylist.RemoveAt(mylist.Count - 1);
This is of course an O(1) operation.

Best .NET Array/List

So, I need an array of items, and I was wondering which one would be the fastest/best to use (in C#). I'll be doing the following things:
Adding elements at the end
Removing elements at the start
Looking at the first and last element (every frame)
Clearing it occasionally
Converting it to a normal array (not a list; I'm using iTween and it expects a normal array). I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very quickly to an array and also makes it easy to add/remove elements at the start and end?
Requirements 1) and 2) point to a Queue<T>; it is the only standard collection optimized for these two operations.
3) You'll need a little trickery for getting at the last element; the first is Peek().
4) is simple (.Clear()).
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5). For the trickery in 3), see the sketch just below.
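One possible shape for that trickery (a sketch; the wrapper type and its members are invented here): wrap the queue and cache the most recently enqueued item.
using System.Collections.Generic;

class TrackedQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();

    public T Last { get; private set; } // only meaningful while Count > 0
    public int Count => queue.Count;
    public T First => queue.Peek();

    public void Enqueue(T item) { queue.Enqueue(item); Last = item; }
    public T Dequeue() => queue.Dequeue();
    public void Clear() { queue.Clear(); Last = default(T); }
    public T[] ToArray() => queue.ToArray(); // still O(n), unavoidable
}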
You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
private LinkedList<T> list = new LinkedList<T>();
private bool isDirty = true;
private T[] array;

public void Enqueue(T t)
{
    list.AddLast(t);
    isDirty = true;
}

public T[] ToArray()
{
    if (isDirty)
    {
        array = list.ToArray();
        isDirty = false;
    }
    return array;
}
I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a linked list, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build a node object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful with if performance is an issue. Lists are essentially arrays that get bigger as you add items (every time you overfill them, the backing array grows substantially, which avoids excessive memory operations). Clearing them requires no effort and leaves the memory allocated to be used another day.
In personal experience, .NET isn't suited to generic linked lists; you need to write your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like Swiss cheese (well, as best you can do when you are allocating a new array every frame; if these arrays are going to be big, re-use array references where possible, null out any you don't need, and give the garbage collector a chance to get rid of old arrays before making new ones).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list - it's not a free lookup, so try to avoid needless requests
Sorry if this whole post is overkill.
How about a circular array? If you keep the indices of the first and last elements, you can have O(1) for every criterion you gave except the conversion to a plain array, which is still O(n).
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.
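A minimal sketch of that combination (invented names; the bounds checks you'd want in real code are omitted):
class CircularBuffer<T>
{
    private T[] items = new T[8];
    private int head;  // index of the first element
    private int count;

    public int Count => count;
    public T First => items[head];
    public T Last => items[(head + count - 1) % items.Length];

    public void AddLast(T item)
    {
        if (count == items.Length)
            Grow(); // vector-style: double when full
        items[(head + count) % items.Length] = item;
        count++;
    }

    public T RemoveFirst()
    {
        T item = items[head];
        items[head] = default(T); // let the GC reclaim the reference
        head = (head + 1) % items.Length;
        count--;
        return item;
    }

    private void Grow()
    {
        var bigger = new T[items.Length * 2];
        for (int i = 0; i < count; i++)
            bigger[i] = items[(head + i) % items.Length];
        items = bigger;
        head = 0;
    }
}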
A regular List will do the work, and it is faster than LinkedList for insertion.
Adding elements at the end -> myList.Add(item)
Removing elements at the start -> myList.RemoveAt(0)
Looking at the first and last element (every frame) -> myList[0] or myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array -> myList.ToArray()

Is it OK to reuse IEnumerable collections more than once?

Basically I am wondering if it's OK to use an enumeration more than once in code. Whether you break early or not, will the enumeration always reset in each foreach case below, giving us a consistent enumeration from start to end of the effects collection?
var effects = this.EffectsRecursive;
foreach (Effect effect in effects)
{
    ...
}
foreach (Effect effect in effects)
{
    if (effect.Name == "Pixelate")
        break;
}
foreach (Effect effect in effects)
{
    ...
}
EDIT: The implementation of EffectsRecursive is this:
public IEnumerable<Effect> Effects
{
    get
    {
        for (int i = 0; i < this.IEffect.NumberOfChildren; ++i)
        {
            IEffect effect = this.IEffect.GetChildEffect(i);
            if (effect != null)
                yield return new Effect(effect);
        }
    }
}

public IEnumerable<Effect> EffectsRecursive
{
    get
    {
        foreach (Effect effect in this.Effects)
        {
            yield return effect;
            foreach (Effect seffect in effect.ChildrenRecursive)
                yield return seffect;
        }
    }
}
Yes this is legal to do. The IEnumerable<T> pattern is meant to support multiple enumerations on a single source. Collections which can only be enumerated once should expose IEnumerator instead.
The code that consumes the sequence is fine. As spender points out, the code that produces the enumeration might have performance problems if the tree is deep.
Suppose at the deepest point your tree is four deep; think about what happens on the nodes that are four deep. To get that node, you iterate the root, which calls an iterator, which calls an iterator, which calls an iterator, which passes the node back to code that passes the node back to code that passes the node back... Instead of just handing the node to the caller, you've made a little bucket brigade with four guys in it, and they're tramping the data around from object to object before it finally gets to the loop that wanted it.
If the tree is only four deep, no big deal probably. But suppose the tree is ten thousand elements, and has a thousand nodes forming a linked list at the top and the remaining nine thousand nodes on the bottom. Now when you iterate those nine thousand nodes each one has to pass through a thousand iterators, for a total of nine million copies to fetch nine thousand nodes. (Of course, you've probably gotten a stack overflow error and crashed the process as well.)
The way to deal with this problem if you have it is to manage the stack yourself rather than pushing new iterators on the stack.
public IEnumerable<Effect> EffectsNotRecursive()
{
    var stack = new Stack<Effect>();
    stack.Push(this);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        yield return current;
        foreach (var child in current.Effects)
            stack.Push(child);
    }
}
The original implementation has a time complexity of O(nd) where n is the number of nodes and d is the average depth of the tree; since d can in the worst case be O(n), and in the best case be O(lg n), that means that the algorithm is between O(n lg n) and O(n^2) in time. It is O(d) in heap space (for all the iterators) and O(d) in stack space (for all the recursive calls.)
The new implementation has a time complexity of O(n), and is O(d) in heap space, and O(1) in stack space.
One down side of this is that the order is different; the tree is traversed from top to bottom and right to left in the new algorithm, instead of top to bottom and left to right. If that bothers you then you can just say
foreach (var child in current.Effects.Reverse())
instead.
For more analysis of this problem, see my colleague Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx
Legal, yes. Whether it will function as you expect depends on:
The implementation of the IEnumerable returned by EffectsRecursive and whether it always returns the same set;
Whether you want to enumerate the same set both times
If it returns an IEnumerable that requires some intensive work, and it doesn't cache the results internally, then you may need to .ToList() it yourself. If it does cache the results, then ToList() would be slightly redundant but probably no harm.
Also, if GetEnumerator() is implemented in a typical/proper (*) way, then you can safely enumerate any number of times - each foreach will be a new call to GetEnumerator() which returns a new instance of the IEnumerator. But it could be that in some situations it returns the same IEnumerator instance that's already been partially or fully enumerated, so it all really just depends on the specific intended usage of that particular IEnumerable.
*I'm pretty sure returning the same enumerator multiple times is actually a violation of the implied contract for the pattern, but I have seen some implementations that do it anyway.
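So if the producer is expensive and you don't need fresh results on each pass, materializing once is a simple defence (a sketch):
// Runs the recursive iterators exactly once; the three loops in the
// question then each get a cheap, fresh enumerator over the List<T>.
var effects = this.EffectsRecursive.ToList();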
Most probably, yes. Most implementations of IEnumerable return a fresh IEnumerator which starts at the beginning of the list.
It all depends on the implementation of the type of EffectsRecursive.
