Why is List<T>.Enumerator faster than my implementation? - c#

I've found myself in a position where I have to roll my own dynamic array implementation, due to various large performance benefits (in my case). However, after creating an enumerator for my version and comparing its efficiency with the one List uses, I'm a bit bewildered: the List one is approximately 30-40% faster than my version, even though it's much more complex.
Here's the important part of the List enumerator implementation:
public struct Enumerator : IEnumerator<T>, IDisposable, IEnumerator
{
private List<T> list;
private int index;
private int version;
private T current;
internal Enumerator(List<T> list)
{
this.list = list;
this.index = 0;
this.version = list._version;
this.current = default(T);
return;
}
public bool MoveNext()
{
List<T> list;
list = this.list;
if (this.version != list._version)
{
goto Label_004A;
}
if (this.index >= list._size)
{
goto Label_004A;
}
this.current = list._items[this.index];
this.index += 1;
return true;
Label_004A:
return this.MoveNextRare();
}
public T Current
{
get { return this.current; }
}
}
And here's my very barebone version:
internal struct DynamicArrayEnumerator<T> : IEnumerator<T> where T : class
{
private readonly T[] internalArray;
private readonly int lastIndex;
private int currentIndex;
internal DynamicArrayEnumerator(DynamicArray<T> dynamicArray)
{
internalArray = dynamicArray.internalArray;
lastIndex = internalArray.Length - 1;
currentIndex = -1;
}
public T Current
{
get { return internalArray[currentIndex]; }
}
public bool MoveNext()
{
return (++currentIndex <= lastIndex);
}
}
I know this is micro-optimization, but I'm actually interested in understanding why the List enumerator is so much faster than mine. Any ideas? Thanks!
Edit:
As requested; the DynamicArray class (the relevant parts):
The enumerator is an inner class in this.
public struct DynamicArray<T> : IEnumerable<T> where T : class
{
private T[] internalArray;
private int itemCount;
internal T[] Data
{
get { return internalArray; }
}
public int Count
{
get { return itemCount; }
}
public DynamicArray(int count)
{
this.internalArray = new T[count];
this.itemCount = 0;
}
public IEnumerator<T> GetEnumerator()
{
return new DynamicArrayEnumerator<T>(this);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
As for how I'm testing:
List<BaseClass> list = new List<BaseClass>(1000000);
DynamicArray<BaseClass> dynamicArray = new DynamicArray<BaseClass>(1000000);
// Code for filling with data omitted.
int numberOfRuns = 0;
float p1Total = 0;
float p2Total = 0;
while (numberOfRuns < 100)
{
PerformanceAnalyzer p1 = new PerformanceAnalyzer(() =>
{
int u = 0;
foreach (BaseClass b in list)
{
if (b.B > 100) // Some trivial task
u++;
}
});
p1.ExecuteAndClock();
p1Total += p1.TotalElapsedTicks;
PerformanceAnalyzer p2 = new PerformanceAnalyzer(() =>
{
int u = 0;
foreach (BaseClass b in dynamicArray)
{
if (b.B > 100) // Some trivial task
u++;
}
});
p2.ExecuteAndClock();
p2Total += p2.TotalElapsedTicks;
numberOfRuns++;
}
Console.WriteLine("List enumeration: " + p1Total / numberOfRuns + "\n");
Console.WriteLine("Dynamic array enumeration: " + p2Total / numberOfRuns + "\n");
The PerformanceAnalyzer class basically starts a Stopwatch, executes the supplied Action delegate, and then stops the Stopwatch afterwards.
Edit 2 (Quick answer to Ryan Gates):
There are a few reasons why I would want to roll my own; most importantly, I need a very fast RemoveAt(int index) method.
Since I don't have to worry about the order of the list elements in my particular case, I can avoid the .NET built-in list's way of doing it:
public void RemoveAt(int index)
{
T local;
if (index < this._size)
{
goto Label_000E;
}
ThrowHelper.ThrowArgumentOutOfRangeException();
Label_000E:
this._size -= 1;
if (index >= this._size)
{
goto Label_0042;
}
Array.Copy(this._items, index + 1, this._items, index, this._size - index);
Label_0042:
this._items[this._size] = default(T);
this._version += 1;
return;
}
And instead using something along the lines of:
public void RemoveAt(int index)
{
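// Overwrites the element at the specified index with the last element in the array and decreases the item count.
itemCount--; // the last live element is at itemCount - 1, so decrement first
internalArray[index] = internalArray[itemCount];
internalArray[itemCount] = null; // release the reference in the vacated slot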
}
This can potentially save an enormous amount of time in my case if, say, the first 1000 elements in a long list have to be removed by index.
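For reference, here is a minimal, self-contained sketch of the swap-with-last removal idea described above. The SwapRemoveList<T> name and its members are hypothetical and are not the asker's actual DynamicArray class; the sketch only assumes that element order does not matter.

```csharp
using System;

// Hypothetical unordered list: RemoveAt is O(1) because it does not preserve order.
public sealed class SwapRemoveList<T>
{
    private T[] items;
    private int count;

    public SwapRemoveList(int capacity)
    {
        items = new T[capacity];
    }

    public int Count => count;

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)count) throw new ArgumentOutOfRangeException(nameof(index));
            return items[index];
        }
    }

    public void Add(T item)
    {
        // Grow the backing array when it is full.
        if (count == items.Length)
            Array.Resize(ref items, Math.Max(4, count * 2));
        items[count++] = item;
    }

    public void RemoveAt(int index)
    {
        if ((uint)index >= (uint)count) throw new ArgumentOutOfRangeException(nameof(index));
        count--;                      // the last live element now sits at items[count]
        items[index] = items[count];  // move it into the hole left by the removed element
        items[count] = default(T);    // clear the vacated slot so it does not keep a stale reference
    }
}
```

Removing the element at index 0 from a list holding "a", "b", "c" leaves "c", "b": nothing is shifted, which is the saving compared with the Array.Copy call in List<T>.RemoveAt.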

Okay, aside from benchmarking problems, here's how you can make your DynamicArray class more like List<T>:
public DynamicArrayEnumerator<T> GetEnumerator()
{
return new DynamicArrayEnumerator<T>(this);
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
return GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
Now, code which knows it's working with a dynamic array can iterate with a DynamicArrayEnumerator<T> without any boxing, and without virtual dispatch. This is exactly what List<T> does. The compiler notices when a type implements the pattern in a custom manner, and will use the types involved instead of the interfaces.
With your current code, you're getting no benefit from creating a struct - because you're boxing it in GetEnumerator().
Try the above change and fix the benchmark to work for longer. I'd expect to see a big difference.
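To make the pattern concrete, here is a small self-contained sketch of a collection whose public GetEnumerator() returns a struct, with the interface methods implemented explicitly. The IntBag type is hypothetical and only illustrates the shape that List<T> uses; it is not the asker's class.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection used only to illustrate the struct-enumerator pattern.
public sealed class IntBag : IEnumerable<int>
{
    private readonly int[] items;

    public IntBag(params int[] items) { this.items = items; }

    // foreach over a variable typed as IntBag binds to this method,
    // so the enumerator stays a struct and is never boxed.
    public Enumerator GetEnumerator() => new Enumerator(items);

    // Callers that only see the interfaces get a boxed copy of the struct.
    IEnumerator<int> IEnumerable<int>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<int>
    {
        private readonly int[] items;
        private int index;

        internal Enumerator(int[] items)
        {
            this.items = items;
            index = -1;
        }

        public int Current => items[index];
        object IEnumerator.Current => Current;
        public bool MoveNext() => ++index < items.Length;
        public void Reset() { index = -1; }
        public void Dispose() { }
    }
}
```

With this shape, foreach (int x in bag) where bag is declared as IntBag calls MoveNext and Current directly on the struct, while the same loop over a variable declared as IEnumerable<int> goes through the boxed enumerator and interface dispatch.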

Related

Why is Reset function setting wrong value?

I'm supposed to write a LIFO (last in, first out) class for chars, which can be modified with the Pop and Add functions and read with the Peek function or foreach. The class is backed by an array for efficiency, but foreach is not working for some reason. I first tried to implement GetEnumerator by returning the result of _arr.GetEnumerator(), but that did not work: when I printed an item, the console showed TestApp.LIFO. So I wrote the version below, but now foreach doesn't print a single item, and when debugging, the value of _i in Reset is 0. Can someone explain why this is happening and suggest a solution?
using System;
using System.Collections;
using System.Collections.Generic;
namespace TestApp {
internal class LIFO : IEnumerator, IEnumerable {
public LIFO(int size) {
_arr = new char[size];
_index = 0;
}
public LIFO(char[] arr) {
_arr = arr.Clone() as char[];
_index = 0;
}
public char Peek() => _index == 0 ? '\0' : _arr[_index - 1];
public bool Add(char c) {
if (_index == _arr.Length)
return false;
try {
_arr[_index] = c;
} catch (Exception) {
return false;
}
++_index;
return true;
}
public void Pop() {
if (_index == 0)
return;
_arr[--_index] = '\0';
}
private int _i;
public IEnumerator GetEnumerator() => this;
public bool MoveNext() => --_i > -1;
public void Reset() => _i = _index - 1;
public object Current => _arr[_i];
private int _index;
private readonly char[] _arr;
}
}
In Program.cs:
using System;
namespace TestApp {
internal static class Program {
private static void Main() {
LIFO l = new(17);
l.Add('k');
l.Add('h');
l.Add('c');
foreach (var item in l)
Console.WriteLine(l);
}
}
}
The issue is that Reset is never called. It is no longer needed, but it remains on the interface for backwards compatibility. Since iterators implemented using yield return are actually required to throw an exception if Reset is called, no code is expected to call this method anymore.
As such, your iterator index variable, _i, is never initialized and stays at 0. The first call to MoveNext steps it below 0 and then returns false, ending the foreach loop before it even started.
You should always decouple the iterators from your actual collection, as it should be safe to enumerate the same collection twice in a nested manner; storing the index variable as an instance variable in your collection prevents this.
You can, however, simplify the enumerator implementation vastly by using yield return like this:
public class LIFO : IEnumerable
{
...
public IEnumerator GetEnumerator()
{
for (int i = _index - 1; i >= 0; i--)
yield return _arr[i];
}
}
You can then remove the _i variable, the MoveNext and Reset methods, as well as the Current property.
If you first want to make your existing code work, keeping in mind the note above about nesting enumerators, you can change your GetEnumerator and Reset methods as follows:
public IEnumerator GetEnumerator()
{
Reset();
return this;
}
public void Reset() => _i = _index;
Note that you have to reset the _i variable to one step past the last (first) value as you're decrementing it inside MoveNext. If you don't, you'll ignore the last item added to the LIFO stack.
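To illustrate why decoupling the enumerator from the collection matters, here is a minimal sketch of a stack whose iteration state lives in the iterator rather than in the collection. The CharStack name is hypothetical; it is a cut-down stand-in for the LIFO class, assuming only Add and enumeration are needed.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical cut-down stack: each GetEnumerator() call gets its own loop variable.
public sealed class CharStack : IEnumerable<char>
{
    private readonly char[] arr;
    private int count;

    public CharStack(int size) { arr = new char[size]; }

    public bool Add(char c)
    {
        if (count == arr.Length) return false; // full
        arr[count++] = c;
        return true;
    }

    public IEnumerator<char> GetEnumerator()
    {
        // The index i belongs to this iterator instance, not to the collection,
        // so several enumerations can be in progress at the same time.
        for (int i = count - 1; i >= 0; i--)
            yield return arr[i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because every foreach gets a fresh iterator, a nested loop over the same CharStack visits all pairs of elements, which the shared _i field in the original class cannot do.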
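The variable you use to read values from the _arr array (_i) and the variable that Add and Pop increment and decrement (_index) are different, and _i is never set before the loop starts. Because of that, your foreach loop never iterates your collection. I fixed the code with a few additions below; I hope it helps. Note that in this version enumerating consumes _index, so Reset() has to be called after each foreach before the collection is used again.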
LIFO.cs
using System;
using System.Collections;
using System.Collections.Generic;
namespace TestApp
{
internal class LIFO : IEnumerator, IEnumerable
{
public LIFO(int size)
{
_arr = new char[size];
_index = 0;
}
public LIFO(char[] arr)
{
_arr = arr.Clone() as char[];
_index = 0;
}
public int Count() => _index;
public char Peek() => _index == 0 ? '\0' : _arr[_index - 1];
public bool Add(char c)
{
if (_index == _arr.Length)
return false;
try
{
_arr[_index] = c;
}
catch (Exception)
{
return false;
}
++_index;
_i = _index;
return true;
}
public void Pop()
{
if (_index == 0)
return;
_arr[--_index] = '\0';
_i = _index;
}
public IEnumerator GetEnumerator() => (IEnumerator)this;
public bool MoveNext() => --_index > -1;
public void Reset() => _index = _i;
public object Current
{
get => _arr[_index];
}
private int _index;
private int _i;
private readonly char[] _arr;
}
}
Program.cs
using System;
namespace TestApp
{
internal static class Program
{
private static void Main()
{
LIFO l = new LIFO(17);
l.Add('k');
l.Add('h');
l.Add('c');
Console.WriteLine("Count: " + l.Count());
foreach (var i in l)
Console.WriteLine(i);
l.Reset();
foreach (var i in l)
Console.WriteLine(i);
}
}
}

Linq: ForEach item return the number of items needed to get n unique items starting at that item

Let's say I have a list of items:
[a,b,b,a,c,d,a,d,b,c]
and I need to know, for each item, how many items I have to traverse, starting at that item, until I have seen n unique items (returning e.g. -1, or otherwise indicating it, if that's not possible).
So here, if n = 4, I would return
[6,5,4,6,5,5,4,-1,-1,-1]
since
a,b,b,a,c,d contains 4 unique elements
b,b,a,c,d contains 4 unique elements
b,a,c,d contains 4 unique elements,
a,c,d,a,d,b contains 4 unique elements
etc.
I used
List.Select((x,i) => {
var range = List.Skip(i).GroupBy(y => y).Take(n);
if (range.Count() == n)
return range.SelectMany(y => y).Count();
return -1;
});
Although I'm pretty sure this performs horribly.
To try to minimize overhead, I created a ListSpan extension class for managing subparts of a List - something like ArraySegment for List, but (loosely) modeled on Span:
public class ListSpan<T> : IEnumerable<T>, IEnumerable {
List<T> baseList;
int start;
int len;
public ListSpan(List<T> src, int start = 0, int? len = null) {
baseList = src;
this.start = start;
this.len = len ?? (baseList.Count - start);
if (this.start + this.len > baseList.Count)
throw new ArgumentException("start+len > Count for ListSpan");
}
public T this[int n]
{
get
{
return baseList[start + n];
}
set
{
baseList[start + n] = value;
}
}
public class ListSpanEnumerator<Te> : IEnumerator<Te>, IEnumerator {
int pos;
List<Te> baseList;
int end;
Te cur = default(Te);
public ListSpanEnumerator(ListSpan<Te> src) {
pos = src.start - 1;
baseList = src.baseList;
end = src.start + src.len;
}
public Te Current => cur;
object IEnumerator.Current => Current;
public bool MoveNext() {
if (++pos < end) {
cur = baseList[pos];
return true;
}
else {
cur = default(Te);
return false;
}
}
public void Reset() => throw new NotSupportedException(); // pos = 0 would ignore the span's start offset
public void Dispose() { }
}
public IEnumerator<T> GetEnumerator() => new ListSpanEnumerator<T>(this);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
public static class ListExt {
public static ListSpan<T> Slice<T>(this List<T> src, int start = 0, int? len = null) => new ListSpan<T>(src, start, len);
}
Then I created an extension method to return the distance (in Take terms) required to get n unique items from an IEnumerable:
public static class IEnumerableExt {
public static int DistanceToUnique<T>(this IEnumerable<T> src, int n, IEqualityComparer<T> cmp = null) {
var hs = new HashSet<T>(cmp ?? EqualityComparer<T>.Default);
var pos = 0;
using (var e = src.GetEnumerator()) {
while (e.MoveNext()) {
++pos;
hs.Add(e.Current);
if (hs.Count == n)
return pos;
}
}
return -1;
}
}
Now the answer is relatively straightforward:
var ans = Enumerable.Range(0, src.Count).Select(p => src.Slice(p).DistanceToUnique(n));
Basically I go through each position in the original (src) List and compute the distance to n unique values from that position using a ListSpan of the List starting at that position.
This still isn't terribly efficient: I am creating a HashSet for every element in the original List and putting the following elements in it, so a k-element List can be traversed up to k times, for up to k(k+1)/2 element visits in total. Still trying to come up with something really efficient.
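One possible way to avoid the repeated traversals is a sliding window: the shortest window holding n unique items can only end at the same position or later as the start moves right, so a single pair of indices and a dictionary of per-value counts is enough. The sketch below is not part of the original answer; the UniqueWindow class and DistancesToUnique method are hypothetical names, and it assumes n >= 1 and non-null items.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueWindow
{
    // For each start index, returns the length of the shortest run beginning there
    // that contains n distinct values, or -1 if no such run exists.
    public static int[] DistancesToUnique<T>(IReadOnlyList<T> src, int n, IEqualityComparer<T> cmp = null)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // Occurrence count of each distinct value inside the window [start, end).
        var counts = new Dictionary<T, int>(cmp ?? EqualityComparer<T>.Default);
        var result = new int[src.Count];
        int end = 0; // exclusive end of the window

        for (int start = 0; start < src.Count; start++)
        {
            // Grow the window until it holds n distinct values or the input runs out.
            while (counts.Count < n && end < src.Count)
            {
                counts.TryGetValue(src[end], out int seen);
                counts[src[end]] = seen + 1;
                end++;
            }

            result[start] = counts.Count == n ? end - start : -1;

            // Drop src[start] from the window before moving the start forward.
            int left = counts[src[start]] - 1;
            if (left == 0) counts.Remove(src[start]);
            else counts[src[start]] = left;
        }
        return result;
    }
}
```

Each element enters and leaves the window at most once, so the work is linear in the length of the list (given constant-time dictionary operations). For the example list above with n = 4 this yields [6,5,4,6,5,5,4,-1,-1,-1].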

Iterating an array with foreach backwards without using extension method

After searching on Google, I found this discussion:
Possible to iterate backwards through a foreach?
But the answers there use the extension method .Reverse(). As I understand it, with Reverse the list of objects (for example, a List of Strings) will be reversed first, and foreach itself doesn't iterate in reverse. If I have the list "Cat", "Dog" and use the .Reverse() method, the list will be "Dog", "Cat", and foreach still goes from element 0 to element length-1, and that's not what I'm looking for. I would like to know if there is any way to reverse the foreach iteration order, so that it starts from length-1 and goes down to 0.
if there was any way to reverse foreach iteration order, to start from length-1 down to 0
Not for a List<T>. The implementation of GetEnumerator() returns an enumerator that enumerates from beginning to end - there's no way to override that.
With a custom collection, then you'd just have to use a different enumerator that could go backwards, but there's no way to override the implementation that List<T> uses.
The Reverse method will copy the list first:
public static IEnumerable<TSource> Reverse<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
return ReverseIterator<TSource>(source);
}
static IEnumerable<TSource> ReverseIterator<TSource>(IEnumerable<TSource> source) {
Buffer<TSource> buffer = new Buffer<TSource>(source);
for (int i = buffer.count - 1; i >= 0; i--) yield return buffer.items[i];
}
But you can do an extension method yourself:
public static IEnumerable<TSource> Backwards<TSource>(this IList<TSource> source) {
for (var i = source.Count - 1; i >= 0; --i)
yield return source[i];
}
And then use it like this:
foreach (var item in array.Backwards())
Console.WriteLine(item); // Or whatever else
Or, of course, you could just do the equivalent:
for (var i = array.Length - 1; i >= 0; --i)
Console.WriteLine(array[i]); // Or whatever else
You can implement an enumerator that iterates through a list backwards. That way you can use foreach without changing the original list or creating a copy of it.
public class ReverseEnumerator<T> : IEnumerator<T> {
private IList<T> _list;
private int _index;
private T _current;
public ReverseEnumerator(IList<T> list) {
_list = list;
Reset();
}
public IEnumerator<T> GetEnumerator() {
return this;
}
public T Current {
get {
if (_index < 0 || _index >= _list.Count) throw new InvalidOperationException("Enumeration has not started. Call MoveNext.");
return _current;
}
}
public void Dispose() { }
object IEnumerator.Current { get { return Current; } }
public bool MoveNext() {
bool ok = --_index >= 0;
if (ok) _current = _list[_index];
return ok;
}
public void Reset() {
_index = _list.Count;
}
}
Usage example:
int[] a = { 1, 2, 3, 4, 5 };
foreach (int x in new ReverseEnumerator<int>(a)) {
Console.WriteLine(x);
}

Linq-Connection for data structure

I recently designed a data structure similar to 'Queue' and 'Stack' for a special purpose. It holds a fixed maximum number of objects; when it is full and another object is inserted, the oldest inserted object drops out.
The Code:
public class AssemblyLine<T>
{
private long length;
public long Length { get { return this.length; } }
private T[] data;
private long Pointer = 0;
private long count = 0;
public long Count { get { return this.count; } }
public void Insert(T obj)
{
this.data[Pointer] = obj;
this.Next();
if (this.count < this.length)
this.count++;
}
public T[] GetLastX(long x)
{
long p = this.Pointer;
if (x > this.count)
x = this.count;
T[] result = new T[x];
for (int i = 0; i < x; i++)
{
Previous();
result[i] = Grab();
}
this.Pointer = p;
return result;
}
public T[] GetFirstX(long x)
{
long p = this.Pointer;
if (x > this.count)
x = this.count;
long gap = this.length - this.count;
this.Pointer = (this.Pointer + gap) % this.length;
T[] result = new T[x];
for (int i = 0; i < x; i++)
{
result[i] = Grab();
Next();
}
this.Pointer = p;
return result;
}
public void Clear()
{
this.data = new T[this.length];
this.count = 0;
}
private void Next()
{
this.Pointer++;
if (this.Pointer > this.length - 1)
this.Pointer = 0;
}
private void Previous()
{
this.Pointer--;
if (this.Pointer < 0)
this.Pointer = this.length - 1;
}
private T Grab()
{
return this.data[this.Pointer];
}
public AssemblyLine(long Length)
{
this.length = Length;
this.data = new T[Length];
}
}
Now I am curious if it's possible to get that connected to Linq, providing something like this:
AssemblyLine<int> myAssemblyLine = new AssemblyLine<int>(100);
// Insert some Stuff
List<int> myList = myAssemblyLine.Where(i => i > 5).ToList();
Does anyone have any ideas?
Almost all LINQ extension methods are declared on IEnumerable<T>, so as soon as your class implements that interface, you'll get it for free.
There are just a couple of methods that use the non-generic IEnumerable, like Cast or OfType, but it is much better if your class implements the generic IEnumerable<T>, because users won't have to call Cast<T> first to get an IEnumerable<T> and access all the other methods (as is currently the case for some legacy collections).
You have to implement IEnumerable<T> for .Where(). See Adding LINQ to my classes.
Once you implement IEnumerable<T> and all required methods, you will have access to these methods: http://msdn.microsoft.com/en-us/library/vstudio/system.linq.enumerable_methods(v=vs.100).aspx
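As a rough sketch of what that could look like, here is a simplified fixed-capacity buffer that implements IEnumerable<T> with an iterator method. The RingBuffer<T> name and its members are hypothetical; it is a stand-in for AssemblyLine<T>, assuming enumeration should go from the oldest to the newest item.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical simplified stand-in for AssemblyLine<T>.
public class RingBuffer<T> : IEnumerable<T>
{
    private readonly T[] data;
    private int next;   // slot the next Insert writes to
    private int count;  // number of live items, at most data.Length

    public RingBuffer(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        data = new T[capacity];
    }

    public int Count => count;

    public void Insert(T item)
    {
        data[next] = item;                  // overwrites the oldest item when full
        next = (next + 1) % data.Length;
        if (count < data.Length) count++;
    }

    // Yields the items from oldest to newest without copying them.
    public IEnumerator<T> GetEnumerator()
    {
        int first = (next - count + data.Length) % data.Length;
        for (int i = 0; i < count; i++)
            yield return data[(first + i) % data.Length];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With using System.Linq in scope, a call such as buffer.Where(i => i > 5).ToList() then compiles against this class, because Where and ToList are extension methods on IEnumerable<T>.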

Does .NET have a built in IEnumerable for multiple collections?

I need an easy way to iterate over multiple collections without actually merging them, and I couldn't find anything built into .NET that looks like it does that. It feels like this should be a somewhat common situation. I don't want to reinvent the wheel. Is there anything built in that does something like this:
public class MultiCollectionEnumerable<T> : IEnumerable<T>
{
private MultiCollectionEnumerator<T> enumerator;
public MultiCollectionEnumerable(params IEnumerable<T>[] collections)
{
enumerator = new MultiCollectionEnumerator<T>(collections);
}
public IEnumerator<T> GetEnumerator()
{
enumerator.Reset();
return enumerator;
}
IEnumerator IEnumerable.GetEnumerator()
{
enumerator.Reset();
return enumerator;
}
private class MultiCollectionEnumerator<T> : IEnumerator<T>
{
private IEnumerable<T>[] collections;
private int currentIndex;
private IEnumerator<T> currentEnumerator;
public MultiCollectionEnumerator(IEnumerable<T>[] collections)
{
this.collections = collections;
this.currentIndex = -1;
}
public T Current
{
get
{
if (currentEnumerator != null)
return currentEnumerator.Current;
else
return default(T);
}
}
public void Dispose()
{
if (currentEnumerator != null)
currentEnumerator.Dispose();
}
object IEnumerator.Current
{
get
{
return Current;
}
}
public bool MoveNext()
{
if (currentIndex >= collections.Length)
return false;
if (currentIndex < 0)
{
currentIndex = 0;
if (collections.Length > 0)
currentEnumerator = collections[0].GetEnumerator();
else
return false;
}
while (!currentEnumerator.MoveNext())
{
currentEnumerator.Dispose();
currentEnumerator = null;
currentIndex++;
if (currentIndex >= collections.Length)
return false;
currentEnumerator = collections[currentIndex].GetEnumerator();
}
return true;
}
public void Reset()
{
if (currentEnumerator != null)
{
currentEnumerator.Dispose();
currentEnumerator = null;
}
this.currentIndex = -1;
}
}
}
Try the SelectMany extension method added in .NET 3.5.
IEnumerable<IEnumerable<int>> e = ...;
foreach ( int cur in e.SelectMany(x => x)) {
Console.WriteLine(cur);
}
The code SelectMany(x => x) has the effect of flattening a collection of collections into a single collection. This is done in a lazy fashion and allows for straightforward processing as shown above.
If you only have C# 2.0 available, you can use an iterator to achieve the same results.
public static IEnumerable<T> Flatten<T>(IEnumerable<IEnumerable<T>> enumerable) {
foreach ( IEnumerable<T> inner in enumerable ) {
foreach ( T value in inner ) {
yield return value;
}
}
}
Just use the Enumerable.Concat() extension method to "concatenate" two IEnumerables. Don't worry, it doesn't actually copy them into a single array (as you might infer from the name), it simply allows you to enumerate over them all as if they were one IEnumerable.
If you have more than two then Enumerable.SelectMany() would be better.
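As a small sketch of how the two suggestions fit the original params-style constructor, a helper can wrap SelectMany so that any number of collections can be passed directly. The MultiEnumerable class and ConcatAll method are hypothetical names, not part of .NET.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultiEnumerable
{
    // Lazily enumerates each collection in turn; nothing is copied or merged.
    public static IEnumerable<T> ConcatAll<T>(params IEnumerable<T>[] collections)
    {
        return collections.SelectMany(c => c);
    }
}
```

For exactly two collections, first.Concat(second) does the same job without a helper; MultiEnumerable.ConcatAll<int>(listA, arrayB, setC) covers the case of three or more.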
