Implementing an immutable enumerator - c#

Consider the following possible interface for an immutable generic enumerator:
interface IImmutableEnumerator<T>
{
(bool Succesful, IImmutableEnumerator<T> NewEnumerator) MoveNext();
T Current { get; }
}
How would you implement this in a reasonably performant way in c#? I'm a little out of ideas, because the IEnumerator infrastructure in .NET is inherently mutable and I can't see a way around it.
A naive implementation would be to simply create a new enumerator on every MoveNext() handing down a new inner mutable enumerator with current.Skip(1).GetEnumerator() but that is horribly inefficient.
I'm implementing a parser that needs to be able to look ahead; using an immutable enumerator would make things cleaner and easier to follow so I'm curious if there is an easy way to do this that I might be missing.
The input is an IEnumerable<T> and I can't change that. I can always materialize the enumerable with ToList() of course (with an IList in hand, looking ahead is trivial), but the data can be pretty large and I'd like to avoid it, if possible.

This is it:
public class ImmutableEnumerator<T> : IImmutableEnumerator<T>, IDisposable
{
public static (bool Succesful, IImmutableEnumerator<T> NewEnumerator) Create(IEnumerable<T> source)
{
var enumerator = source.GetEnumerator();
var successful = enumerator.MoveNext();
return (successful, new ImmutableEnumerator<T>(successful, enumerator));
}
private IEnumerator<T> _enumerator;
private (bool Succesful, IImmutableEnumerator<T> NewEnumerator) _runOnce = (false, null);
private ImmutableEnumerator(bool successful, IEnumerator<T> enumerator)
{
_enumerator = enumerator;
this.Current = successful ? _enumerator.Current : default(T);
if (!successful)
{
_enumerator.Dispose();
}
}
public (bool Succesful, IImmutableEnumerator<T> NewEnumerator) MoveNext()
{
if (_runOnce.NewEnumerator == null)
{
var successful = _enumerator.MoveNext();
_runOnce = (successful, new ImmutableEnumerator<T>(successful, _enumerator));
}
return _runOnce;
}
public T Current { get; private set; }
public void Dispose()
{
_enumerator.Dispose();
}
}
My test code succeeds nicely:
var xs = new[] { 1, 2, 3 };
var ie = ImmutableEnumerator<int>.Create(xs);
if (ie.Succesful)
{
Console.WriteLine(ie.NewEnumerator.Current);
var ie1 = ie.NewEnumerator.MoveNext();
if (ie1.Succesful)
{
Console.WriteLine(ie1.NewEnumerator.Current);
var ie2 = ie1.NewEnumerator.MoveNext();
if (ie2.Succesful)
{
Console.WriteLine(ie2.NewEnumerator.Current);
var ie3 = ie2.NewEnumerator.MoveNext();
if (ie3.Succesful)
{
Console.WriteLine(ie3.NewEnumerator.Current);
var ie4 = ie3.NewEnumerator.MoveNext();
}
}
}
}
This outputs:
1
2
3
It's immutable and it's efficient.
Here's a version using Lazy<(bool, IImmutableEnumerator<T>)> as per a request in the comments:
public class ImmutableEnumerator<T> : IImmutableEnumerator<T>, IDisposable
{
public static (bool Succesful, IImmutableEnumerator<T> NewEnumerator) Create(IEnumerable<T> source)
{
var enumerator = source.GetEnumerator();
var successful = enumerator.MoveNext();
return (successful, new ImmutableEnumerator<T>(successful, enumerator));
}
private IEnumerator<T> _enumerator;
private Lazy<(bool, IImmutableEnumerator<T>)> _runOnce;
private ImmutableEnumerator(bool successful, IEnumerator<T> enumerator)
{
_enumerator = enumerator;
this.Current = successful ? _enumerator.Current : default(T);
if (!successful)
{
_enumerator.Dispose();
}
_runOnce = new Lazy<(bool, IImmutableEnumerator<T>)>(() =>
{
var s = _enumerator.MoveNext();
return (s, new ImmutableEnumerator<T>(s, _enumerator));
});
}
public (bool Succesful, IImmutableEnumerator<T> NewEnumerator) MoveNext()
{
return _runOnce.Value;
}
public T Current { get; private set; }
public void Dispose()
{
_enumerator.Dispose();
}
}

You can achieve pseudo immutability suitable in this particular scenario by utilising a singly linked list. It allows for infinite look-ahead (limited only by your heap size) without the ability to look at previously processed nodes (unless you happen to store a reference to a previously processed node - which you shouldn't).
This solution addresses the requirements as stated (except for not conforming to your exact interface, with all of its functionality nevertheless intact).
The usage of such a linked list might look like this:
IEnumerable<int> numbersFromZeroToNine = Enumerable.Range(0, 10);
using (IEnumerator<int> enumerator = numbersFromZeroToNine.GetEnumerator())
{
var node = LazySinglyLinkedListNode<int>.CreateListHead(enumerator);
while (node != null)
{
Console.WriteLine($"Current value: {node.Value}.");
if (node.Next != null)
{
// Single-element look-ahead. Technically you could do node.Next.Next...Next.
// You can also nest another while loop here, and look ahead as much as needed.
Console.WriteLine($"Next value: {node.Next.Value}.");
}
else
{
Console.WriteLine("End of collection reached. There is no next value.");
}
node = node.Next;
// At this point the object which used to be referenced by the "node" local
// becomes eligible for collection, preventing unbounded memory growth.
}
}
Output:
Current value: 0.
Next value: 1.
Current value: 1.
Next value: 2.
Current value: 2.
Next value: 3.
Current value: 3.
Next value: 4.
Current value: 4.
Next value: 5.
Current value: 5.
Next value: 6.
Current value: 6.
Next value: 7.
Current value: 7.
Next value: 8.
Current value: 8.
Next value: 9.
Current value: 9.
End of collection reached. There is no next value.
The implementation is as follows:
sealed class LazySinglyLinkedListNode<T>
{
public static LazySinglyLinkedListNode<T> CreateListHead(IEnumerator<T> enumerator)
{
return enumerator.MoveNext() ? new LazySinglyLinkedListNode<T>(enumerator) : null;
}
public T Value { get; }
private IEnumerator<T> Enumerator;
private LazySinglyLinkedListNode<T> _next;
public LazySinglyLinkedListNode<T> Next
{
get
{
if (_next == null && Enumerator != null)
{
if (Enumerator.MoveNext())
{
_next = new LazySinglyLinkedListNode<T>(Enumerator);
}
else
{
Enumerator = null; // We've reached the end.
}
}
return _next;
}
}
private LazySinglyLinkedListNode(IEnumerator<T> enumerator)
{
Value = enumerator.Current;
Enumerator = enumerator;
}
}
An important thing to note here is that the source collection is only enumerated once, lazily, with MoveNext being called at most once per each node's lifetime regardless of how many times you access Next.
Using a doubly-linked list would allow look-behind, but would cause infinite memory growth and require periodic pruning, which is not trivial. Singly-linked list avoids this issue as long as you are not storing node references outside of your main loop. In the example above you could replace numbersFromZeroToNine with an IEnumerable<int> generator which infinitely yields integers, and the loop will run forever without running out of memory.

Related

C# How to partition parallel foreach loop to iterate my list

I am new in programming world. i am doing my graduation and also learning dotnet.
I want to iterate my list in parallel foreach but i want to use partition there. I have lack of knowledge so my code is not compiling.
Actually this way i did it first which is working.
Parallel.ForEach(MyBroker, broker =>,,
{
mybrow = new WeightageRowNumber();
mybrow.RowNumber = Interlocked.Increment(ref rowNumber);
lock (_lock)
{
Mylist.Add(mybrow);
}
});
now i want to use partition so i change my code this way but now my code not compiling. here is code
Parallel.ForEach(MyBroker, broker,
(j, loop, subtotal) =>
{
mybrow = new WeightageRowNumber();
mybrow.RowNumber = Interlocked.Increment(ref rowNumber);
lock (_lock)
{
Mylist.Add(mybrow);
}
return brokerRowWeightageRowNumber.RowNumber;
},
(finalResult) =>
var rownum= Interlocked.Increment(ref finalResult);
console.writeline(rownum);
);
please see my second set of code and show me how to restructure to use partition for parallel foreach to iterate my list.
please guide me. thanks
The Parallel.ForEach method has 20 overloads - perhaps try a different overload?
Without your dependencies included I can't give a 1-to-1 example on your implementation but here is an in-depth example (reformatted from here) that you can copy into your IDE and set debug breakpoints (if that's useful). Unfortunately building an instantiable overload of OrderablePartitioner appears non-trivial so sorry for all the boilerplate code:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Concurrent;
using System.Collections;
using System.Linq;
// Simple partitioner that will extract one (index,item) pair at a time,
// in a thread-safe fashion, from the underlying collection.
class SingleElementOrderablePartitioner<T> : OrderablePartitioner<T>
{
// The collection being wrapped by this Partitioner
IEnumerable<T> m_referenceEnumerable;
// Class used to wrap m_index for the purpose of sharing access to it
// between an InternalEnumerable and multiple InternalEnumerators
private class Shared<U>
{
internal U Value;
public Shared(U item)
{
Value = item;
}
}
// Internal class that serves as a shared enumerable for the
// underlying collection.
private class InternalEnumerable : IEnumerable<KeyValuePair<long, T>>, IDisposable
{
IEnumerator<T> m_reader;
bool m_disposed = false;
Shared<long> m_index = null;
// These two are used to implement Dispose() when static partitioning is being performed
int m_activeEnumerators;
bool m_downcountEnumerators;
// "downcountEnumerators" will be true for static partitioning, false for
// dynamic partitioning.
public InternalEnumerable(IEnumerator<T> reader, bool downcountEnumerators)
{
m_reader = reader;
m_index = new Shared<long>(0);
m_activeEnumerators = 0;
m_downcountEnumerators = downcountEnumerators;
}
public IEnumerator<KeyValuePair<long, T>> GetEnumerator()
{
if (m_disposed)
throw new ObjectDisposedException("InternalEnumerable: Can't call GetEnumerator() after disposing");
// For static partitioning, keep track of the number of active enumerators.
if (m_downcountEnumerators) Interlocked.Increment(ref m_activeEnumerators);
return new InternalEnumerator(m_reader, this, m_index);
}
IEnumerator<KeyValuePair<long, T>> IEnumerable<KeyValuePair<long, T>>.GetEnumerator()
{
return this.GetEnumerator();
}
public void Dispose()
{
if (!m_disposed)
{
// Only dispose the source enumerator if you are doing dynamic partitioning
if (!m_downcountEnumerators)
{
m_reader.Dispose();
}
m_disposed = true;
}
}
// Called from Dispose() method of spawned InternalEnumerator. During
// static partitioning, the source enumerator will be automatically
// disposed once all requested InternalEnumerators have been disposed.
public void DisposeEnumerator()
{
if (m_downcountEnumerators)
{
if (Interlocked.Decrement(ref m_activeEnumerators) == 0)
{
m_reader.Dispose();
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
}
// Internal class that serves as a shared enumerator for
// the underlying collection.
private class InternalEnumerator : IEnumerator<KeyValuePair<long, T>>
{
KeyValuePair<long, T> m_current;
IEnumerator<T> m_source;
InternalEnumerable m_controllingEnumerable;
Shared<long> m_index = null;
bool m_disposed = false;
public InternalEnumerator(IEnumerator<T> source, InternalEnumerable controllingEnumerable, Shared<long> index)
{
m_source = source;
m_current = default(KeyValuePair<long, T>);
m_controllingEnumerable = controllingEnumerable;
m_index = index;
}
object IEnumerator.Current
{
get { return m_current; }
}
KeyValuePair<long, T> IEnumerator<KeyValuePair<long, T>>.Current
{
get { return m_current; }
}
void IEnumerator.Reset()
{
throw new NotSupportedException("Reset() not supported");
}
// This method is the crux of this class. Under lock, it calls
// MoveNext() on the underlying enumerator, grabs Current and index,
// and increments the index.
bool IEnumerator.MoveNext()
{
bool rval = false;
lock (m_source)
{
rval = m_source.MoveNext();
if (rval)
{
m_current = new KeyValuePair<long, T>(m_index.Value, m_source.Current);
m_index.Value = m_index.Value + 1;
}
else m_current = default(KeyValuePair<long, T>);
}
return rval;
}
void IDisposable.Dispose()
{
if (!m_disposed)
{
// Delegate to parent enumerable's DisposeEnumerator() method
m_controllingEnumerable.DisposeEnumerator();
m_disposed = true;
}
}
}
// Constructor just grabs the collection to wrap
public SingleElementOrderablePartitioner(IEnumerable<T> enumerable)
: base(true, true, true)
{
// Verify that the source IEnumerable is not null
if (enumerable == null)
throw new ArgumentNullException("enumerable");
m_referenceEnumerable = enumerable;
}
// Produces a list of "numPartitions" IEnumerators that can each be
// used to traverse the underlying collection in a thread-safe manner.
// This will return a static number of enumerators, as opposed to
// GetOrderableDynamicPartitions(), the result of which can be used to produce
// any number of enumerators.
public override IList<IEnumerator<KeyValuePair<long, T>>> GetOrderablePartitions(int numPartitions)
{
if (numPartitions < 1)
throw new ArgumentOutOfRangeException("NumPartitions");
List<IEnumerator<KeyValuePair<long, T>>> list = new List<IEnumerator<KeyValuePair<long, T>>>(numPartitions);
// Since we are doing static partitioning, create an InternalEnumerable with reference
// counting of spawned InternalEnumerators turned on. Once all of the spawned enumerators
// are disposed, dynamicPartitions will be disposed.
var dynamicPartitions = new InternalEnumerable(m_referenceEnumerable.GetEnumerator(), true);
for (int i = 0; i < numPartitions; i++)
list.Add(dynamicPartitions.GetEnumerator());
return list;
}
// Returns an instance of our internal Enumerable class. GetEnumerator()
// can then be called on that (multiple times) to produce shared enumerators.
public override IEnumerable<KeyValuePair<long, T>> GetOrderableDynamicPartitions()
{
// Since we are doing dynamic partitioning, create an InternalEnumerable with reference
// counting of spawned InternalEnumerators turned off. This returned InternalEnumerable
// will need to be explicitly disposed.
return new InternalEnumerable(m_referenceEnumerable.GetEnumerator(), false);
}
// Must be set to true if GetDynamicPartitions() is supported.
public override bool SupportsDynamicPartitions
{
get { return true; }
}
}
Here are examples of how to structure Parallel.ForEach using the above OrderablePartitioner. See how you can refactor-out your finally-block entirely out of the ForEach impl?
public class Program
{
static void Main(string[] args)
{
//
// First a fairly simple visual test
//
var someCollection = new string[] { "four", "score", "and", "twenty", "years", "ago" };
var someOrderablePartitioner = new SingleElementOrderablePartitioner<string>(someCollection);
Parallel.ForEach(someOrderablePartitioner, (item, state, index) =>
{
Console.WriteLine("ForEach: item = {0}, index = {1}, thread id = {2}", item, index, Thread.CurrentThread.ManagedThreadId);
});
//
// Now a more rigorous test of dynamic partitioning (used by Parallel.ForEach)
//
List<int> src = Enumerable.Range(0, 100000).ToList();
SingleElementOrderablePartitioner<int> myOP = new SingleElementOrderablePartitioner<int>(src);
int counter = 0;
bool mismatch = false;
Parallel.ForEach(myOP, (item, state, index) =>
{
if (item != index) mismatch = true;
Interlocked.Increment(ref counter);
});
if (mismatch) Console.WriteLine("OrderablePartitioner Test: index mismatch detected");
Console.WriteLine("OrderablePartitioner test: counter = {0}, should be 100000", counter);
}
}
Also this link might be useful ("Write a simple parallel.ForEach Loop")

Ensure deferred execution will be executed only once or else

I ran into a weird issue and I'm wondering what I should do about it.
I have this class that return a IEnumerable<MyClass> and it is a deferred execution. Right now, there are two possible consumers. One of them sorts the result.
See the following example :
public class SomeClass
{
public IEnumerable<MyClass> GetMyStuff(Param givenParam)
{
double culmulativeSum = 0;
return myStuff.Where(...)
.OrderBy(...)
.TakeWhile( o =>
{
bool returnValue = culmulativeSum < givenParam.Maximum;
culmulativeSum += o.SomeNumericValue;
return returnValue;
};
}
}
Consumers call the deferred execution only once, but if they were to call it more than that, the result would be wrong as the culmulativeSum wouldn't be reset. I've found the issue by inadvertence with unit testing.
The easiest way for me to fix the issue would be to just add .ToArray() and get rid of the deferred execution at the cost of a little bit of overhead.
I could also add unit test in consumers class to ensure they call it only once, but that wouldn't prevent any new consumer coded in the future from this potential issue.
Another thing that came to my mind was to make subsequent execution throw.
Something like
return myStuff.Where(...)
.OrderBy(...)
.TakeWhile(...)
.ThrowIfExecutedMoreThan(1);
Obviously this doesn't exist.
Would it be a good idea to implement such thing and how would you do it?
Otherwise, if there is a big pink elephant that I don't see, pointing it out will be appreciated. (I feel there is one because this question is about a very basic scenario :| )
EDIT :
Here is a bad consumer usage example :
public class ConsumerClass
{
public void WhatEverMethod()
{
SomeClass some = new SomeClass();
var stuffs = some.GetMyStuff(param);
var nb = stuffs.Count(); //first deferred execution
var firstOne = stuff.First(); //second deferred execution with the culmulativeSum not reset
}
}
You can solve the incorrect result issue by simply turning your method into iterator:
double culmulativeSum = 0;
var query = myStuff.Where(...)
.OrderBy(...)
.TakeWhile(...);
foreach (var item in query) yield return item;
It can be encapsulated in a simple extension method:
public static class Iterators
{
public static IEnumerable<T> Lazy<T>(Func<IEnumerable<T>> source)
{
foreach (var item in source())
yield return item;
}
}
Then all you need to do in such scenarios is to surround the original method body with Iterators.Lazy call, e.g.:
return Iterators.Lazy(() =>
{
double culmulativeSum = 0;
return myStuff.Where(...)
.OrderBy(...)
.TakeWhile(...);
});
You can use the following class:
public class JustOnceOrElseEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerable<T> decorated;
public JustOnceOrElseEnumerable(IEnumerable<T> decorated)
{
this.decorated = decorated;
}
private bool CalledAlready;
public IEnumerator<T> GetEnumerator()
{
if (CalledAlready)
throw new Exception("Enumerated already");
CalledAlready = true;
return decorated.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
if (CalledAlready)
throw new Exception("Enumerated already");
CalledAlready = true;
return decorated.GetEnumerator();
}
}
to decorate an enumerable so that it can only be enumerated once. After that it would throw an exception.
You can use this class like this:
return new JustOnceOrElseEnumerable(
myStuff.Where(...)
...
);
Please note that I do not recommend this approach because it violates the contract of the IEnumerable interface and thus the Liskov Substitution Principle. It is legal for consumers of this contract to assume that they can enumerate the enumerable as many times as they like.
Instead, you can use a cached enumerable that caches the result of enumeration. This ensures that the enumerable is only enumerated once and that all subsequent enumeration attempts would read from the cache. See this answer here for more information.
Ivan's answer is very fitting for the underlying issue in OP's example - but for the general case, I have approached this in the past using an extension method similar to the one below. This ensures that the Enumerable has a single evaluation but is also deferred:
public static IMemoizedEnumerable<T> Memoize<T>(this IEnumerable<T> source)
{
return new MemoizedEnumerable<T>(source);
}
private class MemoizedEnumerable<T> : IMemoizedEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> _sourceEnumerator;
private readonly List<T> _cache = new List<T>();
public MemoizedEnumerable(IEnumerable<T> source)
{
_sourceEnumerator = source.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return IsMaterialized ? _cache.GetEnumerator() : Enumerate();
}
private IEnumerator<T> Enumerate()
{
foreach (var value in _cache)
{
yield return value;
}
while (_sourceEnumerator.MoveNext())
{
_cache.Add(_sourceEnumerator.Current);
yield return _sourceEnumerator.Current;
}
_sourceEnumerator.Dispose();
IsMaterialized = true;
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public List<T> Materialize()
{
if (IsMaterialized)
return _cache;
while (_sourceEnumerator.MoveNext())
{
_cache.Add(_sourceEnumerator.Current);
}
_sourceEnumerator.Dispose();
IsMaterialized = true;
return _cache;
}
public bool IsMaterialized { get; private set; }
void IDisposable.Dispose()
{
if(!IsMaterialized)
_sourceEnumerator.Dispose();
}
}
public interface IMemoizedEnumerable<T> : IEnumerable<T>
{
List<T> Materialize();
bool IsMaterialized { get; }
}
Example Usage:
void Consumer()
{
//var results = GetValuesComplex();
//var results = GetValuesComplex().ToList();
var results = GetValuesComplex().Memoize();
if(results.Any(i => i == 3))
{
Console.WriteLine("\nFirst Iteration");
//return; //Potential for early exit.
}
var last = results.Last(); // Causes multiple enumeration in naive case.
Console.WriteLine("\nSecond Iteration");
}
IEnumerable<int> GetValuesComplex()
{
for (int i = 0; i < 5; i++)
{
//... complex operations ...
Console.Write(i + ", ");
yield return i;
}
}
Naive: ✔ Deferred, ✘ Single enumeration.
ToList: ✘ Deferred, ✔ Single enumeration.
Memoize: ✔ Deferred, ✔ Single enumeration.
.
Edited to use the proper terminology and flesh out the implementation.

How do I set the Next property of a node in a LinkedList to another node in another LinkedList?

In C#, I am currently trying to link one node from a certain LinkedList to another node in another LinkedList. I am trying to make this for a game where tiles are connected in levels stacked on top of eachother. However, changing the Next property in the list gives an error. This is the code:
tileList2.First.Next = tileList1.First;
This is the error;
Property or indexer 'LinkedListNode.Next' cannot be assigned to -- it is read only."
How can I (otherwise) set the Next of this node to the First of the other node? What is causing this error? Is there any workaround?
Every node in the framework implementation of LinkedList has a reference to the list containing it, which makes transferring elements to another list O(n) instead of O(1), which defeats the purpose of the linked list implementation. If you want to transfer elements to another list in this implementation, you would have to remove and add them one by one (using Remove and AddAfter on the list) so that they each get a reference to the other list.
I suspect this is not what you're after, though. As other comments state, the needs of your list are probably simple enough that you'd be better off making your own much simpler linked list. And that question is already addressed elsewhere on SO (Creating a very simple linked list).
Since that answer doesn't include easy enumeration and initialization, I made my own.
class ListNode<T> : IEnumerable<T>
{
public T data;
public ListNode<T> Next;
private ListNode() { }
public ListNode(IEnumerable<T> init)
{
ListNode<T> current = null;
foreach(T elem in init)
{
if (current == null) current = this; else current = current.Next = new ListNode<T>();
current.data = elem;
}
}
class ListEnum : IEnumerator<T>
{
private ListNode<T> first;
private ListNode<T> current;
bool more;
public ListEnum(ListNode<T> first) { this.first = first; more = true; }
public T Current { get { return current.data; } }
public void Dispose(){}
object System.Collections.IEnumerator.Current { get { return current.data; } }
public void Reset() { current = null; more = true; }
public bool MoveNext()
{
if (!more)
return false;
else if (current == null)
return more = ((current = first) != null);
else
return more = ((current = current.Next) != null);
}
}
public IEnumerator<T> GetEnumerator()
{
return new ListEnum(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
static void Main(string[] args)
{
ListNode<int> l1 = new ListNode<int>(new int[] {3,1,4,1,5,9});
ListNode<int> l2 = new ListNode<int>(new int[] { 5 });
l2.Next = l1.Next;
foreach (int i in l2)
Console.WriteLine(i);
}
you can't set LinkedListNode.Next directly, it is readonly.
you also can't use LinkedList.AddLast because the LinkedlListNode you are trying to add is already in a list.
you actually need to break the lists and create new ones.
or you could implement your own linked list.

Confused about IEnumerator interface

I am trying to understand how to use the IEnumerator interface and what it is used for. I have a class which implements the IEnumerator interface. A string array is passed to the constructor method.
The problem is when I execute the code then the array is not listed properly. It should be doing it in the order "ali", "veli", "hatca" but it’s listed at the console in this order "veli", "hatca" and -1. I am so confused. What am I doing wrong here? Can you please help?
static void Main(string[] args)
{
ogr o = new ogr();
while (o.MoveNext())
{
Console.WriteLine(o.Current.ToString());
}
}
public class ogr: IEnumerator
{
ArrayList array_ = new ArrayList();
string[] names = new string[] {
"ali", "veli", "hatca"
};
public ogr()
{
array_.AddRange(names);
}
public void addOgr(string name)
{
array_.Add(name);
}
int position;
public object Current
{
get
{
if (position >= 0 && position < array_.Count)
{
return array_[position];
}
else
{
return -1;
}
}
}
public bool MoveNext()
{
if (position < array_.Count && position >= 0)
{
position++;
return true;
}
else
{
return false;
}
}
public void Reset()
{
position = 0;
}
}
IEnumerator is quite difficult to grasp at first, but luckily it's an interface you hardly ever use in itself. Instead, you should probably implement IEnumerable<T>.
However, the source of your confusion comes from this line from the IEnumerator documentation:
Initially, the enumerator is positioned before the first element in
the collection. The Reset method also brings the enumerator back to
this position. After an enumerator is created or the Reset method is
called, you must call the MoveNext method to advance the enumerator to
the first element of the collection before reading the value of
Current; otherwise, Current is undefined.
Your implementation has its current position at 0 initially, instead of -1, causing the strange behavior. Your enumerator begins with Current on the first element instead of being before it.
It is pretty rare for people to use that API directly. More commonly, it is simply used via the foreach statement, i.e.
foreach(var value in someEnumerable) { ... }
where someEnumerable implements IEnumerable, IEnumerable<T> or just the duck-typed pattern. Your class ogr certainly isn't an IEnumerator, and shouldn't be made to try to act like one.
If the intend is for ogr to be enumerable, then:
public ogr : IEnumerable {
IEnumerator IEnumerable.GetEnumerator() {
return array_.GetEnumerator();
}
}
I suspect it would be better to be IEnumerable<string>, though, using List<string> as the backing list:
public SomeType : IEnumerable<string> {
private readonly List<string> someField = new List<string>();
public IEnumerator<string> GetEnumerator()
{ return someField.GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator()
{ return someField.GetEnumerator(); }
}

Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?

Provided items is the result of a LINQ expression:
var items = from item in ItemsSource.RetrieveItems()
where ...
Suppose generation of each item takes some non-negligeble time.
Two modes of operation are possible:
Using foreach would allow to start working with items in the beginning of the collection much sooner than whose in the end become available. However if we wanted to later process the same collection again, we'll have to copy save it:
var storedItems = new List<Item>();
foreach(var item in items)
{
Process(item);
storedItems.Add(item);
}
// Later
foreach(var item in storedItems)
{
ProcessMore(item);
}
Because if we'd just made foreach(... in items) then ItemsSource.RetrieveItems() would get called again.
We could use .ToList() right upfront, but that would force us wait for the last item to be retrieved before we could start processing the first one.
Question: Is there an IEnumerable implementation that would iterate first time like regular LINQ query result, but would materialize in process so that second foreach would iterate over stored values?
A fun challenge so I have to provide my own solution. So fun in fact that my solution now is in version 3. Version 2 was a simplification I made based on feedback from Servy. I then realized that my solution had huge drawback. If the first enumeration of the cached enumerable didn't complete no caching would be done. Many LINQ extensions like First and Take will only enumerate enough of the enumerable to get the job done and I had to update to version 3 to make this work with caching.
The question is about subsequent enumerations of the enumerable which does not involve concurrent access. Nevertheless I have decided to make my solution thread safe. It adds some complexity and a bit of overhead but should allow the solution to be used in all scenarios.
public static class EnumerableExtensions {
public static IEnumerable<T> Cached<T>(this IEnumerable<T> source) {
if (source == null)
throw new ArgumentNullException("source");
return new CachedEnumerable<T>(source);
}
}
class CachedEnumerable<T> : IEnumerable<T> {
readonly Object gate = new Object();
readonly IEnumerable<T> source;
readonly List<T> cache = new List<T>();
IEnumerator<T> enumerator;
bool isCacheComplete;
public CachedEnumerable(IEnumerable<T> source) {
this.source = source;
}
public IEnumerator<T> GetEnumerator() {
lock (this.gate) {
if (this.isCacheComplete)
return this.cache.GetEnumerator();
if (this.enumerator == null)
this.enumerator = source.GetEnumerator();
}
return GetCacheBuildingEnumerator();
}
public IEnumerator<T> GetCacheBuildingEnumerator() {
var index = 0;
T item;
while (TryGetItem(index, out item)) {
yield return item;
index += 1;
}
}
bool TryGetItem(Int32 index, out T item) {
lock (this.gate) {
if (!IsItemInCache(index)) {
// The iteration may have completed while waiting for the lock.
if (this.isCacheComplete) {
item = default(T);
return false;
}
if (!this.enumerator.MoveNext()) {
item = default(T);
this.isCacheComplete = true;
this.enumerator.Dispose();
return false;
}
this.cache.Add(this.enumerator.Current);
}
item = this.cache[index];
return true;
}
}
bool IsItemInCache(Int32 index) {
return index < this.cache.Count;
}
IEnumerator IEnumerable.GetEnumerator() {
return GetEnumerator();
}
}
The extension is used like this (sequence is an IEnumerable<T>):
var cachedSequence = sequence.Cached();
// Pulling 2 items from the sequence.
foreach (var item in cachedSequence.Take(2))
// ...
// Pulling 2 items from the cache and the rest from the source.
foreach (var item in cachedSequence)
// ...
// Pulling all items from the cache.
foreach (var item in cachedSequence)
// ...
There is slight leak if only part of the enumerable is enumerated (e.g. cachedSequence.Take(2).ToList(). The enumerator that is used by ToList will be disposed but the underlying source enumerator is not disposed. This is because the first 2 items are cached and the source enumerator is kept alive should requests for subsequent items be made. In that case the source enumerator is only cleaned up when eligigble for garbage Collection (which will be the same time as the possibly large cache).
Take a look at the Reactive Extentsions library - there is a MemoizeAll() extension which will cache the items in your IEnumerable once they're accessed, and store them for future accesses.
See this blog post by Bart De Smet for a good read on MemoizeAll and other Rx methods.
Edit: This is actually found in the separate Interactive Extensions package now - available from NuGet or Microsoft Download.
public static IEnumerable<T> SingleEnumeration<T>(this IEnumerable<T> source)
{
return new SingleEnumerator<T>(source);
}
private class SingleEnumerator<T> : IEnumerable<T>
{
private CacheEntry<T> cacheEntry;
public SingleEnumerator(IEnumerable<T> sequence)
{
cacheEntry = new CacheEntry<T>(sequence.GetEnumerator());
}
public IEnumerator<T> GetEnumerator()
{
if (cacheEntry.FullyPopulated)
{
return cacheEntry.CachedValues.GetEnumerator();
}
else
{
return iterateSequence<T>(cacheEntry).GetEnumerator();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
private static IEnumerable<T> iterateSequence<T>(CacheEntry<T> entry)
{
using (var iterator = entry.CachedValues.GetEnumerator())
{
int i = 0;
while (entry.ensureItemAt(i) && iterator.MoveNext())
{
yield return iterator.Current;
i++;
}
}
}
private class CacheEntry<T>
{
public bool FullyPopulated { get; private set; }
public ConcurrentQueue<T> CachedValues { get; private set; }
private static object key = new object();
private IEnumerator<T> sequence;
public CacheEntry(IEnumerator<T> sequence)
{
this.sequence = sequence;
CachedValues = new ConcurrentQueue<T>();
}
/// <summary>
/// Ensure that the cache has an item a the provided index. If not, take an item from the
/// input sequence and move to the cache.
///
/// The method is thread safe.
/// </summary>
/// <returns>True if the cache already had enough items or
/// an item was moved to the cache,
/// false if there were no more items in the sequence.</returns>
public bool ensureItemAt(int index)
{
//if the cache already has the items we don't need to lock to know we
//can get it
if (index < CachedValues.Count)
return true;
//if we're done there's no race conditions hwere either
if (FullyPopulated)
return false;
lock (key)
{
//re-check the early-exit conditions in case they changed while we were
//waiting on the lock.
//we already have the cached item
if (index < CachedValues.Count)
return true;
//we don't have the cached item and there are no uncached items
if (FullyPopulated)
return false;
//we actually need to get the next item from the sequence.
if (sequence.MoveNext())
{
CachedValues.Enqueue(sequence.Current);
return true;
}
else
{
FullyPopulated = true;
return false;
}
}
}
}
So this has been edited (substantially) to support multithreaded access. Several threads can ask for items, and on an item by item basis, they will be cached. It doesn't need to wait for the entire sequence to be iterated for it to return cached values. Below is a sample program that demonstrates this:
private static IEnumerable<int> interestingIntGenertionMethod(int maxValue)
{
for (int i = 0; i < maxValue; i++)
{
Thread.Sleep(1000);
Console.WriteLine("actually generating value: {0}", i);
yield return i;
}
}
public static void Main(string[] args)
{
IEnumerable<int> sequence = interestingIntGenertionMethod(10)
.SingleEnumeration();
int numThreads = 3;
for (int i = 0; i < numThreads; i++)
{
int taskID = i;
Task.Factory.StartNew(() =>
{
foreach (int value in sequence)
{
Console.WriteLine("Task: {0} Value:{1}",
taskID, value);
}
});
}
Console.WriteLine("Press any key to exit...");
Console.ReadKey(true);
}
You really need to see it run to understand the power here. As soon as a single thread forces the next actual values to be generated all of the remaining threads can immediately print that generated value, but they will all be waiting if there are no uncached values for that thread to print. (Obviously thread/threadpool scheduling may result in one task taking longer to print it's value than needed.)
There have already been posted thread-safe implementations of the Cached/SingleEnumeration operator by Martin Liversage and Servy respectively, and the thread-safe Memoise operator from the System.Interactive package is also available. In case thread-safety is not a requirement, and paying the cost of thread-synchronization is undesirable, there are answers offering unsynchronized ToCachedEnumerable implementations in this question. All these implementations have in common that they are based on custom types. My challenge was to write a similar not-synchronized operator in a single self-contained extension method (no strings attached). Here is my implementation:
public static IEnumerable<T> MemoiseNotSynchronized<T>(this IEnumerable<T> source)
{
// Argument validation omitted
IEnumerator<T> enumerator = null;
List<T> buffer = null;
return Implementation();
IEnumerable<T> Implementation()
{
if (buffer != null && enumerator == null)
{
// The source has been fully enumerated
foreach (var item in buffer) yield return item;
yield break;
}
enumerator ??= source.GetEnumerator();
buffer ??= new();
for (int i = 0; ; i = checked(i + 1))
{
if (i < buffer.Count)
{
yield return buffer[i];
}
else if (enumerator.MoveNext())
{
Debug.Assert(buffer.Count == i);
var current = enumerator.Current;
buffer.Add(current);
yield return current;
}
else
{
enumerator.Dispose(); enumerator = null;
yield break;
}
}
}
}
Usage example:
IEnumerable<Point> points = GetPointsFromDB().MemoiseNotSynchronized();
// Enumerate the 'points' any number of times, on a single thread.
// The data will be fetched from the DB only once.
// The connection with the DB will open when the 'points' is enumerated
// for the first time, partially or fully.
// The connection will stay open until the 'points' is enumerated fully
// for the first time.
Testing the MemoiseNotSynchronized operator on Fiddle.

Categories