C# pointers, iterators and generics

I am greatly stumped.
How can I use an iterator in C# like a C++ iterator? I cannot find a Begin() or End() accessor, and I cannot even find out how to declare an iterator. I have read about IEnumerator. My goal is to implement the merge function. Here is part of my merge function written in C++. Mostly, I am looking for the C# equivalent of what is shown, except I will be using a reference type rather than integers.
void merge(vector<int>::iterator left, vector<int>::iterator right, vector<int>::iterator leftEnd, vector<int>::iterator rightEnd, vector<int>::iterator full)
{
while(left != leftEnd && right != rightEnd) //compare left and right until either range is exhausted
{
if(*right < *left) //right < left, so insert right into the output vector and advance the iterators
{
*full++ = *right++;
}
else //left <= right, so insert left into the output vector and advance the iterators
{
*full++ = *left++;
}
}
while(left != leftEnd) //copy any remaining elements into the output from left
{
*full++ = *left++;
}
while(right != rightEnd) //copy any remaining elements into the output from right
{
*full++ = *right++;
}
}
Also, what collection(s) should I use? (currently I have been trying List<T> and LinkedList<T>).

It sounds like you want something like:
bool leftValid = left.MoveNext();
bool rightValid = right.MoveNext();
while (leftValid && rightValid)
{
if (right.Current < left.Current)
{
full.Add(right.Current);
rightValid = right.MoveNext();
}
else
{
full.Add(left.Current);
leftValid = left.MoveNext();
}
}
while (leftValid)
{
full.Add(left.Current);
leftValid = left.MoveNext();
}
while (rightValid)
{
full.Add(right.Current);
rightValid = right.MoveNext();
}
Here full would need to be some sort of IList<T> - .NET iterators don't let you make changes to the underlying collection.
You shouldn't try to write "bridging" code to let you use .NET iterators like C++ ones; it's much better to try to start thinking in terms of the .NET iterators when you're using .NET.
Note that it's quite rare to pass iterators around in .NET. It would be more natural to make your method take IEnumerable<T> parameters, and do something like:
using (IEnumerator<T> leftIterator = leftSequence.GetEnumerator())
{
using (IEnumerator<T> rightIterator = rightSequence.GetEnumerator())
{
// Code as above, just using leftIterator and rightIterator
// instead of left and right
}
}
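Putting the pieces of this answer together, here is a minimal, self-contained sketch rather than a definitive implementation. The class name MergeHelpers, the ICollection<T> output parameter (ICollection<T> is where Add is declared; IList<T> extends it) and the optional IComparer<T> are assumptions for illustration. The comparer replaces the < operator so the same code also works for reference types, which the question asks about.

```csharp
using System.Collections.Generic;

public static class MergeHelpers
{
    // Merges two already-sorted sequences into `full`.
    // If no comparer is given, Comparer<T>.Default is used (T must then be comparable).
    public static void Merge<T>(IEnumerable<T> leftSequence, IEnumerable<T> rightSequence,
                                ICollection<T> full, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        using (IEnumerator<T> left = leftSequence.GetEnumerator())
        using (IEnumerator<T> right = rightSequence.GetEnumerator())
        {
            bool leftValid = left.MoveNext();
            bool rightValid = right.MoveNext();
            while (leftValid && rightValid)
            {
                // Take from the right only when it is strictly smaller, so equal
                // elements keep their left-first order (a stable merge).
                if (comparer.Compare(right.Current, left.Current) < 0)
                {
                    full.Add(right.Current);
                    rightValid = right.MoveNext();
                }
                else
                {
                    full.Add(left.Current);
                    leftValid = left.MoveNext();
                }
            }
            // Copy whatever remains; at most one of these loops does any work.
            while (leftValid) { full.Add(left.Current); leftValid = left.MoveNext(); }
            while (rightValid) { full.Add(right.Current); rightValid = right.MoveNext(); }
        }
    }
}
```

For a reference type without a natural ordering, pass a comparer explicitly, e.g. StringComparer.Ordinal for strings.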

.NET containers don't support C++-style iterators. The only thing they have is a
simple forward iterator called IEnumerator<T>
which can't modify the collection
isn't random access
can't be copied (some collections have value type iterators which can be copied, but that's tricky business and rarely used)
and on most collections also gets invalidated whenever you modify the collection
Pretty much the only thing they can do is be iterated over in a foreach statement.
You might want to look into the IList<T> interface which allows random access, but is only supported on collections which support fast indexing. On such a collection you could implement in-place merge sort by using indices.
void Merge<T>(IList<T> container,int left, int right, int leftEnd, int rightEnd, int full)
and then use container[left] instead of *left.
An unfortunate consequence of this is that you can't implement an efficient, in-place, container-agnostic sorting function like C++ has.
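A minimal sketch of the index-based approach, under stated assumptions: the name IndexMerge, the explicit IComparer<T> parameter and the separate dest list are hypothetical additions. The signature in the answer reads and writes a single container; writing merged output into the same list while some of its elements are still unread would overwrite them, so this version writes into a separate destination, much like the C++ full iterator points into its own output range. The indices play the role of the iterators: source[left] instead of *left.

```csharp
using System.Collections.Generic;

public static class IndexMerge
{
    // Merges source[left..leftEnd) and source[right..rightEnd) into dest, starting at dest[full].
    // Both input ranges are assumed to be sorted according to `comparer`.
    public static void Merge<T>(IList<T> source, int left, int right, int leftEnd, int rightEnd,
                                IList<T> dest, int full, IComparer<T> comparer)
    {
        while (left < leftEnd && right < rightEnd)
        {
            // Equivalent of: if (*right < *left) *full++ = *right++; else *full++ = *left++;
            if (comparer.Compare(source[right], source[left]) < 0)
                dest[full++] = source[right++];
            else
                dest[full++] = source[left++];
        }
        // Copy the leftovers of whichever range is not yet exhausted.
        while (left < leftEnd) dest[full++] = source[left++];
        while (right < rightEnd) dest[full++] = source[right++];
    }
}
```

Arrays and List<T> both implement IList<T> with a writable indexer, so either can serve as source or destination (the destination must already have enough elements, since the indexer does not grow the list).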

I think you want GetEnumerator(), MoveNext(), and Current.
Normally, you can just use foreach to iterate, but your case is special.
In fact, rather than using "full", organize this as an iterator block and merge the two enumerables lazily.
IEnumerable<T> Merge<T>(IEnumerable<T> left, IEnumerable<T> right)
{
... yield return Min<T>(left.Current, right.Current); ...
}
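One way to flesh out that outline, as a sketch rather than a definitive implementation: Min<T> in the outline is not a framework method, so this version compares with an IComparer<T> (defaulting to Comparer<T>.Default) and advances only the enumerator it just yielded from. The name LazyMerge is hypothetical.

```csharp
using System.Collections.Generic;

public static class LazyMerge
{
    // Lazily merges two sorted sequences; nothing is read until the caller enumerates.
    public static IEnumerable<T> Merge<T>(IEnumerable<T> left, IEnumerable<T> right,
                                          IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;
        using (IEnumerator<T> l = left.GetEnumerator())
        using (IEnumerator<T> r = right.GetEnumerator())
        {
            bool hasLeft = l.MoveNext();
            bool hasRight = r.MoveNext();
            while (hasLeft && hasRight)
            {
                // Yield the smaller current element (left wins ties), then advance only that side.
                if (comparer.Compare(r.Current, l.Current) < 0)
                {
                    yield return r.Current;
                    hasRight = r.MoveNext();
                }
                else
                {
                    yield return l.Current;
                    hasLeft = l.MoveNext();
                }
            }
            while (hasLeft) { yield return l.Current; hasLeft = l.MoveNext(); }
            while (hasRight) { yield return r.Current; hasRight = r.MoveNext(); }
        }
    }
}
```

Because the result is lazy, a caller such as LazyMerge.Merge(a, b).Take(3) does not read the whole of a and b, and no "full" collection is needed at all.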

You can use arrays, which have a fixed size, or List<T>, which is what some other languages call an ArrayList. Its items can be accessed through an indexer (list[i]), and items can be appended with list.Add(item). It grows automatically. A LinkedList<T> cannot be accessed via an indexer and must be traversed.
You would declare the method like this
void merge(IEnumerator<int> left, IEnumerator<int> right,
List<int> full)
{
// Jon Skeet's code goes here
}
You can get an enumerator like this
IEnumerable<int> intEnumerable = ...;
IEnumerator<int> intEnumerator = intEnumerable.GetEnumerator();
IEnumerable<T> is implemented by most generic collection types. Non-generic collections usually implement IEnumerable.
(Edited in response to @CodeInChaos's comment.)


Looking for a data structure that is optimized for finding the next closest element

I have two classes, let's call them foo and bar, that both have a DateTime property called ReadingTime.
I then have long lists of these classes, let's say foos and bars, where foos is List<foo>, bars is List<bar>.
My goal is for every element in foos to find the events in bars that happened right before and right after foo.
Some code to clarify:
var foos = new List<foo>();
var bars = new List<bar>();
...
foreach (var foo in foos)
{
bar before = bars.Where(b => b.ReadingTime <= foo.ReadingTime).OrderByDescending(b => b.ReadingTime).FirstOrDefault();
bar after = bars.Where(b => b.ReadingTime > foo.ReadingTime).OrderBy(b => b.ReadingTime).FirstOrDefault();
...
}
My issue here is performance. Is it possible to use some other data structure than a list to speed up the comparisons? In particular, running the OrderBy statement every single time seems like a huge waste; having the data pre-ordered should also speed up the comparisons, right?
I just don't know which data structure is best: SortedList, SortedSet, SortedDictionary, etc.; there seem to be so many. Also, all the information I find is on lookups, inserts, deletes, etc.; no one writes about finding the next closest element, so I'm not sure if anything is optimized for that.
I'm on .NET Core 3.1, if that matters.
Thanks in advance!
Edit: Okay so to wrap this up:
First I tried implementing @derloopkat's approach. For this I figured I needed a data type that could store the data in sorted order, so I just left it as an IOrderedEnumerable (which is what LINQ returns). Probably not very smart, as that actually brought things to a crawl. I then tried going with SortedList. I had to remove some duplicates first, which was no problem in my case. Thanks for the help, @Olivier Rogier! This got me up to roughly 2x the original performance, though I suspect that's mostly due to removing the LINQ OrderBy calls. For now this is good enough; if/when I need more performance I'm going to go with what @CamiloTerevinto suggested.
Lastly, @Aldert, thank you for your time, but I'm too much of a noob and under too much time pressure to understand what you suggested. I still appreciate it and might revisit this later.
Edit 2: Ended up going with @CamiloTerevinto's suggestion. It cut my runtime down from 10 hours to a couple of minutes.
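You don't need to sort bars ascending and descending on each iteration. Sort bars just once before the loop, e.g. bars = bars.OrderBy(b => b.ReadingTime).ToList(); (the ToList() matters: a bare OrderBy result is re-sorted every time it is enumerated), and then use LastOrDefault() and FirstOrDefault().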
foreach (var foo in foos)
{
bar before = bars.LastOrDefault(b => b.ReadingTime <= foo.ReadingTime);
bar after = bars.FirstOrDefault(b => b.ReadingTime > foo.ReadingTime);
//...
}
This produces the same output as your code and runs faster.
For memory performance and strong typing, you can use a SortedDictionary<TKey, TValue> or SortedList<TKey, TValue> keyed on ReadingTime, rather than the non-generic SortedList, which manipulates objects. Because you compare DateTime values, you don't need to implement a comparer.
What's the difference between SortedList and SortedDictionary?
SortedList<>, SortedDictionary<> and Dictionary<>
Difference between SortedList and SortedDictionary in C#
For speed optimization you can use a doubly linked list, where each item points to the next and the previous items:
Doubly Linked List in C#
Linked List Implementation in C#
Using a linked list or a doubly linked list requires more memory, because each cell stores the next and previous references alongside the instance it wraps, but it can sometimes be the fastest way to traverse and compare data, as well as to search, sort, reorder, add, remove and move items, because you manipulate linked references rather than an array.
You can also build powerful trees and manage data better than with arrays.
You can use binary search for quick lookups. Below is code where bars is sorted and each foo is looked up. You can do some reading on binary search yourself and enhance the code by also sorting foos; in that case you can narrow the search range in bars...
The code generates 2 lists with 100 items each, then sorts bars and does a binary search 100 times.
using System;
using System.Collections.Generic;
namespace ConsoleApp2
{
class BaseReading
{
private DateTime readingTime;
public BaseReading(DateTime dt)
{
readingTime = dt;
}
public DateTime ReadingTime
{
get { return readingTime; }
set { readingTime = value; }
}
}
class Foo:BaseReading
{
public Foo(DateTime dt) : base(dt)
{ }
}
class Bar: BaseReading
{
public Bar(DateTime dt) : base(dt)
{ }
}
class ReadingTimeComparer: IComparer<BaseReading>
{
public int Compare(BaseReading x, BaseReading y)
{
return x.ReadingTime.CompareTo(y.ReadingTime);
}
}
class Program
{
static private List<BaseReading> foos = new List<BaseReading>();
static private List<BaseReading> bars = new List<BaseReading>();
static private Random ran = new Random();
static void Main(string[] args)
{
for (int i = 0; i< 100;i++)
{
foos.Add(new BaseReading(GetRandomDate()));
bars.Add(new BaseReading(GetRandomDate()));
}
var rtc = new ReadingTimeComparer();
bars.Sort(rtc);
foreach (BaseReading br in foos)
{
int index = bars.BinarySearch(br, rtc); // >= 0: exact match at index; < 0: ~index is the position of the first later reading
}
}
static DateTime GetRandomDate()
{
long randomTicks = ran.Next((int)(DateTime.MaxValue.Ticks >> 32));
randomTicks = (randomTicks << 32) + ran.Next();
return new DateTime(randomTicks);
}
}
}
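The answer suggests also sorting foos to narrow the search range in bars. One way to take that idea to its conclusion, sketched here under the assumption that both inputs are already sorted ascending by ReadingTime (reduced to plain DateTime lists for brevity), is a single forward sweep: because each foo is no earlier than the previous one, the position in bars never has to move backwards, so the whole pass is O(N + M) after sorting. SortedSweep and LastAtOrBefore are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSweep
{
    // For each foo (ascending), returns the index of the last bar <= that foo, or -1 if none.
    // The first bar > foo is then at that index + 1 (equal to sortedBars.Count if there is none).
    public static int[] LastAtOrBefore(IReadOnlyList<DateTime> sortedFoos, IReadOnlyList<DateTime> sortedBars)
    {
        var result = new int[sortedFoos.Count];
        int b = 0; // number of bars known to be <= the current foo
        for (int f = 0; f < sortedFoos.Count; f++)
        {
            // Only ever moves forward, because foos are processed in ascending order.
            while (b < sortedBars.Count && sortedBars[b] <= sortedFoos[f])
                b++;
            result[f] = b - 1;
        }
        return result;
    }
}
```

Unlike BinarySearch, this also returns the last of several equal bars, so duplicates do not need to be removed first.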
The only APIs available in the .NET platform for finding the next closest element, with a computational complexity better than O(N), are the List.BinarySearch and Array.BinarySearch methods:
// Returns the zero-based index of item in the sorted List<T>, if item is found;
// otherwise, a negative number that is the bitwise complement of the index of
// the next element that is larger than item or, if there is no larger element,
// the bitwise complement of Count.
public int BinarySearch (T item, IComparer<T> comparer);
These APIs are not 100% robust, because the correctness of the results depends on whether the underlying data structure is already sorted, and the platform does not check or enforce this condition. It's up to you to ensure that the list or array is sorted with the correct comparer, before attempting to BinarySearch on it.
These APIs are also cumbersome to use, because when no exact match is found you get the index of the next larger element as its bitwise complement, which is a negative number, and you have to apply the ~ operator to get the actual index. Then you subtract one to get the closest item in the other direction.
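A small helper that hides the ~ arithmetic, as a sketch: it assumes the list is sorted ascending and, as in the asker's final version, contains no duplicate times (with duplicates, BinarySearch may return any one of the equal elements). NearestSearch and FindNeighbors are hypothetical names. It returns the index of the last element <= target (the asker's "before") and of the first element > target ("after"); -1 and Count mean there is no such element.

```csharp
using System;
using System.Collections.Generic;

public static class NearestSearch
{
    public static (int Before, int After) FindNeighbors(List<DateTime> sorted, DateTime target)
    {
        int index = sorted.BinarySearch(target);
        if (index >= 0)
        {
            // Exact match: it counts as "before" (<=); the next slot is "after" (>).
            return (index, index + 1);
        }
        // No match: ~index is the position of the first element larger than target.
        int next = ~index;
        return (next - 1, next);
    }
}
```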
If you don't mind adding a third-party dependency to your app, you could consider the C5 library, which contains the TreeDictionary collection, with the interesting methods below:
// Find the entry in the dictionary whose key is the predecessor of the specified key.
public bool TryPredecessor(K key, out SCG.KeyValuePair<K, V> res);
// Find the entry in the dictionary whose key is the successor of the specified key.
public bool TrySuccessor(K key, out SCG.KeyValuePair<K, V> res);
There are also the TryWeakPredecessor and TryWeakSuccessor methods available, that consider an exact match as a predecessor or successor respectively. In other words they are analogous to the <= and >= operators.
C5 is a powerful and feature-rich library that offers lots of specialized collections; its main drawback is its somewhat idiosyncratic API.
You should get excellent performance by any of these options.

Finding 2-Tuple Combinations of IEnumerable<T> collection, C#

I would like to implement a method, that takes a collection of an unknown Type as a parameter and returns a Collection of 2-tuples which contains all possible distinct combinations from these elements (with no repetition). My Code:
public static IEnumerable<Tuple<T, T>> Get2Combinations<T>(this IEnumerable<T> col)
{
/*foreach (var item1 in col)
{
col.GetEnumerator().MoveNext();
foreach (var item2 in col)
{
yield return new Tuple<T, T>(item1, item2);
}
}*/
for (int i = 0; i < col.Count(); i++)
{
for (int j = i + 1; j < col.Count(); j++)
{
yield return new Tuple<T, T>(col.ElementAt(i),
col.ElementAt(j));
}
}
}
What I'm doing is taking the first element and pairing it with every other one; the inner for loop then loops through all the remaining ones. The problem I see is the method col.ElementAt(i). If we look into the source code, we see that if col is an IList<T>, this gets the value at the given index directly, but for any other collection this would be very, very slow.
I attempted to deal with this using foreach loops (the commented section), which are efficient with IEnumerable, but that part just doesn't work: the inner loop starts over from the first element every time, so it produces the set of all 2-tuples, some of them repeated.
Would anyone give me some suggestions, how to improve this code?
The problem is that IEnumerable<T> is designed to describe something you can iterate through (like a stream). It's not intended to support efficient random access (like an array).
Where you use Count(), you are forcing the enumerable to iterate to its end, so in the case of a stream this will wait until the entire stream is read. Of course, a stream might not support efficient direct access, or even buffer its content in memory (remember, it only promises to support enumeration), so subsequently calling ElementAt() could force it to re-read from the beginning to the position indicated.
The best way to solve this is to switch from IEnumerable<T> to IList<T>. That interface does support random access; an implementation could still perform poorly, but that's not the responsibility of your function.
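A sketch of that suggestion: taking IList<T> lets the loops use the indexer and Count directly, which are cheap on arrays and List<T>. The class name CombinationExtensions is hypothetical, and a caller holding a plain IEnumerable<T> can materialize it once with ToList() before calling.

```csharp
using System;
using System.Collections.Generic;

public static class CombinationExtensions
{
    // Yields every unordered pair (list[i], list[j]) with i < j, so no element is
    // paired with itself and no pair appears in both orders.
    public static IEnumerable<Tuple<T, T>> Get2Combinations<T>(this IList<T> list)
    {
        for (int i = 0; i < list.Count; i++)
        {
            for (int j = i + 1; j < list.Count; j++)
            {
                yield return Tuple.Create(list[i], list[j]);
            }
        }
    }
}
```

For n elements this yields n(n - 1)/2 pairs; for 1, 2, 3 that is (1, 2), (1, 3), (2, 3).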

Is an infinite enumerable still "enumerable"?

For example, two overlapping line segments have infinitely many points of intersection. Enumerating all these points might not make sense, and we might just want to express that this collection is infinite.
Floating-point numbers define NegativeInfinity and PositiveInfinity. A number that represents a count or an ordinal doesn't seem to need floating point; however, integers have nothing defined to represent infinity.
So I tried to implement an infinite enumerable. But then I got confused by the term "enumerable".
Is there a better way to solve this problem? And is an infinite enumerable still enumerable?
Code
public partial class Infinity: IEnumerable<object> {
IEnumerator<object> IEnumerable<object>.GetEnumerator() {
for(; ; )
yield return Infinity.Enumerable;
}
public IEnumerator GetEnumerator() {
for(; ; )
yield return Infinity.Enumerable;
}
public Infinity LongCount(
Func<object, bool> predicate=default(Func<object, bool>)) {
return Infinity.Enumerable;
}
public Infinity Count(
Func<object, bool> predicate=default(Func<object, bool>)) {
return Infinity.Enumerable;
}
public static readonly Infinity Enumerable=new Infinity();
}
Edit:
Thanks for answering. I'm not confused about IEnumerable and IEnumerator. The GetEnumerator methods yield Infinity.Enumerable because I do not want to declare an extra dummy object such as:
static readonly object dummy=new object();
and yield return dummy in GetEnumerator methods.
And is an infinite enumerable still enumerable?
Enumerable, in this sense, is based on the second definition of enumerate:
to specify one after another
It is not referring to the (more common outside of computing) definition whereby it effectively means "able to be counted."
In this sense, an infinite series can definitely be listed one item after another, and qualifies as an enumerable.
That being said, I don't see the purpose behind your code in this example. Infinite enumerables are typically representing something like a stream of data without an end, or other sources where there is no "end", but the potential to continually pull information.
The positive natural numbers are infinite and clearly enumerable (1, 2, 3, …). The concept is well-defined even outside of C#.
Your class, however, has problems because you are confusing the IEnumerable and IEnumerator interfaces. The GetEnumerator method returns just one enumerator; it is that enumerator which is infinite.
An easy implementation of an infinite IEnumerable in C# (as a method rather than a class) looks as follows:
IEnumerable<int> Infinite() {
int i = 1;
while (true)
yield return i++;
}
Caveat: int overflows at some point. By default (in an unchecked context), C# then simply wraps around to negative numbers, starting at int.MinValue.
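If the overflow matters, one option (a sketch, not the only fix) is to count with System.Numerics.BigInteger, which has no fixed upper bound, so the sequence is limited only by memory and time. An infinite sequence still has to be consumed by something that stops, such as LINQ's Take or a loop with a break. The names Sequences, Naturals and FirstNaturals are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class Sequences
{
    // Positive natural numbers 1, 2, 3, ... without int overflow.
    public static IEnumerable<BigInteger> Naturals()
    {
        BigInteger i = BigInteger.One;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    // Safe consumption: Take stops the otherwise endless enumeration.
    public static List<BigInteger> FirstNaturals(int count) => Naturals().Take(count).ToList();
}
```

Calling Naturals().Count() or Naturals().ToList() directly would never return, which is the practical difference between an infinite enumerable and a finite one.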

Java equivalent to IEnumerator from C#?

Is there an interface in the Java library that also enumerates a series but follows slightly different logic from Iterator and Enumeration? Namely, next() should return a boolean saying whether the next element was reached, and there should be a getCurrent() method that returns the current element.
UPDATE
Iterator.next() is not equivalent to IEnumerator.Current, since the former advances the iterator on each call while the latter doesn't.
UPDATE 2
I am designing my own class with my own functionality. My question was in order to find a "competent" analogy. The sample from C# was just a sample, I am not translating something from C# to Java.
This sounds like Guava's PeekingIterator; you can decorate a plain Iterator with Iterators.peekingIterator.
You have to use a different approach in Java.
e.g., instead of this C# code:
Dictionary<int?, int?> m = new Dictionary<int?, int?>();
for (IEnumerator<KeyValuePair<int?, int?>> it = m.GetEnumerator(); it.MoveNext();)
{
Console.Write(it.Current.Key);
Console.Write(it.Current.Value);
}
You will need to use:
java.util.HashMap<Integer, Integer> m = new java.util.HashMap<Integer, Integer>();
for (java.util.Iterator<java.util.Map.Entry<Integer, Integer>> it = m.entrySet().iterator(); it.hasNext();)
{
java.util.Map.Entry<Integer, Integer> current = it.next();
System.out.print(current.getKey());
System.out.print(current.getValue());
}
There should not be a high demand for this particular conversion since you would normally use a 'foreach' loop in C#, which would convert more cleanly to Java.
If you use a standard concrete Collection class, such as HashSet and ArrayList to name but two, you will have access to an iterator.
Calling hasNext() on that iterator (obtained with collection.iterator()) returns a boolean but does not advance it. This lets you determine whether you should attempt to call next().
Example:
Set<String> numbers = new HashSet<>();
// fill in set...
Iterator<String> it = numbers.iterator();
while (it.hasNext()) {
System.out.println(it.next());
}
Of course, you can also iterate through a collection using the for-each syntax:
for (String s : numbers) {
System.out.println(s);
}

Why do we need iterators in c#?

Can somebody provide a real life example regarding use of iterators. I tried searching google but was not satisfied with the answers.
You've probably heard of arrays and containers - objects that store a list of other objects.
But in order for an object to represent a list, it doesn't actually have to "store" the list. All it has to do is provide you with methods or properties that allow you to obtain the items of the list.
In the .NET framework, the interface IEnumerable is all an object has to support to be considered a "list" in that sense.
To simplify it a little (leaving out some historical baggage):
public interface IEnumerable<T>
{
IEnumerator<T> GetEnumerator();
}
So you can get an enumerator from it. That interface (again, simplifying slightly to remove distracting noise):
public interface IEnumerator<T>
{
bool MoveNext();
T Current { get; }
}
So to loop through a list, you'd do this:
var e = list.GetEnumerator();
while (e.MoveNext())
{
var item = e.Current;
// blah
}
This pattern is captured neatly by the foreach keyword:
foreach (var item in list)
// blah
But what about creating a new kind of list? Yes, we can just use List<T> and fill it up with items. But what if we want to discover the items "on the fly" as they are requested? There is an advantage to this, which is that the client can abandon the iteration after the first three items, and they don't have to "pay the cost" of generating the whole list.
To implement this kind of lazy list by hand would be troublesome. We would have to write two classes, one to represent the list by implementing IEnumerable<T>, and the other to represent an active enumeration operation by implementing IEnumerator<T>.
Iterator methods do all the hard work for us. We just write:
IEnumerable<int> GetNumbers(int stop)
{
for (int n = 0; n < stop; n++)
yield return n;
}
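And the compiler generates the enumerable and enumerator implementations for us (in practice it emits a single hidden class that implements both interfaces). Calling the method is equivalent to constructing an object of the class that represents the list.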
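For comparison, this is roughly what writing GetNumbers by hand would involve: one class for the sequence and one for an active enumeration, the two roles described above. It is a simplified, hypothetical hand-written equivalent (NumberSequence and NumberEnumerator are illustrative names); the code the compiler actually generates looks different.

```csharp
using System.Collections;
using System.Collections.Generic;

// The "list": each GetEnumerator call starts a fresh, independent enumeration.
public sealed class NumberSequence : IEnumerable<int>
{
    private readonly int _stop;
    public NumberSequence(int stop) => _stop = stop;

    public IEnumerator<int> GetEnumerator() => new NumberEnumerator(_stop);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// An active enumeration: holds the position that `n` holds in the iterator method.
public sealed class NumberEnumerator : IEnumerator<int>
{
    private readonly int _stop;
    private int _current = -1; // positioned before the first element

    public NumberEnumerator(int stop) => _stop = stop;

    public int Current => _current;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_current + 1 >= _stop) return false; // equivalent of the loop test n < stop
        _current++;
        return true;
    }

    public void Reset() => _current = -1;
    public void Dispose() { }
}
```

The three-line iterator method expresses the same behaviour, which is the main argument for iterators in this answer.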
Iterators are an abstraction that decouples the concept of position in a collection from the collection itself. The iterator is a separate object storing the necessary state to locate an item in the collection and move to the next item in the collection. I have seen collections that kept that state inside the collection (i.e. a current position), but it is often better to move that state to an external object. Among other things it enables you to have multiple iterators iterating the same collection.
Simple example: a function that generates a sequence of integers:
static IEnumerable<int> GetSequence(int fromValue, int toValue)
{
if (toValue >= fromValue)
{
for (int i = fromValue; i <= toValue; i++)
{
yield return i;
}
}
else
{
for (int i = fromValue; i >= toValue; i--)
{
yield return i;
}
}
}
To do it without an iterator, you would need to create an array then enumerate it...
Iterate through the students in a class
The Iterator design pattern provides us with a common method of enumerating a list of items or array, while hiding the details of the list's implementation. This provides a cleaner use of the array object and hides unnecessary information from the client, ultimately leading to better code reuse, enhanced maintainability, and fewer bugs. The iterator pattern can enumerate the list of items regardless of their actual storage type.
Iterate through a set of homework questions.
But seriously, iterators can provide a unified way to traverse the items in a collection regardless of the underlying data structure.
Read the first two paragraphs here for a little more info.
A couple of things they're great for:
a) For 'perceived performance' while maintaining code tidiness - the iteration of something separated from other processing logic.
b) When the number of items you're going to iterate through is not known.
Although both can be done by other means, with iterators the code can be made nicer and tidier, as the caller of the iterator doesn't need to worry about how it finds the items to iterate through...
Real life example: enumerating directories and files, and finding the first [n] that fulfill some criteria, e.g. a file containing a certain string or sequence etc...
Besides everything else, iterators let you iterate through lazy sequences (IEnumerators). Each element of such a sequence can be evaluated or initialized at its iteration step, which makes it possible to iterate through infinite sequences using a finite amount of resources...
The canonical and simplest example is that it makes infinite sequences possible without the complexity of having to write the class to do that yourself:
// generate every prime number
public IEnumerator<int> GetPrimeEnumerator()
{
yield return 2;
var primes = new List<int>();
primes.Add(2);
Func<int, bool> IsPrime = n => primes.TakeWhile(
p => p <= (int)Math.Sqrt(n)).FirstOrDefault(p => n % p == 0) == 0;
for (int i = 3; true; i += 2)
{
if (IsPrime(i))
{
yield return i;
primes.Add(i);
}
}
}
Obviously this would not be truly infinite unless you used a BigInteger instead of int, but it gives you the idea.
Writing this code (or similar) by hand for each generated sequence would be tedious and error-prone; the iterator does that for you. If the above example seems too complex, consider:
// generate every power of a number from start^0 to start^n
public IEnumerator<int> GetPowersEnumerator(int start)
{
yield return 1; // anything ^0 is 1
var x = start;
while(true)
{
yield return x;
x *= start;
}
}
They come at a cost, though. Their lazy behaviour means you cannot spot common errors (null parameters and the like) until the generator is first consumed rather than when it is created, unless you write wrapping functions to check first. The current implementation is also incredibly bad (1) if used recursively.
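The usual wrapping pattern, sketched with a hypothetical Doubled method: a plain (non-iterator) method checks the arguments eagerly and then hands off to a private iterator method, so a null argument throws at the call site instead of at the first MoveNext.

```csharp
using System;
using System.Collections.Generic;

public static class Validation
{
    // Not an iterator itself, so this check runs as soon as Doubled is called.
    public static IEnumerable<int> Doubled(IEnumerable<int> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return DoubledIterator(source);
    }

    // The lazy part: its body only runs when the result is enumerated.
    private static IEnumerable<int> DoubledIterator(IEnumerable<int> source)
    {
        foreach (int n in source)
            yield return n * 2;
    }
}
```

Had the null check been placed inside DoubledIterator, calling Doubled(null) would silently succeed and the exception would only surface later, wherever the result is first enumerated.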
Enumerations over complex structures like trees and object graphs are much easier to write, as the state maintenance is largely done for you: you simply write code to visit each item and don't have to worry about getting back to it.
(1) I don't use this word lightly: an O(n) iteration can become O(n^2).
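As an illustration of the tree case, here is a depth-first pre-order walk written as a recursive iterator; Node<T> and TreeWalk are hypothetical types. It is short precisely because the compiler keeps each level's position for you, but it also shows the recursive cost mentioned above: every value is passed up through one nested iterator per level, so a deep, list-like tree turns an O(n) walk into O(n^2). An explicit Stack<T> avoids that, at the price of managing the state yourself.

```csharp
using System.Collections.Generic;

// Hypothetical minimal tree node, used only for this illustration.
public sealed class Node<T>
{
    public T Value { get; }
    public List<Node<T>> Children { get; } = new List<Node<T>>();
    public Node(T value) => Value = value;
}

public static class TreeWalk
{
    // Visits a node, then each child subtree in order (pre-order, depth-first).
    public static IEnumerable<T> PreOrder<T>(Node<T> node)
    {
        yield return node.Value;
        foreach (Node<T> child in node.Children)
        {
            // Each recursive call is a separate iterator that this one drains.
            foreach (T value in PreOrder(child))
                yield return value;
        }
    }
}
```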
An iterator is an easy way of implementing the IEnumerator interface. Instead of making a class that has the methods and properties required for the interface, you just make a method that returns the values one by one and the compiler creates a class with the methods and properties needed to implement the interface.
If you for example have a large list of numbers, and you want to return a collection where each number is multiplied by two, you can make an iterator that returns the numbers instead of creating a copy of the list in memory:
public IEnumerable<int> GetDouble() {
foreach (int n in originalList) yield return n * 2;
}
In C# 3 you can do something quite similar using extension methods and lambda expressions:
originalList.Select(n => n * 2)
Or using LINQ:
from n in originalList select n * 2
IEnumerator<Question> myIterator = listOfStackOverFlowQuestions.GetEnumerator();
while (myIterator.MoveNext())
{
Question q;
q = myIterator.Current;
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}
foreach (Question q in listOfStackOverFlowQuestions)
{
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}
