I would like to implement a method that takes a collection of an unknown type as a parameter and returns a collection of 2-tuples containing all possible distinct combinations of these elements (with no repetition). My code:
public static IEnumerable<Tuple<T, T>> Get2Combinations<T>(this
IEnumerable<T> col)
{
/*foreach (var item1 in col)
{
col.GetEnumerator().MoveNext();
foreach (var item2 in col)
{
yield return new Tuple<T, T>(item1, item2);
}
}*/
for (int i = 0; i < col.Count(); i++)
{
for (int j = i + 1; j < col.Count(); j++)
{
yield return new Tuple<T, T>(col.ElementAt(i),
col.ElementAt(j));
}
}
}
What I'm doing is taking the first element and pairing it with every other element; the inner for loop then walks through all the remaining ones. The problem I see is the method col.ElementAt(i). If we look at the source code, we see that if 'col' implements IList, it gets the value at the given index directly, but for any other collection it has to walk the sequence from the start on every call, which would be very slow.
I attempted to deal with this using foreach loops (the commented section), which are efficient with IEnumerable, but that part just doesn't work, because the enumerator is common for both the inner and the outer loop, and therefore this produces the set of all 2-tuples, some of which are repeated.
Could anyone suggest how to improve this code?
The problem is that IEnumerable is designed to describe something you can iterate through (like a stream). It's not intended to support efficient random access (like an array).
Where you use Count() you are forcing the enumerable to iterate to its end, so in the case of a stream this will wait until the entire stream is read. Of course a stream might not support efficient direct access, or even buffer its content in memory (remember, it just promises to support enumeration), so subsequently calling ElementAt() could force it to re-read from the beginning to the position indicated.
The best way to solve this is to switch from IEnumerable to IList, which does support random access. A particular implementation could still perform poorly, but that's not the responsibility of your function.
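A minimal sketch of that suggestion, assuming it is acceptable to buffer the input once when it is not already a list. The class names PairExtensions and PairDemo and the local `items` are illustrative and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairExtensions
{
    // Yields every unordered pair (index i < index j) exactly once.
    public static IEnumerable<Tuple<T, T>> Get2Combinations<T>(this IEnumerable<T> col)
    {
        // Reuse the list when the caller already has one; otherwise copy once.
        IList<T> items = col as IList<T> ?? col.ToList();

        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i + 1; j < items.Count; j++)
            {
                yield return Tuple.Create(items[i], items[j]);
            }
        }
    }
}

public static class PairDemo
{
    public static void Main()
    {
        // Prints ab, ac, bc (one pair per line).
        foreach (var pair in new[] { "a", "b", "c" }.Get2Combinations())
        {
            Console.WriteLine(pair.Item1 + pair.Item2);
        }
    }
}
```

The copy costs O(n) memory, but every element access afterwards is an indexer call instead of a fresh walk over the sequence.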
I am looking for an implementation of lazy shuffling in c#.
I only care about the time it takes to process the first couple of elements. I do not care whether or not the original list gets modified (i.e. removing elements would be fine). I do not care if the processing time gets longer as the iterator reaches the end of the list (as long as it stays within reasonable bounds of course).
Context: I have a large list that I want to get a relatively small number of random samples from. In most cases I only need the very first random element, but in some rare cases I need all elements from the list.
If possible I would like to implement this as an extension method, like this (but answers without extension methods are fine too):
public static class Program
{
public static IEnumerable<T> lazy_shuffle<T>(this IEnumerable<T> input, Random r)
{
//do the magic
return input;
}
static void Main(string[] args)
{
var start = DateTime.Now;
var shuffled = Enumerable.Range(0, 1000000).lazy_shuffle(new Random(123));
var enumerate = shuffled.GetEnumerator();
foreach (var i in Enumerable.Range(0, 5))
{
enumerate.MoveNext();
Console.WriteLine(enumerate.Current);
}
Console.WriteLine($"time for shuffling 1000000 elements was {(DateTime.Now - start).TotalMilliseconds}ms");
}
}
Note:
input.OrderBy(i => r.Next()) would not be good enough, as it needs to iterate over the entire list once to generate one random number for each element of the list.
this is not a duplicate of Lazy Shuffle Algorithms because my question has less tight bounds for the algorithms but instead requires an implementation in c#
this is not a duplicate of Randomize a List<T> because that question is about regular shuffling and not lazy shuffling.
update:
A Count exists. Random access to elements exists. It is not strictly an IEnumerable, and instead just a big List or array. I have updated the question to say "list" instead of "IEnumerable". Only the output of the lazy shuffler needs to be enumerable; the source can be an actual list.
The selection should be fair, i.e. each element needs to have the same chance to be picked first.
mutation/modification of the source-list is fine
In the end I only need to take N random elements from the list, but I do not know the N beforehand
Since the original list can be modified, here is a very simple and efficient solution, based on this answer:
public static IEnumerable<T> Shuffle<T>(this IList<T> list, Random rng)
{
for(int i = list.Count - 1; i >= 0; i--)
{
int swapIndex = rng.Next(i + 1);
yield return list[swapIndex];
list[swapIndex] = list[i];
}
}
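A usage sketch for the method above, repeated here so that the example compiles on its own. The class name LazyShuffleDemo and the seed are arbitrary choices for illustration. Keep in mind that the source list is overwritten as the iterator advances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyShuffleDemo
{
    // Same method as above: each step picks a random remaining element,
    // yields it, then fills the hole with the last remaining element.
    public static IEnumerable<T> Shuffle<T>(this IList<T> list, Random rng)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            int swapIndex = rng.Next(i + 1);
            yield return list[swapIndex];
            list[swapIndex] = list[i];
        }
    }

    public static void Main()
    {
        List<int> source = Enumerable.Range(0, 1000000).ToList();

        // Take(5) stops pulling after five items, so only the first few
        // loop steps run; the rest of the list is never visited.
        foreach (int value in source.Shuffle(new Random(123)).Take(5))
        {
            Console.WriteLine(value);
        }
    }
}
```

Enumerating the result to the end yields every element exactly once, so the same method also covers the rare case where all elements are needed.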
This example is for a method called "WriteLines", which takes an array of strings and adds them to an asynchronous file writer. It works, but I am curious if there is an interesting way to support -any- collection of strings, rather than relying on the programmer to convert to an array.
I came up with something like:
public void AddLines(IEnumerable<string> lines)
{
// grab the queue
lock (_queue)
{
// loop through the collection and enqueue each line
for (int i = 0, count = lines.Count(); i < count; i++)
{
_queue.Enqueue(lines.ElementAt(i));
}
}
// notify the thread it has work to do.
_hasNewItems.Set();
}
This appears to work but I have no idea of any performance implications it has, or any logic implications either (What happens to the order? I assume this will allow even unordered collections to work, e.g. HashSet).
Is there a more accepted way to achieve this?
You've been passed an IEnumerable<string> - that means you can iterate over it. Heck, there's even a language feature specifically for it - foreach:
foreach (string line in lines)
{
_queue.Enqueue(line);
}
Unlike your existing approach, this will only iterate over the sequence once. Your current code will behave differently based on the underlying implementation - in some cases Count() and ElementAt are optimized, but in some cases they aren't. You can see this really easily if you use an iterator block and log:
public IEnumerable<string> GetItems()
{
Console.WriteLine("yielding a");
yield return "a";
Console.WriteLine("yielding b");
yield return "b";
Console.WriteLine("yielding c");
yield return "c";
}
Try calling AddLines(GetItems()) with your current implementation, and look at the console...
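The experiment can also be sketched with a counter instead of console output. The names EnumerationCountDemo, Yields, CountWithIndexing and CountWithForeach are made up for this illustration; the iterator is a compiler-generated one, which is the case where Count() and ElementAt() have no shortcut.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    // Number of items the iterator body has produced so far.
    public static int Yields;

    public static IEnumerable<string> GetItems()
    {
        foreach (string s in new[] { "a", "b", "c" })
        {
            Yields++;
            yield return s;
        }
    }

    // Mirrors the Count()/ElementAt(i) loop from the question.
    public static int CountWithIndexing()
    {
        Yields = 0;
        IEnumerable<string> lines = GetItems();
        for (int i = 0, count = lines.Count(); i < count; i++)
        {
            lines.ElementAt(i); // restarts the iterator on every call
        }
        return Yields;
    }

    // Mirrors the foreach version.
    public static int CountWithForeach()
    {
        Yields = 0;
        foreach (string line in GetItems()) { }
        return Yields;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithIndexing()); // 9: 3 for Count(), then 1 + 2 + 3
        Console.WriteLine(CountWithForeach());  // 3
    }
}
```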
Adding this answer as well: since you are using threads, use a BlockingCollection<string> (which wraps a ConcurrentQueue by default) instead, like so:
// the provider method
// _queue = new BlockingCollection<string>()
public void AddLines(IEnumerable<string> lines)
{
foreach (var line in lines)
{
_queue.Add(line);
}
}
No locks required, and allows for multiple consumers and providers since we flag for each element added.
The consumer basically only has to do var workitem = _queue.Take();
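A self-contained sketch of both sides, assuming a single consumer that drains the queue until the producer signals completion. The class LineWriterDemo and the `written` list (a stand-in for the real file write) are hypothetical; Add, CompleteAdding and GetConsumingEnumerable are members of BlockingCollection<T>.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LineWriterDemo
{
    public static List<string> Run(IEnumerable<string> lines)
    {
        var written = new List<string>();

        using (var queue = new BlockingCollection<string>())
        {
            // Consumer: blocks while the queue is empty and finishes
            // once CompleteAdding() has been called and the queue is drained.
            Task consumer = Task.Run(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                {
                    written.Add(line); // stand-in for writing to the file
                }
            });

            // Producer: a single pass over the input, no explicit lock.
            foreach (string line in lines)
            {
                queue.Add(line);
            }
            queue.CompleteAdding();

            consumer.Wait();
        }

        return written;
    }

    public static void Main()
    {
        foreach (string line in Run(new[] { "first", "second", "third" }))
        {
            Console.WriteLine(line);
        }
    }
}
```

With one consumer the lines come out in the order they were added, because the default backing store is a FIFO queue.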
I'm getting an error and I don't know why:
static void decoupeTableau(IEnumerable<int> dst, IEnumerable<int> src)
{
for (int i = 0; i < src.Count() && i < 4; ++i)
dst.ElementAt(i) = src.ElementAt(i); // Here
}
Error:
The left-hand side of an assignment must be a variable, property or indexer
Why I'm getting it?
Why I'm getting it ?
Because you've got an assignment operator where the left-hand side is a method call. What did you expect that to do? What code would that call? An assignment has to either set the value of a variable, or call a setter for a property or indexer. This isn't doing either of those things.
Basically, you can't do this - and for IEnumerable<T> it doesn't even make sense, as it can be read-only, generated etc.
Perhaps you want an IList<int> instead:
static void DecoupeTableau(IList<int> dst, IEnumerable<int> src)
{
int i = 0;
foreach (var value in src.Take(4))
{
dst[i] = value;
i++;
}
}
Note how this code is also potentially much more efficient - calling Count() and ElementAt in a loop can be very expensive. (For example, with a generator, each time you call Count() you have to iterate over the whole stream - so it wouldn't even complete if this were a theoretically infinite stream, such as a random sequence of integers.)
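To illustrate the point about infinite sequences, here is a small sketch that feeds the method above from a generator that never ends. Naturals and DecoupeDemo are hypothetical names, and the sketch assumes dst has room for at least four elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DecoupeDemo
{
    // An endless sequence: 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    public static void DecoupeTableau(IList<int> dst, IEnumerable<int> src)
    {
        int i = 0;
        // Take(4) stops after four items, so the source is never exhausted.
        foreach (var value in src.Take(4))
        {
            dst[i] = value;
            i++;
        }
    }

    public static void Main()
    {
        var dst = new int[4];
        DecoupeTableau(dst, Naturals());
        Console.WriteLine(string.Join(",", dst)); // 0,1,2,3
    }
}
```

Calling src.Count() on Naturals() would never return, which is why the loop bound from the question cannot work for such a source.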
The thing with IEnumerable is that you can loop through the items but not change the list. If you need to actually change the list you need an IList, an array or similar.
The error you are getting is because ElementAt is a function, not the element itself.
Also, it is a particularly bad idea to loop through an IEnumerable with an indexed for loop; use a foreach loop instead.
I want to do a foreach loop while taking out members of that foreach loop, but that's throwing errors. My only idea is to create another list inside of this loop to find which Slices to remove, and loop through the new list to remove items from Pizza.
foreach(var Slice in Pizza)
{
if(Slice.Flavor == "Sausage")
{
Me.Eat(Slice); //This removes an item from the list: "Pizza"
}
}
You can do this, by far the simplest way I have found (like to think I invented it, sure that's not true though ;))
foreach (var Slice in Pizza.ToArray())
{
if (Slice.Flavor == "Sausage") // each to their own.. would have gone for BBQ
{
Me.Eat(Slice);
}
}
..because it's iterating over a fixed copy of the collection. It will iterate all items, even if they are removed.
Handy isn't it!
(By the way guys, this is a handy way of iterating through a copy of a collection, with thread safety, and removing the time an object is locked: Lock, get the ToArray() copy, release the lock, then iterate)
Hope that helps!
If you have to iterate through a list and need to remove items, iterate backwards using a for loop:
// taken from Preet Sangha's answer and modified
for(int i = Pizza.Count-1; i >= 0; i--)
{
var Slice = Pizza[i];
if(Slice.Flavor == "Sausage")
{
Me.Eat(Slice); //This removes an item from the list: "Pizza"
}
}
The reason to iterate backwards is so that when you remove elements you don't run into an out-of-range exception (ArgumentOutOfRangeException for a List<T>) caused by accessing Pizza[5] on a Pizza that has only 5 elements because we removed the sixth one.
The reason to use a for loop is that the loop variable i has no relation to the Pizza, so you can modify the Pizza without the enumerator "breaking".
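A runnable sketch of the backward loop, using a plain List<string> of flavors in place of the Pizza and Slice types from the question, and RemoveAt in place of Me.Eat. The names ReverseRemoveDemo and RemoveSausage are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseRemoveDemo
{
    public static List<string> RemoveSausage(List<string> pizza)
    {
        // Walking from the end means a removal only shifts items
        // that have already been visited.
        for (int i = pizza.Count - 1; i >= 0; i--)
        {
            if (pizza[i] == "Sausage")
            {
                pizza.RemoveAt(i);
            }
        }
        return pizza;
    }

    public static void Main()
    {
        var pizza = new List<string> { "Sausage", "Cheese", "Sausage", "Sausage", "BBQ" };
        Console.WriteLine(string.Join(",", RemoveSausage(pizza))); // Cheese,BBQ
    }
}
```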
Use a for loop, not a foreach:
for (int i = 0; i < Pizza.Count; ++i)
{
var Slice = Pizza[i];
if(Slice.Flavor == "Sausage")
{
Me.Eat(Slice); //This removes an item from the list: "Pizza"
i--; // the next slice has shifted into index i, so visit that index again
}
}
Probably the clearest way to approach this would be to build a list of slices to eat, then to process it, avoiding altering the original enumeration in-loop. I've never been a fan of using indexed loops for this, as it can be error-prone.
List<Slice> slicesToEat=new List<Slice>();
foreach(var Slice in Pizza)
{
if(Slice.Flavor == "Sausage")
{
slicesToEat.Add(Slice);
}
}
foreach(var slice in slicesToEat)
{
Me.Eat(slice);
}
Perhaps change your Me.Eat() signature to take an IEnumerable<Slice>
Me.Eat(Pizza.Where(s=>s.Flavor=="Sausage").ToList());
This lets you perform the task in 1 line of code.
Then your Eat() could be like:
public void Eat(IEnumerable<Slice> remove)
{
foreach (Slice r in remove)
{
Pizza.Remove(r);
}
}
The VB6-styled "Collection" object allows for modification during enumeration, and seems to work sensibly when such modifications occur. Too bad it has other limitations (key type is limited to case-insensitive strings) and doesn't support generics, since none of the other collection types allow modification.
Frankly, I'm not clear why Microsoft's IEnumerable contract requires that an exception be thrown if a collection is modified. I would understand a requirement that an exception be thrown if a change to a collection would make it impossible for an enumeration to continue without wackiness (skipping or duplicating values that did not change during enumeration, crashing, etc.) but see no reason not to allow a collection that could enumerate sensibly to do so.
Where can you order pizza where slices have separate toppings? Anyway...
Using Linq:
// Was "Me.Eat()" supposed to be "this.Eat()"?
Pizza
.Where(slice => slice.Flavor == "Sausage")
.ToList()
.ForEach(sausageSlice => { Me.Eat(sausageSlice); });
The Where and ToList calls create a new list with only sausage slices. ForEach (a List<T> method, not a LINQ operator) then takes that new subset and passes each slice to Me.Eat(). The { and ;} may be superfluous. This is not the most efficient method because it first makes a copy (as do many other approaches that have been given), but it is certainly clean and readable.
BTW, This is only for posterity as the best answer was already given - iterate backward by index.
What kind of collection is Pizza? If it's a List<T> then you can call the RemoveAll method:
Pizza.RemoveAll(slice => string.Equals(slice.Flavor, "Sausage"));
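A small runnable sketch of RemoveAll, again with a List<string> of flavors standing in for the Pizza type. RemoveAllDemo and RemoveFlavor are hypothetical names. RemoveAll returns the number of elements it removed, which is handy for logging or assertions.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveAllDemo
{
    // Removes every matching flavor in one pass and reports how many went.
    public static int RemoveFlavor(List<string> pizza, string flavor)
    {
        return pizza.RemoveAll(slice => string.Equals(slice, flavor));
    }

    public static void Main()
    {
        var pizza = new List<string> { "Sausage", "Cheese", "Sausage" };
        int eaten = RemoveFlavor(pizza, "Sausage");
        Console.WriteLine(eaten);       // 2
        Console.WriteLine(pizza.Count); // 1
    }
}
```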
Can somebody provide a real life example regarding use of iterators. I tried searching google but was not satisfied with the answers.
You've probably heard of arrays and containers - objects that store a list of other objects.
But in order for an object to represent a list, it doesn't actually have to "store" the list. All it has to do is provide you with methods or properties that allow you to obtain the items of the list.
In the .NET framework, the interface IEnumerable is all an object has to support to be considered a "list" in that sense.
To simplify it a little (leaving out some historical baggage):
public interface IEnumerable<T>
{
IEnumerator<T> GetEnumerator();
}
So you can get an enumerator from it. That interface (again, simplifying slightly to remove distracting noise):
public interface IEnumerator<T>
{
bool MoveNext();
T Current { get; }
}
So to loop through a list, you'd do this:
var e = list.GetEnumerator();
while (e.MoveNext())
{
var item = e.Current;
// blah
}
This pattern is captured neatly by the foreach keyword:
foreach (var item in list)
// blah
But what about creating a new kind of list? Yes, we can just use List<T> and fill it up with items. But what if we want to discover the items "on the fly" as they are requested? There is an advantage to this, which is that the client can abandon the iteration after the first three items, and they don't have to "pay the cost" of generating the whole list.
To implement this kind of lazy list by hand would be troublesome. We would have to write two classes, one to represent the list by implementing IEnumerable<T>, and the other to represent an active enumeration operation by implementing IEnumerator<T>.
Iterator methods do all the hard work for us. We just write:
IEnumerable<int> GetNumbers(int stop)
{
for (int n = 0; n < stop; n++)
yield return n;
}
And the compiler converts this into two classes for us. Calling the method is equivalent to constructing an object of the class that represents the list.
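The "pay only for what you use" behaviour can be made visible with a counter. This is a sketch; the Generated field and the class name LazyNumbersDemo are additions for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class LazyNumbersDemo
{
    // How many numbers the iterator body has actually produced.
    public static int Generated;

    public static IEnumerable<int> GetNumbers(int stop)
    {
        for (int n = 0; n < stop; n++)
        {
            Generated++;
            yield return n;
        }
    }

    public static void Main()
    {
        // Ask for up to a million numbers, but abandon the loop early.
        foreach (int n in GetNumbers(1000000))
        {
            if (n == 2) break;
        }
        Console.WriteLine(Generated); // 3
    }
}
```

Calling GetNumbers(1000000) by itself runs none of the loop body; the work happens one MoveNext() at a time.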
Iterators are an abstraction that decouples the concept of position in a collection from the collection itself. The iterator is a separate object storing the necessary state to locate an item in the collection and move to the next item in the collection. I have seen collections that kept that state inside the collection (i.e. a current position), but it is often better to move that state to an external object. Among other things it enables you to have multiple iterators iterating the same collection.
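A brief sketch of the last point: two enumerators over the same list, each keeping its own position. IndependentIterators and Demo are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class IndependentIterators
{
    public static string Demo()
    {
        var letters = new List<string> { "a", "b", "c" };

        // Each GetEnumerator() call returns a separate object with its own state.
        using (IEnumerator<string> first = letters.GetEnumerator())
        using (IEnumerator<string> second = letters.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();   // first is now at "b"
            second.MoveNext();  // second is still at "a"
            return first.Current + second.Current;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Demo()); // ba
    }
}
```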
Simple example : a function that generates a sequence of integers :
static IEnumerable<int> GetSequence(int fromValue, int toValue)
{
if (toValue >= fromValue)
{
for (int i = fromValue; i <= toValue; i++)
{
yield return i;
}
}
else
{
for (int i = fromValue; i >= toValue; i--)
{
yield return i;
}
}
}
To do it without an iterator, you would need to create an array then enumerate it...
Iterate through the students in a class
The Iterator design pattern provides us with a common method of enumerating a list of items or array, while hiding the details of the list's implementation. This provides a cleaner use of the array object and hides unnecessary information from the client, ultimately leading to better code-reuse, enhanced maintainability, and fewer bugs. The iterator pattern can enumerate the list of items regardless of their actual storage type.
Iterate through a set of homework questions.
But seriously, Iterators can provide a unified way to traverse the items in a collection regardless of the underlying data structure.
Read the first two paragraphs here for a little more info.
A couple of things they're great for:
a) For 'perceived performance' while maintaining code tidiness - the iteration of something separated from other processing logic.
b) When the number of items you're going to iterate through is not known.
Although both can be done through other means, with iterators the code can be made nicer and tidier, as someone calling the iterator doesn't need to worry about how it finds the stuff to iterate through...
Real life example: enumerating directories and files, and finding the first [n] that fulfill some criteria, e.g. a file containing a certain string or sequence etc...
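That real-life case might look roughly like the sketch below. FileSearch and FilesContaining are hypothetical names, and the sketch assumes every file under the root is readable as text; a production version would need to handle access errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileSearch
{
    // Lazily yields the paths of files under root that contain the given text.
    public static IEnumerable<string> FilesContaining(string root, string text)
    {
        foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            // ReadLines is lazy too, and Any stops at the first matching line.
            if (File.ReadLines(path).Any(line => line.Contains(text)))
            {
                yield return path;
            }
        }
    }

    public static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : ".";

        // Only as many files are opened as it takes to find three matches.
        foreach (string path in FilesContaining(root, "TODO").Take(3))
        {
            Console.WriteLine(path);
        }
    }
}
```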
Beside everything else, to iterate through lazy-type sequences - IEnumerators. Each next element of such sequence may be evaluated/initialized upon iteration step which makes it possible to iterate through infinite sequences using finite amount of resources...
The canonical and simplest example is that it makes infinite sequences possible without the complexity of having to write the class to do that yourself:
// generate every prime number
public IEnumerator<int> GetPrimeEnumerator()
{
yield return 2;
var primes = new List<int>();
primes.Add(2);
Func<int, bool> IsPrime = n => primes.TakeWhile(
p => p <= (int)Math.Sqrt(n)).FirstOrDefault(p => n % p == 0) == 0;
for (int i = 3; true; i += 2)
{
if (IsPrime(i))
{
yield return i;
primes.Add(i);
}
}
}
Obviously this would not be truly infinite unless you used a BigInteger instead of int, but it gives you the idea.
Writing this code (or similar) for each generated sequence would be tedious and error-prone. Iterators do that for you. If the above example seems too complex for you, consider:
// generate every power of a number from start^0 to start^n
public IEnumerator<int> GetPowersEnumerator(int start)
{
yield return 1; // anything ^0 is 1
var x = start;
while(true)
{
yield return x;
x *= start;
}
}
They come at a cost though. Their lazy behaviour means you cannot spot common errors (null parameters and the like) until the generator is first consumed rather than created without writing wrapping functions to check first. The current implementation is also incredibly bad(1) if used recursively.
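The wrapping-function workaround mentioned above can be sketched as follows. ValidationDemo, Letters, LazyLetters and LettersIterator are made-up names; the idea is that the public method is an ordinary method that checks its arguments and then hands off to the iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValidationDemo
{
    // Ordinary method: the null check runs as soon as it is called.
    public static IEnumerable<char> Letters(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return LettersIterator(text);
    }

    private static IEnumerable<char> LettersIterator(string text)
    {
        foreach (char c in text)
        {
            if (char.IsLetter(c)) yield return c;
        }
    }

    // Iterator method: the same check is deferred until the first MoveNext().
    public static IEnumerable<char> LazyLetters(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        foreach (char c in text)
        {
            if (char.IsLetter(c)) yield return c;
        }
    }

    public static void Main()
    {
        Console.WriteLine(new string(Letters("a1b2c3").ToArray())); // abc
        IEnumerable<char> deferred = LazyLetters(null); // no exception yet
        Console.WriteLine("created without error");
    }
}
```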
Writing enumerations over complex structures like trees and object graphs is much easier, as the state maintenance is largely done for you: you simply write code to visit each item and don't have to worry about getting back to it.
(1) I don't use this word lightly: an O(n) iteration can become O(n^2).
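As an illustration of enumerating a tree, here is a sketch of a pre-order traversal written as a single iterator with an explicit stack, which is one way to sidestep the cost of nesting iterators recursively. The Node and TreeWalk types are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class Node
{
    public int Value;
    public Node Left;
    public Node Right;

    public Node(int value, Node left = null, Node right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

public static class TreeWalk
{
    // Visits a node, then its left subtree, then its right subtree.
    public static IEnumerable<int> PreOrder(Node root)
    {
        if (root == null) yield break;

        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node node = pending.Pop();
            yield return node.Value;

            // Push right first so that left is popped (and visited) first.
            if (node.Right != null) pending.Push(node.Right);
            if (node.Left != null) pending.Push(node.Left);
        }
    }

    public static void Main()
    {
        var tree = new Node(1, new Node(2, new Node(4)), new Node(3));
        Console.WriteLine(string.Join(",", PreOrder(tree))); // 1,2,4,3
    }
}
```

The iterator keeps its position between yields, so the caller can stop after any element without the traversal code needing to know.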
An iterator is an easy way of implementing the IEnumerator interface. Instead of making a class that has the methods and properties required for the interface, you just make a method that returns the values one by one and the compiler creates a class with the methods and properties needed to implement the interface.
If you for example have a large list of numbers, and you want to return a collection where each number is multiplied by two, you can make an iterator that returns the numbers instead of creating a copy of the list in memory:
public IEnumerable<int> GetDouble() {
foreach (int n in originalList) yield return n * 2;
}
In C# 3 you can do something quite similar using extension methods and lambda expressions:
originalList.Select(n => n * 2)
Or using a LINQ query expression:
from n in originalList select n * 2
IEnumerator<Question> myIterator = listOfStackOverFlowQuestions.GetEnumerator();
while (myIterator.MoveNext())
{
Question q;
q = myIterator.Current;
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}
foreach (Question q in listOfStackOverFlowQuestions)
{
if (q.Pertinent == true)
PublishQuestion(q);
else
SendMessage(q.Author.EmailAddress, "Your question has been rejected");
}