Should I use Yield when writing my own extension?

I wanted to write an extension method (to use in a fluent syntax), so that given a sequence such as:
List<int> lst = new List<int>() { 1, 2, 3 };
I can repeat it 3 times (for example), so the output would be 123123123.
I wrote this:
public static IEnumerable<TSource> MyRepeat<TSource>(this IEnumerable<TSource> source, int n)
{
    return Enumerable.Repeat(source, n).SelectMany(f => f);
}
And now I can do this:
lst.MyRepeat(3)
Output: 123123123
Question:
Shouldn't I use yield in the extension method? I tried yield return, but it doesn't work here. Why is that, and should I use it?
edit
After Ant's answer I changed it to :
public static IEnumerable<TSource> MyRepeat<TSource>(this IEnumerable<TSource> source, int n)
{
    var k = Enumerable.Repeat(source, n).SelectMany(f => f);
    foreach (var element in k)
    {
        yield return element;
    }
}
But does that make any difference?

This is because the following already returns an IEnumerable:
Enumerable.Repeat(source,n).SelectMany(f=>f);
When you use the yield keyword, you specify that a given iteration over the method will return what follows. So you are essentially saying "each iteration will yield an IEnumerable<TSource>", when in fact each iteration over a method returning an IEnumerable<TSource> should yield a TSource.
Hence your error: when you iterate over MyRepeat, each step is expected to return a TSource, but because you are trying to yield an IEnumerable, you are actually trying to return a whole sequence from every iteration instead of a single element.
Your edit should work but is a little pointless: if you simply return the IEnumerable directly, it won't be enumerated until you iterate over it (or call ToList or similar). In your very first example, SelectMany (or one of its nested methods) will already be using yield, so the yield is already there; it's just implicit in your method.
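To make the point about deferred execution concrete, here is a minimal sketch comparing the two versions from the question. The names RepeatDirect, RepeatWrapped and LoggingSource are made up for this illustration; the logging source only exists to show when elements are actually produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatExtensions
{
    // Returns the LINQ query directly; nothing runs until the caller enumerates.
    public static IEnumerable<T> RepeatDirect<T>(this IEnumerable<T> source, int n)
    {
        return Enumerable.Repeat(source, n).SelectMany(f => f);
    }

    // Wraps the same query in an iterator; equally lazy, just one extra layer.
    public static IEnumerable<T> RepeatWrapped<T>(this IEnumerable<T> source, int n)
    {
        foreach (var element in Enumerable.Repeat(source, n).SelectMany(f => f))
        {
            yield return element;
        }
    }
}

public static class Program
{
    // Hypothetical helper: logs each element at the moment it is produced.
    private static IEnumerable<int> LoggingSource()
    {
        foreach (var i in new[] { 1, 2, 3 })
        {
            Console.WriteLine("producing " + i);
            yield return i;
        }
    }

    public static void Main()
    {
        var direct = LoggingSource().RepeatDirect(2);
        var wrapped = LoggingSource().RepeatWrapped(2);
        Console.WriteLine("queries built, nothing produced yet");

        // The "producing" lines appear only now, while each query is enumerated.
        Console.WriteLine(string.Join("", direct));   // 123123
        Console.WriteLine(string.Join("", wrapped));  // 123123
    }
}
```

Both calls return immediately, and both yield the same elements once enumerated, which is why the extra foreach/yield layer adds nothing here.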

Ant P's answer is of course correct.
You would use yield if you were building the returned enumerable yourself, rather than relying on SelectMany, e.g.:
public static IEnumerable<T> Repeat<T>(this IEnumerable<T> items, int repeat)
{
    for (int i = 0; i < repeat; ++i)
        foreach (T item in items)
            yield return item;
}
The thing you yield is an element of the sequence. The code is instructions for producing the sequence of yielded elements.
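As a rough illustration of "instructions for producing the sequence", the sketch below asks for a million repetitions but consumes only four elements. The method name RepeatSequence is chosen here just to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Same shape as the iterator above: each yield return hands out one element.
    public static IEnumerable<T> RepeatSequence<T>(this IEnumerable<T> items, int repeat)
    {
        for (int i = 0; i < repeat; ++i)
            foreach (T item in items)
                yield return item;
    }
}

public static class Program
{
    public static void Main()
    {
        var lst = new List<int> { 1, 2, 3 };

        // Elements are produced on demand, so Take(4) stops the iterator early.
        foreach (var x in lst.RepeatSequence(1000000).Take(4))
            Console.Write(x); // writes 1231
        Console.WriteLine();
    }
}
```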

Related

Why is DefaultIfEmpty implemented this way?

Chasing the implementation of System.Linq.Enumerable.DefaultIfEmpty took me to this method. It looks alright except for the following quaint details:
// System.Linq.Enumerable
[IteratorStateMachine(typeof(Enumerable.<DefaultIfEmptyIterator>d__90<>))]
private static IEnumerable<TSource> DefaultIfEmptyIterator<TSource>(IEnumerable<TSource> source, TSource defaultValue)
{
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            do
            {
                yield return enumerator.Current;
            }
            while (enumerator.MoveNext());
        }
        else
        {
            yield return defaultValue;
        }
    }
    IEnumerator<TSource> enumerator = null;
    yield break;
    yield break;
}
1) Why does the code have to iterate over the whole sequence once it has been established that the sequence is not empty?
2) Why the yield break two times at the end?
3) Why explicitly set the enumerator to null at the end when there is no other reference to it?
I would have left it at this:
// System.Linq.Enumerable
[IteratorStateMachine(typeof(Enumerable.<DefaultIfEmptyIterator>d__90<>))]
private static IEnumerable<TSource> DefaultIfEmptyIterator<TSource>(IEnumerable<TSource> source, TSource defaultValue)
{
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            do
            {
                yield return enumerator.Current;
            }
            // while (enumerator.MoveNext());
        }
        else
        {
            yield return defaultValue;
        }
    }
    // IEnumerator<TSource> enumerator = null;
    yield break;
    // yield break;
}

DefaultIfEmpty needs to behave as follows:
If the source enumerable has no entries, it needs to act as an enumerable with a single value: the default value.
If the source enumerable is not empty, it needs to act as the source enumerable. Therefore, it needs to yield all of its values.
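The two cases can be checked with a short sketch; the sample lists and the -1 default are arbitrary values picked for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var empty = new List<int>();
        var full = new List<int> { 7, 8, 9 };

        // Empty source: a single element, the supplied default.
        Console.WriteLine(string.Join(",", empty.DefaultIfEmpty(-1))); // -1

        // Non-empty source: every element passes through, not just the first.
        Console.WriteLine(string.Join(",", full.DefaultIfEmpty(-1)));  // 7,8,9
    }
}
```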

Because this code acts as another level of enumeration, once you start enumerating you have to enumerate the whole thing.
If you just yield return the first value and stop there, the code using this enumerator will think there is only one value. So you have to enumerate everything there is and yield return each value onward.
You could of course return the enumerator itself and that would work, but not after MoveNext() has been called, since that would cause the first value to be skipped. If there were another way to check whether values exist, that would be the way to do it.

Why does the code have to iterate over the whole sequence once it has been established that the sequence is not empty?
As you can read on MSDN about the DefaultIfEmpty return value:
An IEnumerable<T> object that contains the default value for the TSource type if source is empty; otherwise, source.
So, if the enumerable is empty, the result is an enumerable containing the default value, but if the enumerable isn't empty, the same sequence is returned (not only the first element).
It may seem that this method is about checking only whether an enumerable contains elements or not, but it is not the case.
Why the yield break two times at the end?
No ideas :)

How to get excluded collection without a second LINQ query?

I have a LINQ query that looks like this:
var p = option.GetType().GetProperties().Where(t => t.PropertyType == typeof(bool));
What is the most efficient way to get the items which aren't included in this query, without executing a second iteration over the list.
I could easily do this with a for loop but I was wondering if there's a shorthand with LINQ.

var p = option.GetType().GetProperties().ToLookup(t => t.PropertyType == typeof(bool));
var bools = p[true];
var notBools = p[false];
.ToLookup() partitions an IEnumerable based on a key function. In this case, it returns a Lookup with at most 2 groups in it. Groups in the Lookup are accessed by key, much like entries in an IDictionary.
.ToLookup() is evaluated immediately and is an O(n) operation; accessing a partition in the resulting Lookup is an O(1) operation.
Lookup is very similar to Dictionary and has similar generic parameters (a key type and a value type). However, where Dictionary maps a key to a single value, Lookup maps a key to a collection of values. Conceptually, a Lookup behaves like an IDictionary<TKey, IEnumerable<TValue>>.
.GroupBy() could also be used, but it differs from .ToLookup() in that GroupBy is lazily evaluated and could end up being enumerated multiple times. .ToLookup() is evaluated immediately, so the work is done only once.
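A self-contained sketch of the same idea follows. The Options class is hypothetical and only stands in for whatever type `option` has in the question.

```csharp
using System;
using System.Linq;

// Hypothetical type with two bool and two non-bool properties.
public class Options
{
    public bool Verbose { get; set; }
    public bool DryRun { get; set; }
    public string Name { get; set; }
    public int Retries { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var option = new Options();

        // One pass over the properties, split into two groups by the key.
        var p = option.GetType().GetProperties()
                      .ToLookup(t => t.PropertyType == typeof(bool));

        Console.WriteLine(p[true].Count());   // 2
        Console.WriteLine(p[false].Count());  // 2

        // A key with no entries gives an empty sequence instead of throwing.
        var none = Enumerable.Empty<int>().ToLookup(x => x > 0);
        Console.WriteLine(none[true].Any());  // False
    }
}
```

The last lines show one practical difference from a Dictionary: indexing a Lookup with a missing key does not throw.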

You cannot get something that you don't ask for. So if you exclude everything but bool, you can't expect to get the rest later; you need to ask for it.
For what it's worth, if you need both the ones you want and all the others in a single query, you could GroupBy this condition or use ToLookup, which I would prefer:
var isboolOrNotLookup = option.GetType().GetProperties()
.ToLookup(t => t.PropertyType == typeof(bool)); // use PropertyType instead
Now you can use this lookup for further processing. For example, if you want a collection of all properties which are bool:
List<System.Reflection.PropertyInfo> boolTypes = isboolOrNotLookup[true].ToList();
or just the count:
int boolCount = isboolOrNotLookup[true].Count();
So if you want to process all which are not bool:
foreach(System.Reflection.PropertyInfo prop in isboolOrNotLookup[false])
{
}

Well, you could go for source.Except(p), but it would reiterate the list and perform a lot of comparisons.
I'd say - write an extension method that does it using foreach, basically splitting the list into two destinations. Or something like this.
How about:
public class UnzipResult<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Func<T, bool> _filter;
    private readonly Queue<T> _nonMatching = new Queue<T>();
    private readonly Queue<T> _matching = new Queue<T>();

    public UnzipResult(IEnumerable<T> source, Func<T, bool> filter)
    {
        _enumerator = source.GetEnumerator();
        _filter = filter;
    }

    public IEnumerable<T> Matching { get { return Drain(_matching, _nonMatching, true); } }
    public IEnumerable<T> Rest { get { return Drain(_nonMatching, _matching, false); } }

    // Yields items already buffered for this side, then pulls from the shared
    // enumerator, buffering any item that belongs to the other side.
    private IEnumerable<T> Drain(Queue<T> mine, Queue<T> other, bool wanted)
    {
        while (true)
        {
            if (mine.Count > 0)
                yield return mine.Dequeue();
            else if (!_enumerator.MoveNext())
                yield break;
            else if (_filter(_enumerator.Current) == wanted)
                yield return _enumerator.Current;
            else
                other.Enqueue(_enumerator.Current);
        }
    }
}
public static class UnzipExtensions
{
    public static UnzipResult<T> Unzip<T>(this IEnumerable<T> source, Func<T, bool> filter)
    {
        return new UnzipResult<T>(source, filter);
    }
}
It's written in Notepad, so it probably doesn't compile, but my idea is: whichever collection you enumerate (matching or non-matching), you only enumerate the source once. And it should work fairly well with those pesky infinite collections (think yield return random.Next()), unless all elements do (or all don't) satisfy the filter.
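The simpler option mentioned at the start of this answer, splitting the list into two destinations with a single foreach, could look roughly like the sketch below. Partition is a made-up name, and unlike the lazy version it materializes both halves, so it is not suitable for infinite sequences.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionExtensions
{
    // Hypothetical helper: one pass over the source, two output lists.
    public static void Partition<T>(this IEnumerable<T> source, Func<T, bool> filter,
                                    out List<T> matching, out List<T> rest)
    {
        matching = new List<T>();
        rest = new List<T>();
        foreach (var item in source)
        {
            if (filter(item)) matching.Add(item);
            else rest.Add(item);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5 };
        List<int> even, odd;
        numbers.Partition(x => x % 2 == 0, out even, out odd);

        Console.WriteLine(string.Join(",", even)); // 2,4
        Console.WriteLine(string.Join(",", odd));  // 1,3,5
    }
}
```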

ICollection<T> is non-index based, but TakeWhile() exists

I'm trying to replace usages of T[] or List<T> as function parameters and return values with more appropriate types such as IEnumerable<T>, ICollection<T> and IList<T>.
From my understanding, ICollection<T> is preferable to IList<T> when you only need basic collection functionality (e.g. an enumerator and a count), as it provides this with the least restriction. From reading on here, I thought one of the main differences was that ICollection<T> doesn't require the underlying collection to be index-based, whereas IList<T> does?
In switching my List<T> references over, I needed to replace a List<T>.GetRange() call, and I was very surprised to find the ICollection<T>.TakeWhile() extension method, which has an overload supporting selection based on index?! (msdn link)
I'm confused about why this method exists on ICollection when there is nothing index-based on this interface. Have I misunderstood, or how can this method actually work if the underlying collection is e.g. a HashSet?

The method, like most of LINQ, is defined on IEnumerable<T>. Any feature that just passes an index to the consumer (such as TakeWhile) only needs to loop while incrementing a counter. Some APIs may be able to optimize using an indexer, and then it is up to them to decide whether to do that, or to just use IEnumerable<T> and simply skip (etc.) unwanted data.
For example:
int i = 0;
foreach (var item in source) {
    // The indexed TakeWhile overload passes the element first, then its position.
    if (!predicate(item, i++)) break;
    yield return item;
}

Indexing can be done without the collection supporting it:
int i = -1;
foreach (var item in collection)
{
    i++;
    // item is at index i;
}

TakeWhile and other extension methods from System.Linq.Enumerable class work on all the types implementing IEnumerable<T>. They all iterate over the collection (using foreach statement) and perform appropriate actions.
Here is the implementation of the TakeWhile method, with some simplifications:
private static IEnumerable<TSource> TakeWhile<TSource>(IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    foreach (TSource item in source)
    {
        if (!predicate(item))
        {
            break;
        }
        yield return item;
    }
}
As you see, it simply iterates over the collection, and evaluates the predicate. This is true for almost all other LINQ methods. The same will happen when you use any other collection, like HashSet<T>.
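As a small usage sketch, the indexed overload works on a collection that has no indexer at all. LinkedList<T> is used here (rather than HashSet<T>) only because its enumeration order is well defined, which makes the result predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        // LinkedList<T> implements ICollection<T> but offers no indexer.
        ICollection<string> items = new LinkedList<string>(new[] { "a", "b", "c", "d" });

        // The index handed to the predicate is just a running counter.
        var firstTwo = items.TakeWhile((item, index) => index < 2);
        Console.WriteLine(string.Join(",", firstTwo)); // a,b
    }
}
```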

How to create an extension method to handle bindinglist.removeall with predicate input

myGenericList.RemoveAll(x => (x.StudentName == "bad student"));
This works great on a List, but a BindingList does not have this method. How can I create an extension method for BindingList that takes a predicate as input and does the same thing as the built-in RemoveAll on List?
Thank you.

Like I said in a comment, there is no magic in extension methods. Write the code the same way you normally would, put it in a static method in a static class, and use the this keyword:
public static void RemoveAll<T>(this BindingList<T> list, Func<T, bool> predicate)
{
    foreach (var item in list.Where(predicate).ToArray())
        list.Remove(item);
}
You have to use ToArray() (or ToList()), because Where() is lazy and only enumerates the collection when needed, and you can't enumerate a collection while it is being changed.
This solution is quite slow, though (O(N^2)), because every Remove() has to look through the collection to find the item to remove. We can do better:
public static void FastRemoveAll<T>(this BindingList<T> list, Func<T, bool> predicate)
{
    for (int i = list.Count - 1; i >= 0; i--)
        if (predicate(list[i]))
            list.RemoveAt(i);
}
This uses the fact that we can get to the i-th item in constant time, so the whole method is O(N). The iteration is easier to write backwards, so that the indexes of items we have yet to consider don't change.
EDIT: Actually, the second solution is still O(N^2), because every RemoveAt() has to move all the items after the one that was removed.
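If the quadratic cost matters, one possible way around it (not from the answers here, just a sketch) is to rebuild the list in a single pass. The name RebuildRemoveAll is made up, and the approach assumes listeners are fine with receiving one Reset notification instead of a separate event per removed item.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;

public static class BindingListExtensions
{
    // Sketch: collect the survivors once, then refill the list.
    public static void RebuildRemoveAll<T>(this BindingList<T> list, Func<T, bool> predicate)
    {
        List<T> keep = list.Where(x => !predicate(x)).ToList();
        if (keep.Count == list.Count) return; // nothing to remove

        bool raise = list.RaiseListChangedEvents;
        list.RaiseListChangedEvents = false;   // suppress per-item notifications
        try
        {
            list.Clear();
            foreach (T item in keep) list.Add(item);
        }
        finally
        {
            list.RaiseListChangedEvents = raise;
        }
        list.ResetBindings();                  // single Reset notification for listeners
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new BindingList<string> { "good", "bad student", "fine", "bad student" };
        names.RebuildRemoveAll(x => x == "bad student");
        Console.WriteLine(string.Join(",", names)); // good,fine
    }
}
```

The trade-off is that bound controls see a full reset rather than individual deletions, which may or may not be acceptable depending on the UI.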

I'd say:
public static class BindingListExtensions
{
    public static void RemoveAll<T>(this BindingList<T> list, Func<T, bool> predicate)
    {
        // first check predicates -- uses System.Linq
        // could collapse into the foreach, but still must use
        // ToList() or ToArray() to avoid deferred execution
        var toRemove = list.Where(predicate).ToList();
        // then loop and remove after
        foreach (var item in toRemove)
        {
            list.Remove(item);
        }
    }
}
And for those interested in the minutiae, it seems ToList() and ToArray() are so close in performance (and in fact either can be faster depending on the circumstances) that the difference is negligible; see the question "I need to iterate and count. What is fastest or preferred: ToArray() or ToList()?"

When to use Yield?

When should I use yield return, and when should I use a plain return?

Use yield when you are returning an enumerable, and you don't have all the results at that point.
Practically, I've used yield when I want to iterate through a large block of information (database, flat file, etc.), and I don't want to load everything in memory first. Yield is a nice way to iterate through the block without loading everything at once.
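A minimal sketch of that pattern is shown below. The ReadLines helper is written for this example, and a StringReader stands in for the large file or database cursor so the code runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Program
{
    // Yields one line at a time; the iterator itself never buffers the whole input.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    public static void Main()
    {
        // StringReader is a stand-in for a large file opened with a StreamReader.
        using (var reader = new StringReader("first\nsecond\nthird"))
        {
            foreach (var line in ReadLines(reader))
            {
                Console.WriteLine(line);
                if (line == "second") break; // "third" is never read
            }
        }
    }
}
```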

The yield keyword is incredibly powerful. It basically allows you to quickly return IEnumerable and IEnumerator objects without explicitly coding them.
Consider a scenario where you want to return the intersection of two IEnumerable objects. Here is how you would do it using the yield keyword.
public static class Program
{
    public static void Main()
    {
        // List<object> rather than List<int>: IEnumerable<int> does not convert
        // to IEnumerable<object>, because covariance excludes value types.
        IEnumerable<object> lhs = new List<object> { 1, 2, 3, 4, 5 };
        IEnumerable<object> rhs = new List<object> { 3, 4, 5, 6, 7 };
        foreach (object item in IntersectExample.Intersect(lhs, rhs))
        {
            Console.WriteLine(item);
            break;
        }
    }
}
public static class IntersectExample
{
    public static IEnumerable<object> Intersect(IEnumerable<object> lhs, IEnumerable<object> rhs)
    {
        var hashset = new HashSet<object>();
        foreach (object item in lhs)
        {
            if (!hashset.Contains(item))
            {
                hashset.Add(item);
            }
        }
        foreach (object item in rhs)
        {
            if (hashset.Contains(item))
            {
                yield return item;
            }
        }
    }
}
It is hard to appreciate this until you fully realize what is going on. Normally, when you intersect two sets, you complete the entire operation before returning the result to the caller. This means the runtime complexity of the operation is O(m + n), where m and n are the sizes of the collections being intersected, regardless of what you do with the result afterwards. But in my example I just wanted to pick off the first item from the result. Using an IEnumerable created by the yield keyword makes it easy to delay part of the processing until it is actually required: my example runs in roughly O(m), since after lhs has been hashed, rhs is scanned only as far as its first match. The alternative is to code the IEnumerable yourself and maintain its state manually. The power of the yield keyword is that it creates that state machine for you.
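To give a feel for what "maintaining the state manually" involves, here is a hand-written enumerable next to its yield equivalent. CountUp and CountUpWithYield are invented for this comparison; the code the compiler actually generates for an iterator is more elaborate than this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written counterpart of: for (int i = 1; i <= max; i++) yield return i;
public sealed class CountUp : IEnumerable<int>
{
    private readonly int _max;
    public CountUp(int max) { _max = max; }

    public IEnumerator<int> GetEnumerator() { return new Enumerator(_max); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _max;
        private int _current; // the position that yield would otherwise track for us

        public Enumerator(int max) { _max = max; }

        public int Current { get { return _current; } }
        object IEnumerator.Current { get { return _current; } }

        public bool MoveNext()
        {
            if (_current >= _max) return false;
            _current++;
            return true;
        }

        public void Reset() { _current = 0; }
        public void Dispose() { }
    }
}

public static class Program
{
    // The same sequence with yield: the compiler builds the state machine.
    public static IEnumerable<int> CountUpWithYield(int max)
    {
        for (int i = 1; i <= max; i++) yield return i;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", new CountUp(3)));       // 1,2,3
        Console.WriteLine(string.Join(",", CountUpWithYield(3)));  // 1,2,3
    }
}
```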

Yield is for iterators.
It lets you process a list in small swallows, which is nice for big lists.
The magical thing about Yield is that it remembers where you're up to between invocations.
If you're not iterating you don't need Yield.

The yield construct is used to create an iterator that can produce multiple values in succession:
IEnumerable<int> three_numbers() {
    yield return 1;
    yield return 2;
    yield return 3;
}
...
foreach (var i in three_numbers()) {
    // i becomes 1, 2 and 3, in turn.
}

yield return resumes the method from that point on the next iteration. For example, if you want to loop over an array or list and hand each element to the caller one at a time for processing, use yield return. If you want to return everything at once and be done, you don't need it.

It is explained in the C# Language Reference, under yield (C# Reference).
The method called will return each value one at a time so that the caller can enumerate them.
This means that you will need to use yield when you want every result produced by an iteration to be returned to the caller.
