Why is DefaultIfEmpty implemented this way?


Chasing the implementation of System.Linq.Enumerable.DefaultIfEmpty took me to this method. It looks alright except for the following quaint details:
// System.Linq.Enumerable
[IteratorStateMachine(typeof(Enumerable.<DefaultIfEmptyIterator>d__90<>))]
private static IEnumerable<TSource> DefaultIfEmptyIterator<TSource>(IEnumerable<TSource> source, TSource defaultValue)
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
do
{
yield return enumerator.Current;
}
while (enumerator.MoveNext());
}
else
{
yield return defaultValue;
}
}
IEnumerator<TSource> enumerator = null;
yield break;
yield break;
}
1) Why does the code have to iterate over the whole sequence once it has been established that the sequence is not empty?
2) Why the yield break two times at the end?
3) Why explicitly set the enumerator to null at the end when there is no other reference to it?
I would have left it at this:
// System.Linq.Enumerable
[IteratorStateMachine(typeof(Enumerable.<DefaultIfEmptyIterator>d__90<>))]
private static IEnumerable<TSource> DefaultIfEmptyIterator<TSource>(IEnumerable<TSource> source, TSource defaultValue)
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
do
{
yield return enumerator.Current;
}
// while (enumerator.MoveNext());
}
else
{
yield return defaultValue;
}
}
// IEnumerator<TSource> enumerator = null;
yield break;
// yield break;
}

DefaultIfEmpty needs to act as the following:
If the source enumerable has no entries, it needs to act as an enumerable with a single value; the default value.
If the source enumerable is not empty, it needs to act as the source enumerable. Therefore, it needs to yield all values.

Because this code sits in the chain as another level of enumeration: whoever consumes it sees only what it yields, so it has to pass the whole sequence through.
If you just yield return the first element and stop there, the code using this enumerator will think there is only one value. So you have to enumerate everything there is and yield return it forward.
You could of course just return the enumerator and that would work, but not after MoveNext() has been called, since that would cause the first value to be skipped. If there were another way to check whether values exist, that would be the way to do it.
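To make the point concrete, here is a minimal sketch that compares the built-in operator with a truncated iterator like the one proposed in the question. The names `DefaultIfEmptySketch`, `Truncated` and `BuiltIn` are made up for illustration; only `Enumerable.DefaultIfEmpty` is a real API.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptySketch
{
    // Hypothetical truncated variant: yields the first element and then stops,
    // which is what dropping the do/while loop would amount to.
    public static IEnumerable<T> Truncated<T>(IEnumerable<T> source, T defaultValue)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (e.MoveNext())
                yield return e.Current;   // the remaining elements are never read
            else
                yield return defaultValue;
        }
    }

    // The built-in operator, materialized for easy comparison.
    public static List<T> BuiltIn<T>(IEnumerable<T> source, T defaultValue)
    {
        return source.DefaultIfEmpty(defaultValue).ToList();
    }
}
```

For the source 1, 2, 3 with a default of -1, `BuiltIn` returns all three elements while `Truncated` yields only 1. For an empty source both yield just -1.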

Why does the code have to iterate over the whole sequence once it has been established that the sequence is not empty?
As you can read on MSDN about the DefaultIfEmpty return value:
An IEnumerable<T> object that contains the default value for the TSource type if source is empty; otherwise, source.
So, if the enumerable is empty the result is an enumerable containing the default value, but if the enumerable isn't empty the result contains every element of the source (not only the first one).
It may seem that this method is about checking only whether an enumerable contains elements or not, but it is not the case.
Why the yield break two times at the end?
No ideas :)

Related

C# yield return performance

How much space is reserved for the underlying collection behind a method using yield return syntax when I call ToList() on it? Is there a chance it will reallocate and thus perform worse than the standard approach, where I create a list with a predefined capacity?
The two scenarios:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
public IEnumerable<T> GetList2()
{
List<T> outputList = new List<T>( collection.Count() );
foreach( var item in collection )
outputList.Add( item.Property );
return outputList;
}

yield return does not create an array that has to be resized, like what List does; instead, it creates an IEnumerable with a state machine.
For instance, let's take this method:
public static IEnumerable<int> Foo()
{
Console.WriteLine("Returning 1");
yield return 1;
Console.WriteLine("Returning 2");
yield return 2;
Console.WriteLine("Returning 3");
yield return 3;
}
Now let's call it and assign that enumerable to a variable:
var elems = Foo();
None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:
foreach(var elem in elems)
{
Console.WriteLine( "Got " + elem );
}
On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. Same for all subsequent elements.
At the end of the loop, the console will look like this:
Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
This means you can write methods like this:
public static IEnumerable<int> GetAnswers()
{
while( true )
{
yield return 42;
}
}
You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because lists have to have a finite size.
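A small sketch of the same idea that records what the iterator does, assuming a hypothetical `LazyDemo` class (the names are not from the original answer). It shows that only as many elements are produced as the consumer asks for, even though the sequence is endless.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Records every element the iterator actually produces.
    public static readonly List<string> Log = new List<string>();

    // An endless sequence: 1, 2, 3, ...
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; ; i++)
        {
            Log.Add("producing " + i);
            yield return i;
        }
    }

    // Take(3) stops pulling after three elements, so the loop above
    // runs only three times.
    public static List<int> FirstThree()
    {
        Log.Clear();
        return Numbers().Take(3).ToList();
    }
}
```

Calling `LazyDemo.FirstThree()` returns 1, 2, 3 and leaves three entries in `Log`. Calling `Numbers().ToList()` instead would never finish.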

How much space is reserved to the underlying collection behind a method using yield return syntax?
There's no underlying collection.
There's an object, but it isn't a collection. Just how much space it will take up depends on what it needs to keep track of.
There's a chance it will reallocate
No.
And thus decrease performance if compared to the standard approach where i create a list with predefined capacity?
It will almost certainly take up less memory than creating a list with a predefined capacity.
Let's try a manual example. Say we had the following code:
public static IEnumerable<int> CountToTen()
{
for(var i = 1; i != 11; ++i)
yield return i;
}
Iterating over this with foreach goes through the numbers 1 to 10 inclusive.
Now let's do this the way we would have to if yield did not exist. We'd do something like:
private class CountToTenEnumerator : IEnumerator<int>
{
private int _current;
public int Current
{
get
{
if(_current == 0)
throw new InvalidOperationException();
return _current;
}
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if(_current == 10)
return false;
_current++;
return true;
}
public void Reset()
{
throw new NotSupportedException();
// We *could* just set _current back, but the object produced by
// yield won't do that, so we'll match that.
}
public void Dispose()
{
}
}
private class CountToTenEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new CountToTenEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<int> CountToTen()
{
return new CountToTenEnumerable();
}
Now, for a variety of reasons this is quite different to the code you're likely to get from the version using yield, but the basic principle is the same. As you can see there are two allocations involved of objects (same number as if we had a collection and then did a foreach on that) and the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.
Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, doing double service for both cases. Since this covers over 99% of use cases yield actually does one allocation rather than two.
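The reuse trick can be observed with a reference comparison. This is a sketch that relies on a compiler implementation detail rather than on anything the language guarantees, so treat the result as what current Microsoft compilers are expected to do; `IteratorReuseSketch` and `Probe` are hypothetical names.

```csharp
using System.Collections.Generic;

public static class IteratorReuseSketch
{
    public static IEnumerable<int> CountToTen()
    {
        for (var i = 1; i != 11; ++i)
            yield return i;
    }

    // Returns two flags: whether the first and the second enumerator
    // obtained from one enumerable are the enumerable object itself.
    public static bool[] Probe()
    {
        IEnumerable<int> sequence = CountToTen();
        IEnumerator<int> first = sequence.GetEnumerator();
        IEnumerator<int> second = sequence.GetEnumerator();
        bool[] result =
        {
            object.ReferenceEquals(sequence, first),   // expected: same object reused
            object.ReferenceEquals(sequence, second)   // expected: a fresh object
        };
        first.Dispose();
        second.Dispose();
        return result;
    }
}
```

With the usual compilers the first flag is true and the second is false: the enumerable doubles as its own first enumerator, and a new object is only allocated when a second enumerator is requested.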
Now let's look at:
public IEnumerable<T> GetList1()
{
foreach( var item in collection )
yield return item.Property;
}
While this would result in more memory used than just return collection, it won't result in a lot more; the only thing the enumerator produced really needs to keep track of is the enumerator produced by calling GetEnumerator() on collection and then wrapping that.
This is going to be massively less memory than that of the wasteful second approach you mention, and much faster to get going.
Edit:
You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.
Now, here we need to add a third possibility: Knowledge of the collection's size.
Here, there is the possibility that using new List<T>(capacity) will prevent reallocations of the list being built. That can indeed be a considerable saving.
If the object that has ToList called on it implements ICollection<T> then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
This would mean that your GetList2 would result in a faster ToList() than your GetList1.
However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!
What it should have done here was just return new List<T>(collection); and be done with it.
If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on) then GetList1 is going to be faster and lighter on memory. Much lighter if we never call ToList() on it, and slightly lighter if we do call ToList() because again, the faster and lighter ToList() is offset by GetList2 being slower and heavier in the first place by exactly the same amount.
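For the case where the size is known up front and a list really is the desired result, a pre-sized list avoids the growth reallocations. The following is a sketch under that assumption; `PresizedListSketch` and `MapToList` are illustrative names, and `IReadOnlyCollection<T>` is used only as a convenient way to get a `Count` without enumerating.

```csharp
using System;
using System.Collections.Generic;

public static class PresizedListSketch
{
    // Converts each element and stores it in a list whose backing array
    // is allocated once, at the final size.
    public static List<TResult> MapToList<TSource, TResult>(
        IReadOnlyCollection<TSource> source, Func<TSource, TResult> selector)
    {
        var result = new List<TResult>(source.Count);
        foreach (TSource item in source)
            result.Add(selector(item));   // never triggers a resize
        return result;
    }
}
```

This only pays off when the caller wants a list anyway. If the result is merely iterated, the yield version still avoids building the list at all.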

Should I use Yield when writing my own extension?

I wanted to write an extension method (for use in a fluent syntax) so that, given a sequence such as:
List< int> lst = new List< int>(){1,2,3 };
I can repeat it 3 times (for example), so the output would be 123123123.
I wrote this :
public static IEnumerable<TSource> MyRepeat<TSource>(this IEnumerable<TSource> source,int n)
{
return Enumerable.Repeat(source,n).SelectMany(f=>f);
}
And now I can do this :
lst.MyRepeat(3)
output :
Question :
Shouldn't I use yield in the extension method? I tried yield return but it's not working here. Why is that, and should I use it?
edit
After Ant's answer I changed it to :
public static IEnumerable<TSource> MyRepeat<TSource>(this IEnumerable<TSource> source,int n)
{
var k=Enumerable.Repeat(source,n).SelectMany(f=>f);
foreach (var element in k)
{
yield return element;
}
}
But is there any difference ?

This is because the following already returns an IEnumerable:
Enumerable.Repeat(source,n).SelectMany(f=>f);
When you use the yield keyword, you specify that a given iteration over the method will return what follows. So you are essentially saying "each iteration will yield an IEnumerable<TSource>," when actually, each iteration over a method returning an IEnumerable<TSource> should yield a TSource.
Hence, your error - when you iterate over MyRepeat, you are expected to return a TSource but because you are trying to yield an IEnumerable, you are actually trying to return an IEnumerable from every iteration instead of returning a single element.
Your edit should work but is a little pointless - if you simply return the IEnumerable directly it won't be enumerated until you iterate over it (or call ToList or something). In your very first example, SelectMany (or one of its nested methods) will already be using yield, meaning the yield is already there, it's just implicit in your method.

Ant P's answer is of course correct.
You would use yield if you were building the enumerable that is returned yourself, rather than relying on SelectMany. eg:
public static IEnumerable<T> Repeat<T>(this IEnumerable<T> items, int repeat)
{
for (int i = 0; i < repeat; ++i)
foreach(T item in items)
yield return item;
}
The thing you yield is an element of the sequence. The code is instructions for producing the sequence of yielded elements.
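To check that the two approaches really are interchangeable, here is a sketch with both side by side. The method names `RepeatWithSelectMany` and `RepeatWithYield` are chosen here to keep them apart; they are not from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RepeatSketch
{
    // Composes existing LINQ operators; no yield needed because
    // SelectMany already returns a deferred sequence.
    public static IEnumerable<T> RepeatWithSelectMany<T>(this IEnumerable<T> source, int n)
    {
        return Enumerable.Repeat(source, n).SelectMany(s => s);
    }

    // Builds the sequence by hand; each yield return hands out one element.
    public static IEnumerable<T> RepeatWithYield<T>(this IEnumerable<T> source, int n)
    {
        for (int i = 0; i < n; ++i)
            foreach (T item in source)
                yield return item;
    }
}
```

With `var lst = new List<int> { 1, 2, 3 };`, both `string.Concat(lst.RepeatWithSelectMany(3))` and `string.Concat(lst.RepeatWithYield(3))` give 123123123, and neither touches `lst` until the result is enumerated.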

Manually increment an enumerator inside foreach loop

I have a nested while loop inside a foreach loop where I would like to advance the enumerator indefinitely while a certain condition is met. To do this I try casting the enumerator to IEnumerator< T > (which it must be if it is in a foreach loop) and then calling MoveNext() on the cast object, but it gives me an error saying I cannot convert it.
Cannot convert type 'System.DateTime' to System.Collections.Generic.IEnumerator via a reference conversion, boxing conversion, unboxing conversion, wrapping conversion, or null type conversion.
foreach (DateTime time in times)
{
while (condition)
{
// perform action
// move to next item
(time as IEnumerator<DateTime>).MoveNext(); // will not let me do this
}
// code to execute after while condition is met
}
What is the best way to manually increment the IEnumerator inside of the foreach loop?
EDIT:
Edited to show there is code after the while loop that I would like executed once the condition is met which is why I wanted to manually increment inside the while then break out of it as opposed to continue which would put me back at the top. If this isn't possible I believe the best thing is to redesign how I am doing it.

Many of the other answers recommend using continue, which may very well help you do what you need to do. However, in the interests of showing manually moving the enumerator, first you must have the enumerator, and that means writing your loop as a while.
using (var enumerator = times.GetEnumerator())
{
DateTime time;
while (enumerator.MoveNext())
{
time = enumerator.Current;
// pre-condition code
while (condition)
{
if (enumerator.MoveNext())
{
time = enumerator.Current;
// condition code
}
else
{
condition = false;
}
}
// post-condition code
}
}
From your comments:
How can the foreach loop advance it if it doesn't implement the IEnumerator interface?
In your loop, time is a DateTime. It is not the object that needs to implement an interface or pattern to work in the loop. times is a sequence of DateTime values; it is the one that must implement the enumerable pattern. This is generally fulfilled by implementing the IEnumerable<T> and IEnumerable interfaces, which simply require IEnumerator<T> GetEnumerator() and IEnumerator GetEnumerator() methods. Those methods return an object implementing IEnumerator<T> and IEnumerator, which define a bool MoveNext() method and a T or object Current property. But time cannot be cast to IEnumerator, because it is no such thing, and neither is the times sequence.
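As an illustration of which object carries the pattern, here is a minimal sketch of a custom sequence. `Countdown` is a made-up type; the point is that the collection implements `IEnumerable<int>`, while the loop variable is just an `int`.

```csharp
using System.Collections;
using System.Collections.Generic;

public sealed class Countdown : IEnumerable<int>
{
    private readonly int _start;

    public Countdown(int start)
    {
        _start = start;
    }

    // The generic method foreach binds to; the compiler builds the
    // IEnumerator<int> (MoveNext and Current) from the iterator block.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i > 0; i--)
            yield return i;
    }

    // The non-generic counterpart required by IEnumerable.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

In `foreach (int n in new Countdown(3))`, the values of `n` are 3, 2 and 1. The enumerator that produces them stays hidden inside the loop, which is why it cannot be reached through `n`.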

You cannot modify the enumerator from inside the foreach loop. The language does not permit this. You need to use the continue statement in order to advance to the next iteration of a loop.
However, I'm not convinced that your loop even needs a continue. Read on.
In the context of your code you would need to convert the while to an if in order to make the continue refer to the foreach block.
foreach (DateTime time in times)
{
if (condition)
{
// perform action
continue;
}
// code to execute if condition is not met
}
But written like this, it is clear that the following equivalent variant is simpler still:
foreach (DateTime time in times)
{
if (condition)
{
// perform action
}
else
{
// code to execute if condition is not met
}
}
This is equivalent to your pseudo-code because the part marked code to execute after while condition is met is executed for each item for which condition is false.
My assumption in all of this is that condition is evaluated for each item in the list.

Perhaps you can use continue?

You would use the continue statement:
continue;

This is just a guess, but it sounds like what you're trying to do is take a list of datetimes and move past all of them which meet a certain criterion, then perform an action on the rest of the list. If that's what you're trying to do, you probably want something like SkipWhile() from System.Linq. For example, the following code takes a series of datetimes and skips past all of them which are on or before the cutoff date; then it prints out the remaining datetimes:
var times = new List<DateTime>()
{
DateTime.Now.AddDays(1), DateTime.Now.AddDays(2), DateTime.Now.AddDays(3), DateTime.Now.AddDays(4)
};
var cutoff = DateTime.Now.AddDays(2);
var timesAfterCutoff = times.SkipWhile(datetime => datetime.CompareTo(cutoff) < 1)
.Select(datetime => datetime);
foreach (var dateTime in timesAfterCutoff)
{
Console.WriteLine(dateTime);
}
Console.ReadLine();
Is that the sort of thing you're trying to do?

I definitely do not condone what I am about to suggest, but you can create a wrapper around the original IEnumerable to transform it into something that returns items which can be used to navigate the underlying enumerator. The end result might look like the following.
public static void Main(string[] args)
{
IEnumerable<DateTime> times = GetTimes();
foreach (var step in times.StepWise())
{
while (condition)
{
step.MoveNext();
}
Console.WriteLine(step.Current);
}
}
Then we need to create our StepWise extension method.
public static class EnumerableExtension
{
public static IEnumerable<Step<T>> StepWise<T>(this IEnumerable<T> instance)
{
using (IEnumerator<T> enumerator = instance.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return new Step<T>(enumerator);
}
}
}
public struct Step<T>
{
private IEnumerator<T> enumerator;
public Step(IEnumerator<T> enumerator)
{
this.enumerator = enumerator;
}
public bool MoveNext()
{
return enumerator.MoveNext();
}
public T Current
{
get { return enumerator.Current; }
}
}
}

You could use a func as your iterator and keep the state that you are changing in that delegate to be evaluated each iteration.
public static IEnumerable<T> FunkyIEnumerable<T>(this Func<Tuple<bool, T>> nextOrNot)
{
while(true)
{
var result = nextOrNot();
if(result.Item1)
yield return result.Item2;
else
break;
}
yield break;
}
Func<Tuple<bool, int>> nextNumber = () =>
Tuple.Create(SomeRemoteService.CanIContinueToSendNumbers(), 1);
foreach(var justGonnaBeOne in nextNumber.FunkyIEnumerable())
Console.WriteLine(justGonnaBeOne.ToString());

One alternative not yet mentioned is to have an enumerator return a wrapper object which allows access to itself in addition to the data element being enumerated. For example:
struct ControllableEnumeratorItem<T>
{
private ControllableEnumerator parent;
public T Value {get {return parent.Value;}}
public bool MoveNext() {return parent.MoveNext();}
public ControllableEnumeratorItem(ControllableEnumerator newParent)
{parent = newParent;}
}
This approach could also be used by data structures that want to allow collections to be modified in controlled fashion during enumeration (e.g. by including "DeleteCurrentItem", "AddBeforeCurrentItem", and "AddAfterCurrentItem" methods).

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.

I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
// Dispose the enumerator when done, as foreach would.
using (var enumerator = terminalsToSync.GetEnumerator())
{
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext());
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
}
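Since the question asks for something reusable, the do...while approach can be wrapped in a helper. This is a sketch only; `ForEachOrElse` and its parameters are hypothetical names, not an existing LINQ method.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerateOnceSketch
{
    // Runs action for every element, or onEmpty if there are none.
    // The source is enumerated exactly once. Returns true if it had elements.
    public static bool ForEachOrElse<T>(
        this IEnumerable<T> source, Action<T> action, Action onEmpty)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                onEmpty();
                return false;
            }
            do
            {
                action(e.Current);
            } while (e.MoveNext());
            return true;
        }
    }
}
```

The call site would then pass the per-terminal work as the first delegate and the logging call as the second, so the stored procedure behind the sequence runs only once.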

How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}

Personally I wouldn't use Any() here; foreach will simply not loop through any items if the collection is empty, so I would just do it like that. However, I would recommend that you check for null.
If you do want to pre-enumerate the set, use .ToArray(), which will only enumerate once, e.g.:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);

var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()

Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.

All the caching solutions here cache all items as soon as the first item is retrieved. It is only really lazy if you cache each single item as the items of the list are iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
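The linked implementation is not reproduced here. As a stand-in, the following is a minimal sketch of how such a per-item cache could be written; it is an assumption about the general technique, not the code the answer refers to. It is not thread-safe, and it only disposes the source enumerator once the source has been read to the end.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: caches each element the first time it is pulled.
public sealed class LazyList<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _enumerator;   // created on first use
    private bool _exhausted;

    public LazyList(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            if (index < _cache.Count)
                yield return _cache[index++];   // replay a cached element
            else if (!_exhausted)
                PullOne();                      // extend the cache by at most one
            else
                yield break;
        }
    }

    // Reads one more element from the source into the cache, or marks
    // the source as exhausted.
    private void PullOne()
    {
        if (_enumerator == null) _enumerator = _source.GetEnumerator();
        if (_enumerator.MoveNext())
        {
            _cache.Add(_enumerator.Current);
        }
        else
        {
            _exhausted = true;
            _enumerator.Dispose();
            _enumerator = null;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class LazyListExtensions
{
    public static LazyList<T> ToLazyList<T>(this IEnumerable<T> source)
    {
        return new LazyList<T>(source);
    }
}
```

With this sketch, the Take(3), Take(10), Take(3) query from the example pulls ten elements from the source in total: the first pass caches three, the second replays those and adds seven, and the third is served entirely from the cache.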

If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
// A constructor is named after the class without the type parameter.
public CachedEnumerable(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
// Explicit implementation of the non-generic interface method.
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

C# IEnumerable Retrieve The First Record

I have an IEnumerable list of objects in C#. I can use a foreach to loop through and examine each object fine; however, in this case all I want to do is examine the first object. Is there a way to do this without using a foreach loop?
I've tried mylist[0] but that didn't work.
Thanks

(For the sake of convenience, this answer assumes myList implements IEnumerable<string>; replace string with the appropriate type where necessary.)
If you're using .NET 3.5, use the First() extension method:
string first = myList.First();
If you're not sure whether there are any values or not, you can use the FirstOrDefault() method which will return null (or more generally, the default value of the element type) for an empty sequence.
You can still do it "the long way" without a foreach loop:
using (IEnumerator<string> iterator = myList.GetEnumerator())
{
if (!iterator.MoveNext())
{
throw new WhateverException("Empty list!");
}
string first = iterator.Current;
}
It's pretty ugly though :)
In answer to your comment, no, the returned iterator is not positioned at the first element initially; it's positioned before the first element. You need to call MoveNext() to move it to the first element, and that's how you can tell the difference between an empty sequence and one with a single element in.
EDIT: Just thinking about it, I wonder whether this is a useful extension method:
public static bool TryFirst<T>(this IEnumerable<T> source, out T value)
{
using (IEnumerator<T> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
value = default(T);
return false;
}
value = iterator.Current;
return true;
}
}

Remember, there may be no "first element" if the sequence is empty.
IEnumerable<int> z = new List<int>();
int y = z.FirstOrDefault();
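One caveat with value types: for an empty sequence of int, FirstOrDefault() returns 0, which cannot be told apart from a sequence that starts with 0. A sketch of one way around that, projecting to a nullable type first (`FirstOrNullSketch` and `FirstOrNull` are illustrative names, not a framework method):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstOrNullSketch
{
    // Returns the first element, or null when the sequence is empty.
    public static T? FirstOrNull<T>(this IEnumerable<T> source) where T : struct
    {
        // default(T?) is null, so an empty sequence is distinguishable
        // from one whose first element equals default(T).
        return source.Select(x => (T?)x).FirstOrDefault();
    }
}
```

Here `new List<int>().FirstOrNull()` has no value, while `new List<int> { 0 }.FirstOrNull()` is 0.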

If you're not on 3.5:
using (IEnumerator<Type> ie = ((IEnumerable<Type>)myList).GetEnumerator()) {
if (ie.MoveNext())
value = ie.Current;
else
{
// doesn't exist...
}
}
or
Type value = null;
foreach(Type t in myList) {
value = t;
break;
}
