F# yield! operator - Implementation and possible C# equivalents - c#

I'm currently learning F# and I really love the yield! (yield-bang) operator. Not only for its name but also for what it does of course.
The yield! operator basically allows you to yield all elements of a sequence from a sequence expression. This is useful for composing enumerators. Since I regularly encounter big, complicated enumerators I am interested in strategies we can use to break them up and compose them from simpler enumerators.
Unfortunately, the yield! operator is not available in C#. As far as I understand, what it does is like a foreach (var x in source) yield return x; but the book I'm reading (Petricek's Real World F# - Manning) suggests that it has better performance...
So what exactly does the F# compiler do here? (Yes, I can look at it using Reflector too but I'd like to have a more detailed description of the mechanism).
In order to achieve a similar construct in C# I have explored multiple ways, but none of them is as concise as the yield! operator and I'm also not sure about the complexity of them. Could someone please provide input if my BigO numbers are correct?
Decompose enumerator into multiple private enumerators and then yield each element from the public enumerator:
foreach (var x in part1()) yield return x;
foreach (var x in part2()) yield return x;
This will effectively result in a "double yield" on each element. Is that O(2n) then? (or possibly worse?) Anyway, using this approach stops me from using yield break; from any of my subparts.
Decompose enumerator into multiple private enumerators and then concat all private enumerators from the public enumerator:
return part1().Concat(part2());
I believe this is no different from the aforementioned solution because Concat() is implemented the way I outlined above.
Any other options?

In the current version of C#, I don't think you have other options than foreach... yield return and Concat. I agree it would be nice to have the yield! operator in C#, it would make certain constructs much more elegant, but I doubt this feature will ever make it to the "must-have" list, since we can easily do without it.
You might be interested in this MS research paper, which introduces a new yield foreach construct :
IEnumerable<XmlNode> Traverse(XmlNode n)
{
    yield return n;
    foreach (XmlNode c in n.ChildNodes)
        yield foreach Traverse(c);
}
Regarding your question about complexity: in both cases it's O(n). O(2n) is not used, because it denotes the same complexity as O(n) (linear). I don't think you can do better than that with the current C# features...
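For reference, the two options from the question can be sketched side by side as follows. Part1 and Part2 are hypothetical stand-ins for the question's part1() and part2(), and the class name is made up for the illustration. Both versions visit each element once, so both are linear for a single level of composition.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ComposedSequences
{
    // Hypothetical sub-sequences standing in for the question's part1() and part2().
    private static IEnumerable<int> Part1()
    {
        yield return 1;
        yield return 2;
    }

    private static IEnumerable<int> Part2()
    {
        yield return 3;
    }

    // Option 1: re-yield every element of each part from an iterator block.
    public static IEnumerable<int> ViaForeach()
    {
        foreach (var x in Part1()) yield return x;
        foreach (var x in Part2()) yield return x;
    }

    // Option 2: let Concat do the chaining; no iterator block is needed here.
    public static IEnumerable<int> ViaConcat()
    {
        return Part1().Concat(Part2());
    }
}
```

Both methods produce the same sequence; the Concat version is simply shorter to write.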

Regarding how the compiler translates the yield! operation, the paper cited by Thomas Levesque in his answer illustrates one implementation technique in section 4.3 (in particular, their example spanning figures 7-9 is illustrative of the general strategy). I don't think that there's any good way to do this from within an iterator block in C# - as I understand your proposed solutions, they could both result in quadratic behavior when used recursively. You could always manually create a NestedEnumerable<T> subclass to achieve the performance benefits, but this will be quite ugly compared to using a normal iterator block.
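To make the quadratic concern concrete, here is a minimal sketch. The Node type and both method names are assumptions made up for this illustration. The second method is not the NestedEnumerable<T> approach mentioned above; it swaps in a plain explicit stack, which is another common way to avoid nesting iterators.

```csharp
using System.Collections.Generic;

// Hypothetical tree node used only for this illustration.
public sealed class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
}

public static class TreeWalk
{
    // Recursive iterator: an element at depth d is re-yielded by d enclosing
    // iterators, so a deep, list-like tree costs O(n^2) MoveNext calls overall.
    public static IEnumerable<int> Recursive(Node n)
    {
        yield return n.Value;
        foreach (var c in n.Children)
            foreach (var v in Recursive(c))
                yield return v;
    }

    // Single iterator with an explicit stack: each node is pushed and popped once.
    public static IEnumerable<int> Iterative(Node root)
    {
        var stack = new Stack<Node>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var n = stack.Pop();
            yield return n.Value;
            // Push children in reverse so they are visited in their original order.
            for (int i = n.Children.Count - 1; i >= 0; i--)
                stack.Push(n.Children[i]);
        }
    }
}
```

Both methods return the values in the same pre-order; only the amount of work per element differs.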

There is no direct counterpart to yield! in C#. You're currently stuck with a combination of foreach and yield return.
However, IIRC, LINQ offers something similar, namely the SelectMany query operator, which translates to C# as multiple from .. in .. clauses.
(I'm hoping I'm not mixing up two different concepts, but IIRC, both yield! and SelectMany are essentially "flattening" projections; ie. a hierarchy of objects is "flattened" into a list.)
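A small sketch of the flattening idea, with made-up class and method names. The query with two from clauses is translated by the compiler into a SelectMany call, so both methods do the same thing.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Flattening
{
    // Method syntax: flatten a sequence of sequences into one sequence.
    public static IEnumerable<int> WithSelectMany(IEnumerable<IEnumerable<int>> groups)
    {
        return groups.SelectMany(g => g);
    }

    // Query syntax: two from clauses compile to a SelectMany call.
    public static IEnumerable<int> WithQuery(IEnumerable<IEnumerable<int>> groups)
    {
        return from g in groups
               from x in g
               select x;
    }
}
```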

Related

Is LINQ smart enough not to check conditional flag multiple times?

My question is, will LINQ in the following code read the flag value three times when materializing the numbers collection? I am trying to optimize my code. Here I want the Where clause to be evaluated only once if flag == true.
List<int> list = new(){1, 2, 3};
bool flag = true;
bool IsNumberBig(int num)
{
return num > 100;
}
var numbers = list.Where(l => flag || IsNumberBig(l)).ToList();
I failed to find a related question. Would be thankful to see how I could check this myself.
The value of flag will be evaluated every time the lambda is called. Obviously, that's cheaper than evaluating IsNumberBig() (or some more complex method in there), but still not free.
To optimize this, you could write something like
List<int> numbers;
if (flag)
{
numbers = list;
}
else
{
numbers = list.Where(IsNumberBig).ToList();
}
Like this, no iteration is done if flag is true (which in your case would return all elements anyway). Note that numbers then refers to the same list instance as list, not to a copy.
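One way to check this yourself is to count how often the lambda runs. The following is only a sketch with made-up names; it inlines the num > 100 test from the question instead of calling IsNumberBig.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FlagCheckDemo
{
    // Returns how many times the predicate lambda ran while filtering.
    public static int CountPredicateCalls(List<int> list, bool flag)
    {
        int calls = 0;
        list.Where(l =>
        {
            calls++;                 // runs once per element
            return flag || l > 100;  // flag is read on every call
        }).ToList();
        return calls;
    }
}
```

For a three-element list the counter ends at three whether flag is true or false, so the lambda (and with it the read of flag) is executed for every element.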
I think it is important to note that LINQ is mostly syntactic sugar. It does not do optimization. The vast majority of optimizations are done by the compiler, or more specifically, the jitter.
One problem when discussing optimizations is that the jitter is allowed to perform any kind of optimization as long as the result is the same. But it also has to do its optimizations as fast as possible, so it rarely does everything it would be allowed to do. It also depends on the version of the compiler; the more recent ones have tiered compilation to better optimize frequently used loops.
Because of all this it can be difficult to guess what the compiler will and will not optimize, and the best approach is to just benchmark the code. Use BenchmarkDotNet with and without the check, and that way you will get a correct answer. It should also tell you if the performance difference is anything to worry about.
Even though guessing what the compiler will do is difficult, there are a few things worthy of note. Most optimizations are done within a method; the compiler will not try to rewrite method signatures. However, small methods tend to be inlined, and can therefore be optimized as part of the calling method. So if all your code was inlined, it would very likely remove the flag check. However, one of the things that prevents inlining is indirect calls, like calling a method through an interface, or in this case, calling a delegate. Just about everything in LINQ is delegates and interfaces, and this tends to hamper performance. So in general, use LINQ for convenience, not for performance.
All that said, modern processors have pretty amazing branch predictors, so I would expect the effect of an easy-to-predict branch like that to be fairly small. There are likely other things that have a larger effect on performance.
But the most important thing is to benchmark and/or profile the code instead of just guessing about performance. It is common for people, even experienced developers, to try to optimize completely the wrong thing. If you want to get started, check out "Measure app performance in Visual Studio" and BenchmarkDotNet.
You could do this extension:
It basically wraps @PMF's answer in an extension method, so you can use it like the Where you already know. It just takes an additional condition parameter that switches the application of the predicate on or off. This comes with the advantage that you can chain it with other LINQ methods just like the plain old Where.
using System; // for Func<T, bool>
using System.Linq;
using System.Collections.Generic;
public static class LinqEx
{
    // Applies the predicate only when condition is true; otherwise returns the source unchanged.
    public static IEnumerable<T> WhereIf<T>(this IEnumerable<T> source, bool condition, Func<T, bool> predicate)
    {
        return condition
            ? source.Where(predicate)
            : source;
    }
}
And then use it like:
var someList = // assume it is a List<int> with some items
var shallFilter = true; // or false
var filteredList = someList.WhereIf(shallFilter, l => IsNumberBig(l)).ToList();
See it in Action in this Fiddle.
Mind that this would make more sense in a DB-related (LINQ to SQL) setting, since here it is really a micro-optimization. For it to show an effect, you'd have to have many items and a very costly predicate.
One word on your code, too:
var numbers = list.Where(l => flag || IsNumberBig(l)).ToList();
If flag is true, flag || X will evaluate to true, regardless of X. X will not even be evaluated.
So, you basically implemented the opposite of your requirement.
See also: Conditional logical OR operator ||
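The short-circuit behaviour of || is easy to verify with a counter. A minimal sketch, with hypothetical names, where Right() stands in for IsNumberBig:

```csharp
public static class ShortCircuitDemo
{
    // Number of times the right-hand operand was actually evaluated.
    public static int RightSideCalls;

    private static bool Right()
    {
        RightSideCalls++;
        return false;
    }

    public static bool Evaluate(bool flag)
    {
        // When flag is true, Right() is never called.
        return flag || Right();
    }
}
```

Calling Evaluate(true) leaves the counter untouched, while Evaluate(false) increments it.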

Performance implications of calling ToArray inside a LINQ selector

If I have the following statement:
whatever.Select(x => collection.ToArray()[index]).ToList();
Is LINQ smart enough to perform the ToArray cast only once (I'm not really aware of how this closure is transformed and evaluated)?
I understand that this code is bad, just interested.
No, it will be performed once for every item in whatever.
You can have a peek at the code for LINQBridge, especially the Select method (that ends up calling SelectYield).
The essence of SelectYield is a simple foreach loop:
var i = 0; // running index passed to the selector
foreach (var item in source)
    yield return selector(item, i++);
Where selector is the lambda expression you pass in, in your case x => collection.ToArray()[index]. From here it is obvious that the whole lambda expression will be evaluated for every element in whatever.
Note that LINQBridge is a stand alone reimplementation of LINQ2Objects and thus not necessarily identical (but to a very large extent at least behaving exactly like LINQ2Objects, including side effects).
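This can also be observed directly. In the sketch below all names are made up, and Collection() is a hypothetical source that counts how often it is enumerated; since every ToArray call enumerates its source, the counter shows how often ToArray ran.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToArrayDemo
{
    // Incremented every time the source below is enumerated.
    public static int Enumerations;

    // Hypothetical source; each enumeration (and so each ToArray call) bumps the counter.
    public static IEnumerable<int> Collection()
    {
        Enumerations++;
        yield return 10;
        yield return 20;
    }

    // ToArray inside the selector: runs once per element of 'whatever'.
    public static List<int> Inside(IEnumerable<int> whatever, int index)
    {
        var collection = Collection();
        return whatever.Select(x => collection.ToArray()[index]).ToList();
    }

    // ToArray hoisted out of the selector: runs once.
    public static List<int> Hoisted(IEnumerable<int> whatever, int index)
    {
        var array = Collection().ToArray();
        return whatever.Select(x => array[index]).ToList();
    }
}
```

With three items in whatever, Inside enumerates the source three times and Hoisted only once, while both return the same list.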

C#: yield return range/collection

I use the yield return keyword quite a bit, but I find it lacking when I want to add a range to the IEnumerable. Here's a quick example of what I would like to do:
IEnumerable<string> SomeRecursiveMethod()
{
// some code
// ...
yield return SomeRecursiveMethod();
}
Naturally this results in an error, which can be resolved by doing a simple loop. Is there a better way to do this? A loop feels a bit clunky.
No, there isn't I'm afraid. F# does support this with yield!, but there's no equivalent in C# - you have to use the loop, basically. Sorry... I feel your pain. I mentioned it in one of my Edulinq blog posts, where it would have made things simpler.
Note that using yield return recursively can be expensive - see Wes Dyer's post on iterators for more information (and mentioning a "yield foreach" which was under consideration four years ago...)
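For completeness, the loop-based workaround looks like this. CountDown is a made-up example, not the method from the question; it only shows the shape of re-yielding the items of a recursive call.

```csharp
using System.Collections.Generic;

public static class RecursiveYield
{
    // Yields n, n-1, ..., 1 by re-yielding the items of the recursive call.
    public static IEnumerable<string> CountDown(int n)
    {
        if (n <= 0) yield break;
        yield return n.ToString();
        foreach (var s in CountDown(n - 1))
            yield return s;
    }
}
```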
If you already have an IEnumerable to loop over, and the return type is IEnumerable (as is the case for functions that could use yield return), you can simply return that enumeration.
If you have cases where you need to combine results from multiple IEnumerables, you can use the IEnumerable<T>.Concat extension method.
In your recursive example, though, you need to terminate the enumeration/concatenation based on the contents of the enumeration. I don't think my method will support this.
The yield keyword is indeed very nice. But nesting it in a for loop will cause more glue code to be generated and executed.
If you can live with a less functional style of programming, you can pass a List around to which you append:
void GenerateList(List<string> result)
{
result.Add("something");
// more code.
GenerateList(result);
}

Would C# benefit from distinctions between kinds of enumerators, like C++ iterators?

I have been thinking about the IEnumerator.Reset() method. I read in the MSDN documentation that it is only there for COM interop. As a C++ programmer, it looks to me like an IEnumerator which supports Reset is what I would call a forward iterator, while an IEnumerator which does not support Reset is really an input iterator.
So part one of my question is, is this understanding correct?
The second part of my question is, would it be of any benefit in C# if there was a distinction made between input iterators and forward iterators (or "enumerators" if you prefer)? Would it not help eliminate some confusion among programmers, like the one found in this SO question about cloning iterators?
EDIT: Clarification on forward and input iterators. An input iterator only guarantees that you can enumerate the members of a collection (or from a generator function or an input stream) only once. This is exactly how IEnumerator works in C#. Whether or not you can enumerate a second time, is determined by whether or not Reset is supported. A forward iterator, does not have this restriction. You can enumerate over the members as often as you want.
Some C# programmers don't understand why an IEnumerator cannot be reliably used in a multipass algorithm. Consider the following case:
void PrintContents(IEnumerator<int> iter)
{
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
    iter.Reset();
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
}
If we call PrintContents in this context, no problem:
List<int> ys = new List<int>() { 1, 2, 3 };
PrintContents(ys.GetEnumerator());
However look at the following:
IEnumerable<int> GenerateInts() {
    System.Random rnd = new System.Random();
    for (int i = 0; i < 10; ++i)
        yield return rnd.Next();
}
PrintContents(GenerateInts().GetEnumerator());
If the IEnumerator supported Reset, in other words supported multi-pass algorithms, then each time you iterated over the collection it would be different. This would be undesirable, because it would be surprising behavior. This example is a bit faked, but it does occur in the real world (e.g. reading from file streams).
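As it stands, enumerators produced by a C# iterator block do not support Reset at all and throw NotSupportedException, whereas the List<T> enumerator does implement it. A small sketch with made-up names to check both:

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    private static IEnumerable<int> Generate()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset() on the given enumerator throws NotSupportedException.
    public static bool ResetThrows(IEnumerator<int> iter)
    {
        try
        {
            iter.Reset();
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // Enumerator generated by the compiler for an iterator block.
    public static bool IteratorBlockResetThrows()
    {
        using (var iter = Generate().GetEnumerator())
            return ResetThrows(iter);
    }

    // Enumerator of a List<int>, accessed through the interface.
    public static bool ListResetThrows()
    {
        var ys = new List<int> { 1, 2, 3 };
        using (IEnumerator<int> iter = ys.GetEnumerator())
            return ResetThrows(iter);
    }
}
```

So PrintContents works for the list case but fails at iter.Reset() for the generator case.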
Reset was a big mistake. I call shenanigans on Reset. In my opinion, the correct way to reflect the distinction you are making between "forward iterators" and "input iterators" in the .NET type system is with the distinction between IEnumerable<T> and IEnumerator<T>.
See also this answer, where Microsoft's Eric Lippert (in an unofficial capacity, no doubt; my point is only that he's someone with more credentials than I have to make the claim that this was a design mistake) makes a similar point in comments. See also his awesome blog.
Interesting question. My take is that of course C# would benefit. However, it wouldn't be easy to add.
The distinction exists in C++ because of its much more flexible type system. In C#, you don't have a robust generic way to clone objects, which is necessary to represent forward iterators (to support multi-pass iteration). And of course, for this to be really useful, you'd also need to support bidirectional and random-access iterators/enumerators. And to get them all working smoothly, you really need some form of duck-typing, like C++ templates have.
Ultimately, the scopes of the two concepts are different.
In C++, iterators are supposed to represent everything you need to know about a range of values. Given a pair of iterators, I don't need the original container. I can sort, I can search, I can manipulate and copy elements as much as I like. The original container is out of the picture.
In C#, enumerators are not meant to do quite as much. Ultimately, they're just designed to let you run through the sequence in a linear manner.
As for Reset(), it is widely accepted that it was a mistake to add it in the first place. If it had worked, and been implemented correctly, then yes, you could say your enumerator was analogous to forward iterators, but in general, it's best to ignore it as a mistake. And then all enumerators are similar only to input iterators.
Unfortunately.
Coming from the C# perspective:
You almost never use IEnumerator directly. Usually you use a foreach statement, which expects an IEnumerable.
IEnumerable _myCollection;
...
foreach (var item in _myCollection) { /* Do something */ }
You don't pass around IEnumerator either. If you want to pass a collection which needs iteration, you pass IEnumerable. Since IEnumerable has a single function, which returns an IEnumerator, it can be used to iterate the collection multiple times (multiple passes).
There's no need for a Reset() function on IEnumerator because if you want to start over, you just throw away the old one (garbage collected) and get a new one.
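A minimal sketch of the multi-pass idea, with a made-up helper name: each foreach asks the IEnumerable for a fresh enumerator, so nothing ever needs to be reset.

```csharp
using System.Collections.Generic;

public static class MultiPass
{
    // Enumerates xs twice; every foreach calls GetEnumerator() again.
    public static List<int> TwoPasses(IEnumerable<int> xs)
    {
        var seen = new List<int>();
        foreach (var x in xs) seen.Add(x);
        foreach (var x in xs) seen.Add(x);
        return seen;
    }
}
```

Keep in mind that for a generator-style source, the second pass re-runs the generator, so it is only guaranteed to see the same items if the source is deterministic.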
The .NET framework would benefit immensely if there were a means of asking an IEnumerator<T> about what abilities it could support and what promises it could make. Such features would also be helpful in IEnumerable<T>, but being able to ask the questions of an enumerator would allow code that receives an enumerator from wrappers like ReadOnlyCollection to use the underlying collection in improved ways without having to involve the wrapper.
Given any enumerator for a collection that is capable of being enumerated in its entirety and isn't too big, one could produce from it an IEnumerable<T> that would always yield the same sequence of items (specifically, the items remaining in the enumerator) by reading its entire content into an array, disposing and discarding the enumerator, getting an enumerator from the array (using that in place of the original, abandoned enumerator), wrapping the array in a ReadOnlyCollection<T>, and returning that. Although such an approach would work with any kind of enumerable collection meeting the above criteria, it would be horribly inefficient with most of them. Having a means of asking an enumerator to yield its remaining contents in an immutable IEnumerable<T> would allow many kinds of enumerators to perform the indicated action much more efficiently.
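The generic (and inefficient) fallback described above could be sketched like this. The class and method names are made up; this is the brute-force copy, not the efficient capability query being wished for.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class EnumeratorSnapshot
{
    // Drains the remaining items into an array, disposes the enumerator,
    // and returns a read-only wrapper that always yields the same sequence.
    public static ReadOnlyCollection<T> Remaining<T>(IEnumerator<T> source)
    {
        var buffer = new List<T>();
        using (source)
        {
            while (source.MoveNext())
                buffer.Add(source.Current);
        }
        return new ReadOnlyCollection<T>(buffer.ToArray());
    }
}
```

The caller must not use the original enumerator afterwards, since it has been consumed and disposed.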
I don't think so. I would call IEnumerable a forward iterator, and an input iterator. It does not allow you to go backwards, or modify the underlying collection. With the addition of the foreach keyword, iterators are almost a non-thought most of the time.
Opinion:
The difference between input iterators (get each one) vs. output iterators (do something to each one) is too trivial to justify an addition to the framework. Also, in order to do an output iterator, you would need to pass a delegate to the iterator. The input iterator seems more natural to C# programmers.
There's also IList<T> if the programmer wants random access.

When not to use lambda expressions [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
A lot of questions are being answered on Stack Overflow, with members specifying how to solve these real world/time problems using lambda expressions.
Are we overusing it, and are we considering the performance impact of using lambda expressions?
I found a few articles that explore the performance impact of lambdas vs. anonymous delegates vs. for/foreach loops, with different results:
Anonymous Delegates vs Lambda Expressions vs Function Calls Performance
Performance of foreach vs. List.ForEach
.NET/C# Loop Performance Test (FOR, FOREACH, LINQ, & Lambda).
DataTable.Select is faster than LINQ
What should be the evaluation criteria when choosing the appropriate solution? Except for the obvious reason that it's more concise code and readable when using lambda.
Even though I will focus on point one, I begin by giving my 2 cents on the whole issue of performance. Unless differences are big or usage is intensive, I usually don't bother about microseconds that, when added up, don't amount to any visible difference to the user. I emphasize that this applies only to methods that are not called intensively. Where I do have special performance considerations is in the way I design the application itself. I care about caching, about the use of threads, about clever ways to call methods (whether to make several calls or to try to make only one call), whether to pool connections or not, etc. In fact I usually don't focus on raw performance, but on scalability. I don't care if it runs better by a tiny slice of a nanosecond for a single user, but I care a lot about being able to load the system with large numbers of simultaneous users without a noticeable impact.
Having said that, here goes my opinion about point 1. I love anonymous methods. They give me great flexibility and code elegance. The other great feature of anonymous methods is that they allow me to directly use local variables from the containing method (from a C# perspective, not from an IL perspective, of course). They often spare me loads of code. When do I use anonymous methods? Every single time the piece of code I need isn't needed elsewhere. If it is used in two different places, I don't like copy-paste as a reuse technique, so I'll use a plain ol' delegate. So, just like shoosh answered, it isn't good to have code duplication. In theory there are no performance differences, as anonymous methods are C# tricks, not IL stuff.
Most of what I think about anonymous methods applies to lambda expressions, as the latter can be used as a compact syntax to represent anonymous methods. Let's assume the following method:
public static void DoSomethingMethod(string[] names, Func<string, bool> myExpression)
{
Console.WriteLine("Lambda used to represent an anonymous method");
foreach (var item in names)
{
if (myExpression(item))
Console.WriteLine("Found {0}", item);
}
}
It receives an array of strings and for each one of them, it will call the method passed to it. If that method returns true, it will say "Found...". You can call this method the following way:
string[] names = {"Alice", "Bob", "Charles"};
DoSomethingMethod(names, delegate(string p) { return p == "Alice"; });
But, you can also call it the following way:
DoSomethingMethod(names, p => p == "Alice");
There is no difference in IL between the two, but the one using the lambda expression is much more readable. Once again, there is no performance impact, as these are all C# compiler tricks (not JIT compiler tricks). Just as I don't feel we are overusing anonymous methods, I don't feel we are overusing lambda expressions to represent anonymous methods. Of course, the same logic applies to repeated code: don't use lambdas, use regular delegates. There are other restrictions leading you back to anonymous methods or plain delegates, like out or ref argument passing.
The other nice things about Lambda expressions is that the exact same syntax doesn't need to represent an anonymous method. Lambda expressions can also represent... you guessed, expressions. Take the following example:
public static void DoSomethingExpression(string[] names, System.Linq.Expressions.Expression<Func<string, bool>> myExpression)
{
Console.WriteLine("Lambda used to represent an expression");
BinaryExpression bExpr = myExpression.Body as BinaryExpression;
if (bExpr == null)
return;
Console.WriteLine("It is a binary expression");
Console.WriteLine("The node type is {0}", bExpr.NodeType.ToString());
Console.WriteLine("The left side is {0}", bExpr.Left.NodeType.ToString());
Console.WriteLine("The right side is {0}", bExpr.Right.NodeType.ToString());
if (bExpr.Right.NodeType == ExpressionType.Constant)
{
ConstantExpression right = (ConstantExpression)bExpr.Right;
Console.WriteLine("The value of the right side is {0}", right.Value.ToString());
}
}
Notice the slightly different signature. The second parameter receives an expression and not a delegate. The way to call this method would be:
DoSomethingExpression(names, p => p == "Alice");
Which is exactly the same as the call we made when creating an anonymous method with a lambda. The difference here is that we are not creating an anonymous method, but creating an expression tree. It is due to these expression trees that we can then translate lambda expressions to SQL, which is what Linq 2 SQL does, for instance, instead of executing stuff in the engine for each clause like the Where, the Select, etc. The nice thing is that the calling syntax is the same whether you're creating an anonymous method or sending an expression.
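To see the two conversions side by side, here is a small sketch (class and member names are made up). The same lambda text becomes compiled code in one case and an inspectable data structure in the other; the tree can still be turned into a delegate with Compile().

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaKinds
{
    // The lambda converted to a delegate: compiled code, not inspectable.
    public static readonly Func<string, bool> AsDelegate = p => p == "Alice";

    // The same lambda converted to an expression tree that can be inspected or compiled.
    public static readonly Expression<Func<string, bool>> AsExpression = p => p == "Alice";

    // Node type of the lambda body (an equality comparison here).
    public static ExpressionType BodyKind()
    {
        return AsExpression.Body.NodeType;
    }

    // Compiles the tree into a delegate at run time and invokes it.
    public static bool RunCompiled(string name)
    {
        return AsExpression.Compile()(name);
    }
}
```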
My answer will not be popular.
I believe lambdas are the better choice 99% of the time, for three reasons.
First, there is ABSOLUTELY nothing wrong with assuming your developers are smart. Other answers have an underlying premise that every developer but you is stupid. Not so.
Second, lambdas (et al) are a modern syntax, and tomorrow they will be more commonplace than they already are today. Your project's code should flow from current and emerging conventions.
Third, writing code "the old fashioned way" might seem easier to you, but it's not easier to the compiler. This is important, legacy approaches have little opportunity to be improved as the compiler is rev'ed. Lambdas (et al) which rely on the compiler to expand them can benefit as the compiler deals with them better over time.
To sum up:
Developers can handle it
Everyone is doing it
There's future potential
Again, I know this will not be a popular answer. And believe me "Simple is Best" is my mantra, too. Maintenance is an important aspect to any source. I get it. But I think we are overshadowing reality with some cliché rules of thumb.
// Jerry
Code duplication.
If you find yourself writing the same anonymous function more than once, it shouldn't be one.
Well, when we are talking about delegate usage, there shouldn't be any difference between lambdas and anonymous methods: they are the same, just with different syntax. And named methods (used as delegates) are also identical from the runtime's viewpoint. The difference, then, is between using delegates and using inline code, i.e.
list.ForEach(s=>s.Foo());
// vs.
foreach(var s in list) { s.Foo(); }
(where I would expect the latter to be quicker)
And equally, if you are talking about anything other than in-memory objects, lambdas are one of your most powerful tools in terms of maintaining type checking (rather than parsing strings all the time).
Certainly, there are cases when a simple foreach with code will be faster than the LINQ version, as there will be fewer invokes to do, and invokes cost a small but measurable time. However, in many cases, the code is simply not the bottleneck, and the simpler code (especially for grouping, etc) is worth a lot more than a few nanoseconds.
Note also that in .NET 4.0 there are additional Expression nodes for things like loops, commas, etc. The language doesn't support them, but the runtime does. I mention this only for completeness: I'm certainly not saying you should use manual Expression construction where foreach would do!
I'd say that the performance differences are usually so small (for loops, see the results of the 2nd article; btw, Jon Skeet has a similar article here) that you should almost never choose a solution for performance reasons alone, unless you are writing a piece of software where performance is absolutely the number one non-functional requirement and you really have to do micro-optimizations.
When to choose what? I guess it depends on the situation but also on the person. Just as an example, some people prefer List.ForEach over a normal foreach loop. I personally prefer the latter, as it is usually more readable, but who am I to argue against this?
Rules of thumb:
Write your code to be natural and readable.
Avoid code duplications (lambda expressions might require a little extra diligence).
Optimize only when there's a problem, and only with data to back up what that problem actually is.
Any time the lambda simply passes its arguments directly to another function. Don't create a lambda for function application.
Example:
var coll = new ObservableCollection<int>();
myInts.ForEach(x => coll.Add(x));
Is nicer as:
var coll = new ObservableCollection<int>();
myInts.ForEach(coll.Add);
The main exception is where C#'s type inference fails for whatever reason (and there are plenty of times that's true).
If you need recursion, don't use lambdas, or you'll end up getting very distracted!
Lambda expressions are cool. Over the older delegate syntax they have a few advantages: they can be converted to either anonymous functions or expression trees, parameter types are inferred from the declaration, they are cleaner and more concise, etc. I see no real value in not using a lambda expression when you're in need of an anonymous function. One not-so-big advantage of the earlier style is that you can omit the parameter declaration entirely if the parameters are not used. Like
Action<int> a = delegate { }; //takes one argument, but no argument specified
This is useful when you have to declare an empty delegate that does nothing, but it is not a strong reason enough to not use lambdas.
Lambdas let you write quick anonymous methods. That makes lambdas pointless wherever anonymous methods are pointless, i.e. where named methods make more sense. Compared to named methods, anonymous methods can be disadvantageous (this is not specific to lambda expressions, but since these days lambdas widely represent anonymous methods, it is relevant):
because they tend to lead to logic duplication (reuse is difficult)
when it is unnecessary to write one, like:
//this is unnecessary
Func<string, int> f = x => int.Parse(x);
//this is enough
Func<string, int> f = int.Parse;
since writing anonymous iterator block is impossible.
Func<IEnumerable<int>> f = () => { yield return 0; }; //impossible
since recursive lambdas require one more line of quirkiness, like
Func<int, int> f = null;
f = x => (x <= 1) ? 1 : x * f(x - 1);
well, since reflection is kinda messier, but that is moot isn't it?
Apart from point 3, the rest are not strong reasons not to use lambdas.
Also see this thread about what is disadvantageous about Func/Action delegates, since often they are used along with lambda expressions.
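As a side note, and assuming C# 7.0 or later is available, local functions cover two of the points above: unlike a lambda, a local function may be an iterator, and it can call itself without the "f = null" trick. A minimal sketch with made-up names:

```csharp
using System.Collections.Generic;

public static class LocalFunctionDemo
{
    public static IEnumerable<int> Zero()
    {
        // A lambda cannot contain yield, but a local function can.
        IEnumerable<int> Inner()
        {
            yield return 0;
        }
        return Inner();
    }

    public static int Factorial(int n)
    {
        // A local function can be recursive without a separate declaration line.
        int F(int x) => (x <= 1) ? 1 : x * F(x - 1);
        return F(n);
    }
}
```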
