How to make [example] extension method more generic/functional/efficient?

I needed a double[] split into groups of x elements by stride y, returned as a List. Pretty basic...a loop and/or some LINQ and you're all set. However, I have not been spending much time on extension methods and this looked like a good candidate for some practice. The naive version returns what I am looking for in my current application....
(A)
public static IList<T[]> Split<T>(this IEnumerable<T> source, int every, int take)
{
/*... throw E if X is insane ...*/
var result = source
.Where ((t, i) => i % every == 0)
.Select((t, i) => source.Skip(i * every).Take(take).ToArray())
.ToList();
return result;
}
...the return type is sort of generic...depending on your definition of generic.
I would think...
(B)
public static IEnumerable<IEnumerable<T>> Split<T>
(this IEnumerable<T> source,int every, int take){/*...*/}
...is a better solution...maybe.
Question(s):
Is (B) preferred? Why?
How would you cast (B) as IList<T[]>?
Any benefit in refactoring? Possibly two methods that might be chained or the like.
Is the approach sound? Or have I missed something basic?
Comments, opinions and harsh language are always appreciated.
Usage Context: C# .Net 4.0

B is probably the better option. The major change is that the consumer of the code decides whether to materialize the result: they can call ToList() on the end of your method, instead of being forced to deal with an already-built list (an IList<T[]>, actually).
This has a LOT of advantages in method chaining and general use. It's easy to ToList() an enumerable, but hard to go the other way. So, you can call Select().Split().OrderBy() on a list and use the results in a foreach statement without having to have Linq iterate through the whole thing at once.
Refactoring to yield return single values MIGHT get you a performance bonus, but since you're basically just returning the iterator that the Select gave you (which will yield one item at a time itself) I don't think you'll get much benefit in yielding through it yourself.
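For comparison, here is a minimal sketch of what a single-pass, yield-based version of (B) could look like. It is not the asker's implementation: the name SplitLazy, the helper SplitIterator and the argument checks are assumptions made for illustration. It keeps a queue of groups that are still being filled, so the source is enumerated only once even when groups overlap (take > every).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Hypothetical name. Arguments are validated eagerly, the work is done lazily.
    public static IEnumerable<IEnumerable<T>> SplitLazy<T>(
        this IEnumerable<T> source, int every, int take)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (every < 1) throw new ArgumentOutOfRangeException("every");
        if (take < 1) throw new ArgumentOutOfRangeException("take");
        return SplitIterator(source, every, take);
    }

    private static IEnumerable<IEnumerable<T>> SplitIterator<T>(
        IEnumerable<T> source, int every, int take)
    {
        var open = new Queue<List<T>>(); // groups that are still being filled
        int phase = 0;                   // position inside the current stride
        foreach (T item in source)
        {
            if (phase == 0) open.Enqueue(new List<T>(take)); // a new group starts here
            foreach (List<T> group in open) group.Add(item);
            // The oldest group is always the fullest, so only the head can be complete.
            if (open.Count > 0 && open.Peek().Count == take) yield return open.Dequeue();
            phase = (phase + 1) % every;
        }
        // Trailing groups may be shorter than 'take', as with Skip/Take in (A).
        while (open.Count > 0) yield return open.Dequeue();
    }
}

public static class SplitSketchDemo
{
    public static void Main()
    {
        double[] data = { 1, 2, 3, 4, 5, 6, 7 };
        // Groups produced for every = 3, take = 2: {1, 2}, {4, 5}, {7}
        foreach (IEnumerable<double> group in data.SplitLazy(3, 2))
            Console.WriteLine(string.Join(", ", group));
    }
}
```

Whether this is worth the extra code depends on the input: for a small double[] the Skip/Take version in (A) is easier to read, while a sketch like this avoids re-enumerating the source for every group.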

I would prefer (B) as it looks more flexible. One way of casting the output of the (B) method to an IList<T[]> is as simple as chaining .Select(x => x.ToArray()).ToList() to it, e.g.,
var foo =
bar.Split(someEvery, someTake).Select(x => x.ToArray()).ToList();

In .NET 4, you can just change the return type to IEnumerable<IEnumerable<T>> and it will work, because IEnumerable<T> is covariant from that version on.
Before .NET 4, you would have to convert the inner arrays to IEnumerable<T> first, by calling .Cast<IEnumerable<T>>() on your result before returning.
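A small sketch of the conversion in question, assuming C# 4 or later. The class and method names are made up for illustration; the point is only that the list is returned as-is, with no cast and no copy.

```csharp
using System;
using System.Collections.Generic;

public static class CovarianceSketch
{
    // Hypothetical helper: hands back the same object under the more general type.
    public static IEnumerable<IEnumerable<int>> AsNested(IList<int[]> groups)
    {
        // Allowed because IEnumerable<out T> is covariant and int[] is a
        // reference type that implements IEnumerable<int>.
        return groups;
    }
}
```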

Related

Extending LINQ to accept nullable enumerables

While working with Linq extensions it's normal to see code like this:
IEnumerable<int> enumerable = GetEnumerable();
int sum = 0;
if (enumerable != null)
{
sum = enumerable.Sum();
}
In order to enhance the code quality, I wrote the following extension method that checks for nullable enumerables and breaks the linq execution.
public static IEnumerable<T> IgnoreIfEmpty<T>(this IEnumerable<T> enumerable)
{
if (enumerable == null) yield break;
foreach (var item in enumerable)
{
yield return item;
}
}
So, I can refactor the code to be like this:
var sum = GetEnumerable().IgnoreIfEmpty().Sum();
My questions now:
What penalties are associated with my extension method at runtime?
Is it a good practice to extend LINQ that way?
Update:
My target framework is: 3.5

What penalties are associated with my extension method at runtime?
Your extension method is transformed into a state-machine, so there's the minimal overhead of that, but that shouldn't be noticeable.
Is it a good practice to extend LINQ that way?
In your question you state:
While working with Linq extensions it's normal to see code like this (insert enumerable null check here)
And I beg to differ. The common practice says don't return null where an IEnumerable<T> is expected. Most cases should return an empty collection (or IEnumerable), leaving null to the exceptional, because null is not empty. This would make your method entirely redundant. Use Enumerable.Empty<T> where needed.
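As a sketch of that convention, here is a producer that returns an empty sequence rather than null. The hasData parameter and the sample values are assumptions made for illustration; the question does not show what GetEnumerable does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmptyNotNullSketch
{
    // Hypothetical producer: never returns null, so callers need no null check.
    public static IEnumerable<int> GetEnumerable(bool hasData)
    {
        if (!hasData) return Enumerable.Empty<int>();
        return new[] { 1, 2, 3 };
    }
}
```

With a producer like this, GetEnumerable(false).Sum() simply evaluates to 0 and no helper method is needed.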

You're going to have a method call overhead; it will be negligible unless you are running it in a tight loop or a performance-critical scenario. It's but a shadow in comparison to something like a database call or writing to a file system. Note that the method is probably not going to be inlined, since it's an enumerator.
It's all about readability / maintainability. What do I expect to happen when I see GetEnumerable().IgnoreIfEmpty().Sum();? In this case, it makes sense.
Note that with C# 6 we can use the following syntax: GetEnumerable()?.Sum(), which returns an int?. You could write GetEnumerable()?.Sum() ?? 0 or (GetEnumerable()?.Sum()).GetValueOrDefault() to get a non-null integer that defaults to zero.
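A short sketch of both spellings, assuming a C# 6 or later compiler (so it does not apply to the 3.5 target mentioned in the question). The method names are made up for illustration. Note the parentheses in the second variant: without them, GetValueOrDefault would be looked up on the int returned by Sum() inside the null-conditional chain, which does not compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullConditionalSketch
{
    // values?.Sum() is an int?; the ?? operator supplies the fallback.
    public static int SumOrZero(IEnumerable<int> values)
    {
        return values?.Sum() ?? 0;
    }

    // Same result via Nullable<int>.GetValueOrDefault().
    public static int SumOrZeroAlt(IEnumerable<int> values)
    {
        return (values?.Sum()).GetValueOrDefault();
    }
}
```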
If you are truly concerned with performance, you could also slightly refactor your method so that it's not an enumerator. This may increase the chance of inlining, although I have no idea of the 'arcane' logic of the JIT compiler:
public static IEnumerable<T> IgnoreIfEmpty<T>(this IEnumerable<T> enumerable)
{
if (enumerable == null) return Enumerable.Empty<T>();
return enumerable;
}
More generally about extending Linq, I think it is perfectly fine as long as the code makes sense. MSDN even has an article about it. If you look at the standard Where, Select methods in Linq, and forget about the performance optimizations they have in there, the methods are all mostly one-liner methods.

You can skip the additional extension method and use null coalescing operator - this is what it's for, and a one-time check for nullability should be a lot more efficient than another state machine:
IEnumerable<int> enumerable = GetEnumerable();
int sum = 0;
sum = (enumerable ?? Enumerable.Empty<int>()).Sum();

Most of the times we write a lot of code just because we are enchanted by the beauty of our creation - not because we really need it - and then we call it abstraction, reusability, extensibility, etc..
Is this raw piece less readable, less extensible, less reusable, or slower:
var sum = GetEnumerable().Where(a => a != null).Sum();
The less code you write - the less code you test - keep it simple.
BTW - it is good to write extension methods if you can justify it.

Why Standard Extension Method on IEnumerables

When I use a standard extension method on a List, such as
Where(...)
the result is always an IEnumerable, and when
I want to do a list operation such as ForEach()
I need to cast (not pretty) or use the ToList() extension method, which
(maybe) creates a new List that consumes more memory (is that right?):
List<string> myList = new List<string> { /* some data */ };
myList.Where(p => p.Length > 5).ToList().ForEach(...);
or
(Edit: this cast won't work)
(myList.Where(p => p.Length > 5) as List<string>).ForEach(...);
Which is better code, or is there a third way?
Edit:
ForEach is just a sample; replace it with BinarySearch:
myList.Where(p => p.Length > 5).ToList().BinarySearch(...)

The as is definitely not a good approach, and I'd be surprised if it works.
In terms of what is "best", I would propose foreach instead of ForEach:
foreach(var item in myList.Where(p=>p.Length>5)) {
... // do something with item
}
If you desperately want to use list methods, perhaps:
myList.FindAll(p=>p.Length>5).ForEach(...);
or indeed
var result = myList.FindAll(p=>p.Length>5).BinarySearch(...);
but note that this does (unlike the first) require an additional copy of the data, which could be a pain if there are 100,000 items in myList with length above 5.
The reason that LINQ returns IEnumerable<T> is that this (LINQ-to-Objects) is designed to be composable and streaming, which is not possible if you go to a list. For example, a combination of a few where / select etc should not strictly need to create lots of intermediate lists (and indeed, LINQ doesn't).
This is even more important when you consider that not all sequences are bounded; there are infinite sequences, for example:
static IEnumerable<int> GetForever() {
while(true) yield return 42;
}
var thisWorks = GetForever().Take(10).ToList();
as until the ToList it is composing iterators, not generating an intermediate list. There are a few buffered operations, though, like OrderBy, which need to read all the data first. Most LINQ operations are streaming.
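To make the streaming behaviour visible, here is a small sketch that counts how many items are actually pulled from an infinite source. The names CountUp, Pulled and FirstThreeEven are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Number of items pulled from the infinite source so far.
    public static int Pulled;

    // Hypothetical infinite source: 1, 2, 3, ...
    public static IEnumerable<int> CountUp()
    {
        int n = 0;
        while (true)
        {
            n++;
            Pulled++;
            yield return n;
        }
    }

    public static List<int> FirstThreeEven()
    {
        Pulled = 0;
        // Where and Take stream: each one asks upstream for an item only on demand,
        // so the pipeline stops as soon as Take has seen three matches.
        return CountUp().Where(x => x % 2 == 0).Take(3).ToList();
    }
}
```

Calling FirstThreeEven() returns 2, 4, 6 and leaves Pulled at 6: only as much of the infinite sequence is produced as the final ToList needs.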

One of the design goals for LINQ is to allow composable queries on any supported data type, which is achieved by having return-types specified using generic interfaces rather than concrete classes (such as IEnumerable<T> as you noted). This allows the nuts and bolts to be implemented as needed, either as a concrete class (e.g. WhereEnumerableIterator<T> or hoisted into a SQL query) or using the convenient yield keyword.
Additionally, another design philosophy of LINQ is one of deferred execution. Basically, until you actually use the query, no real work has been done. This allows potentially expensive (or infinite as Mark notes) operations to be completed only exactly as needed.
If List<T>.Where returned another List<T> it would potentially limit composition and would certainly hinder deferred execution (not to mention generate excess memory).
So, looking back at your example, the best way to use the result of the Where operator depends on what you want to do with it!
// This assumes myList has 20,000 entries
// if .Where returned a new list we'd potentially double our memory!
var largeStrings = myList.Where(ss => ss.Length > 100);
foreach (var item in largeStrings)
{
someContainer.Add(item);
}
// or if we supported an IEnumerable<T>
someContainer.AddRange(myList.Where(ss => ss.Length > 100));

If you want to make a simple foreach over a list, you can do like this:
foreach (var item in myList.Where([Where clause]))
{
// Do something with each item.
}

You can't cast (as) an IEnumerable<string> to a List<string>. An IEnumerable evaluates its items as you access them. Invoking ToList<string>() enumerates all items and returns a new List, which costs extra memory and is unnecessary here. If you want to use a ForEach extension method on any collection, it's better to write one that works on any IEnumerable<T>:
public static void ForEach<T>(this IEnumerable<T> enumerableList, Action<T> action)
{
foreach(T item in enumerableList)
{
action(item);
}
}

In which cases are IEnumerable<T>.Count optimized?

Using reflector I have noticed that System.Linq.Enumerable.Count method has a condition in it to optimize it for the case when the IEnumerable<T> passed is in fact an ICollection<T>. If the cast succeeds the Count method does not need to iterate over every element, but can call the Count method of ICollection.
Based on this I was starting to think that IEnumerable<T> can be used like a readonly view of a collection, without having the performance loss that I originally expected based on the API of IEnumerable<T>
I was interested whether the optimization of the Count still holds when the IEnumerable<T> is a result of a Select statement over an ICollection, but based on reflected code this case is not optimized, and requires an iteration through all elements.
Do you draw the same conclusions from reflector? What could be the reason behind the lack of this optimization? It seems like there is a lot of time wasted in this common operation. Does the spec require that each element is evaluated even if the Count can be determined without doing that?
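The first observation can also be checked without a decompiler. The sketch below uses a hypothetical ProbeCollection<T> wrapper (not part of the framework) that records how often it is enumerated, then calls Count() on it directly and through Select.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only wrapper that counts calls to GetEnumerator.
public sealed class ProbeCollection<T> : ICollection<T>
{
    private readonly List<T> items;
    public int Enumerations { get; private set; }

    public ProbeCollection(IEnumerable<T> source) { items = new List<T>(source); }

    public int Count { get { return items.Count; } }
    public bool IsReadOnly { get { return true; } }
    public void Add(T item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }
    public bool Contains(T item) { return items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }

    public IEnumerator<T> GetEnumerator() { Enumerations++; return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountProbeDemo
{
    public static void Main()
    {
        var probe = new ProbeCollection<int>(new[] { 1, 2, 3 });

        // Answered from ICollection<T>.Count, so the wrapper is never enumerated.
        Console.WriteLine(probe.Count());
        Console.WriteLine(probe.Enumerations);

        // Through Select the shortcut no longer applies; print what actually happens.
        int selectorCalls = 0;
        int viaSelect = probe.Select(x => { selectorCalls++; return x; }).Count();
        Console.WriteLine(viaSelect + " items, selector calls: " + selectorCalls);
    }
}
```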

It doesn't really matter that the result of Select is lazily evaluated. The Count is always equivalent to the count of the original collection so it could have certainly been retrieved directly by returning a specific object from Select that could be used to short-circuit evaluation of the Count method.
The reason it's not possible to optimize out evaluation of the Count() method on the return value of a Select call from something with determined count (like a List<T>) is that it could change the meaning of the program.
The selector function passed to Select method is allowed to have side effects and its side effects are required to happen deterministically, in a predetermined order.
Assume:
new[]{1,2,3}.Select(i => { Console.WriteLine(i); return 0; }).Count();
The documentation requires this code to print
1
2
3
Even though the count is really known from the start and could be optimized, optimization would change the behavior of the program. That's why you can't avoid enumeration of the collection anyway. That's exactly one of the reasons why compiler optimizations are much easier in pure functional languages.
UPDATE: Apparently, it's not clear that it's perfectly possible to implement Select and Count so that Selects on ICollection<T> will still be lazily evaluated but the Count() will be evaluated in O(1) without enumerating the collection. I'm going to do that without changing the interface of any methods. A similar thing is already done for ICollection<T>:
private interface IDirectlyCountable {
int Count {get;}
}
private class SelectICollectionIterator<TSource,TResult> : IEnumerable<TResult>, IDirectlyCountable {
ICollection<TSource> sequence;
Func<TSource,TResult> selector;
public SelectICollectionIterator(ICollection<TSource> source, Func<TSource,TResult> selector) {
this.sequence = source;
this.selector = selector;
}
public int Count { get { return sequence.Count; } }
// ... GetEnumerator ...
}
public static IEnumerable<TResult> Select<TSource,TResult>(this IEnumerable<TSource> source, Func<TSource,TResult> selector) {
// ... error handling omitted for brevity ...
if (source is ICollection<TSource>)
return new SelectICollectionIterator<TSource,TResult>((ICollection<TSource>)source, selector);
// ... rest of the method ...
}
public static int Count<T>(this IEnumerable<T> source) {
// ...
ICollection<T> collection = source as ICollection<T>;
if (collection != null) return collection.Count;
IDirectlyCountable countableSequence = source as IDirectlyCountable;
if (countableSequence != null) return countableSequence.Count;
// ... enumerate and count the sequence ...
}
This will still evaluate the Count lazily. If you change the underlying collection, the count will get changed and the sequence is not cached. The only difference will be not doing the side effects in the selector delegate.

Edit 02-Feb-2010:
As I see it, there are at least two ways to interpret this question.
Why does the Select<T, TResult> extension method, when called on an instance of a class that implements ICollection<T>, not return an object that provides a Count property; and why does the Count<T> extension method not check for this property so as to provide O(1) performance when the two methods are chained?
This version of the question makes no false assumptions about how Linq extensions work, and is a valid question since a call to ICollection<T>.Select.Count will, after all, always return the same value as ICollection<T>.Count. This is how Mehrdad interpreted the question, to which he has provided a thorough response.
But I read the question as asking...
If the Count<T> extension method provides O(1) performance for an object of a class implementing ICollection<T>, why does it provide O(n) performance for the return value of the Select<T, TResult> extension method?
In this version of the question, there is a mistaken assumption: that the Linq extension methods work together by assembling little collections one after another (in memory) and exposing them through the IEnumerable<T> interface.
If this were how the Linq extensions worked, the Select method might look something like this:
public static IEnumerable<TResult> Select<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector) {
List<TResult> results = new List<TResult>();
foreach (T input in source)
results.Add(selector(input));
return results;
}
Moreover, if this were the implementation of Select, I think you'd find most code that utilizes this method would behave just the same. But it would be wasteful, and would in fact cause exceptions in certain cases like the one I described in my original answer.
In reality, I believe the implementation of the Select method is much closer to something like this:
public static IEnumerable<TResult> Select<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector) {
foreach (T input in source)
yield return selector(input);
yield break;
}
This is to provide lazy evaluation, and explains why a Count property is not accessible in O(1) time to the Count method.
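The laziness can be observed directly with the real Select by counting selector calls before and after the query is enumerated. This is only a sketch; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Returns the number of selector calls before and after enumeration.
    public static int[] Observe()
    {
        int calls = 0;
        var source = new List<int> { 1, 2, 3 };

        // Building the query runs nothing: the selector has not been called yet.
        IEnumerable<int> query = source.Select(x => { calls++; return x * 10; });
        int before = calls;

        // Enumerating the query is what runs the selector, once per element.
        List<int> results = query.ToList();
        int after = calls;

        return new[] { before, after, results.Count };
    }
}
```

Observe() returns 0, 3, 3: no selector call happens when the query is built, and one call per element happens when it is enumerated.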
So in other words, whereas Mehrdad answered the question of why Select wasn't designed differently so that Select.Count would behave differently, I have offered my best answer to the question of why Select.Count behaves the way it does.
ORIGINAL ANSWER:
Method side effects is not the answer.
According to Mehrdad's answer:
It doesn't really matter that the result of Select is lazily evaluated.
I don't buy this. Let me explain why.
For starters, consider the following two very similar methods:
public static IEnumerable<double> GetRandomsAsEnumerable(int N) {
Random r = new Random();
for (int i = 0; i < N; ++i)
yield return r.NextDouble();
yield break;
}
public static double[] GetRandomsAsArray(int N) {
Random r = new Random();
double[] values = new double[N];
for (int i = 0; i < N; ++i)
values[i] = r.NextDouble();
return values;
}
OK, what do these methods do? Each one returns as many random doubles as the user desires (up to int.MaxValue). Does it matter whether either method is lazily evaluated or not? To answer this question, let's take a look at the following code:
public static double Invert(double value) {
return 1.0 / value;
}
public static void Test() {
int a = GetRandomsAsEnumerable(int.MaxValue).Select(Invert).Count();
int b = GetRandomsAsArray(int.MaxValue).Select(Invert).Count();
}
Can you guess what will happen with these two method calls? Let me spare you the trouble of copying this code and testing it out yourself:
The first variable, a, will (after a potentially significant amount of time) be initialized to int.MaxValue (currently 2147483647). The second one, b, will very likely be interrupted by an OutOfMemoryException.
Because Select and the other Linq extension methods are lazily evaluated, they allow you to do things you simply could not do otherwise. The above is a fairly trivial example. But my main point is to dispute the assertion that lazy evaluation is not important. Mehrdad's statement that a Count property "is really known from the start and could be optimized" actually begs the question. The issue might seem straightforward for the Select method, but Select is not really special; it returns an IEnumerable<T> just like the rest of the Linq extension methods, and for these methods to "know" the Count of their return values would require full collections to be cached and therefore prohibit lazy evaluation.
Lazy evaluation is the answer.
For this reason, I have to agree with one of the original responders (whose answer now seems to have disappeared) that lazy evaluation really is the answer here. The idea that method side effects need to be accounted for is really secondary, as this is already ensured as a byproduct of lazy evaluation anyway.
Postscript: I've made very assertive statements and emphasized my points mainly because I wanted to be clear on what my argument is, not out of any disrespect for any other responses, including Mehrdad's, which I feel is insightful but misses the mark.

An ICollection knows the number of items (Count) it contains. It doesn't have to iterate any items to determine it. Take for example the HashSet class (which implements ICollection).
An IEnumerable<T> doesn't know how many items it contains. You have to enumerate the whole list to determine the number of items (Count).
Wrapping the ICollection in a LINQ statement, doesn't make it more efficient. No matter how you twist and turn, the ICollection will have to be enumerated.

Is this C# extension method impure and if so, bad code?

I'm learning a bit about functional programming, and I'm wondering:
1) Is my ForEach extension method pure? The way I'm calling it seems to violate the "don't mess with the object getting passed in" rule, right?
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach ( var item in source )
action(item);
}
static void Main(string[] args)
{
List<Cat> cats = new List<Cat>()
{
new Cat{ Purring=true,Name="Marcus",Age=10},
new Cat{ Purring=false, Name="Fuzzbucket",Age=25 },
new Cat{ Purring=false, Name="Beanhead",Age=9 },
new Cat{Purring=true,Name="Doofus",Age=3}
};
cats.Where(x=>x.Purring==true).ForEach(x =>
{
Console.WriteLine("{0} is a purring cat... purr!", x.Name);
});
// *************************************************
// Does this code make the extension method impure?
// *************************************************
cats.Where(x => x.Purring == false).ForEach(x =>
{
x.Purring = true; // purr,baby
});
// all the cats now purr
cats.Where(x=>x.Purring==true).ForEach(x =>
{
Console.WriteLine("{0} is a purring cat... purr!", x.Name);
});
}
public class Cat {
public bool Purring;
public string Name;
public int Age;
}
2) If it is impure, is it bad code? I personally think it makes cleaner looking code than the old foreach ( var item in items) { blah; }, but I worry that since it might be impure, it could make a mess.
3) Would it be bad code if it returned IEnumerable<T> instead of void? I'd say as long as it is impure, yes it would be very bad code as it would encourage chaining something that would modify the chain. For example, is this bad code?
// possibly bad extension
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach ( var item in source )
action(item);
return source;
}

Impurity doesn't necessarily mean bad code. Many people find it easy and useful to use side effects to solve a problem. The key is first knowing how to do it in a pure way, so you'll know when impurity is appropriate :).
.NET doesn't have the concept of purity in the type system, so a "pure" method that takes in arbitrary delegates can always be impure, depending on how it's called. For instance, "Where", aka "filter", would usually be considered a pure function, since it doesn't modify its arguments or modify global state.
But, there's nothing stopping you from putting such code inside the argument to Where. For example:
things.Where(x => { Console.WriteLine("um?");
return true; })
.Count();
So that's definitely an impure usage of Where. Enumerables can do whatever they want as they iterate.
Is your code bad? No. Using a foreach loop is just as "impure" -- you're still modifying the source objects. I write code like that all the time. Chain together some selects, filters, etc., then execute a ForEach on it to invoke some work. You're right, it's cleaner and easier.
Example: ObservableCollection. It has no AddRange method for some reason. So, if I want to add a bunch of things to it, what do I do?
foreach(var x in things.Where(y => y.Foo > 0)) { collection.Add(x); }
or
things.Where(x => x.Foo > 0).ForEach(collection.Add);
I prefer the second one. At a minimum, I don't see how it can be construed as being worse than the first way.
When is it bad code? When it does side effecting code in a place that's not expected. This is the case for my first example using Where. And even then, there are times when the scope is very limited and the usage is clear.
Chaining ForEach
I've written code that does things like that. To avoid confusion, I would give it another name. The main confusion is "is this immediately evaluated or lazy?". ForEach implies that it'll go execute a loop right away. But something returning an IEnumerable implies that the items will be processed as needed. So I'd suggest giving it another name ("Process", "ModifySeq", "OnEach"... something like that), and making it lazy:
public static IEnumerable<T> OnEach<T>(this IEnumerable<T> src, Action<T> f) {
foreach(var x in src) {
f(x);
yield return x;
}
}
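A short usage sketch of a lazy OnEach, showing that the action only runs once something enumerates the result. The surrounding class names and the 'seen' list are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OnEachSketch
{
    // Lazy: the action runs per element, and only while the result is being enumerated.
    public static IEnumerable<T> OnEach<T>(this IEnumerable<T> src, Action<T> f)
    {
        foreach (T x in src)
        {
            f(x);
            yield return x;
        }
    }
}

public static class OnEachDemo
{
    public static void Main()
    {
        var seen = new List<int>();
        IEnumerable<int> query = new[] { 1, 2, 3 }.OnEach(seen.Add);

        Console.WriteLine(seen.Count); // nothing has run yet, so this is 0
        List<int> forced = query.ToList();
        Console.WriteLine(seen.Count); // the action has now run for all 3 elements
    }
}
```

This is exactly the ambiguity the renaming is meant to avoid: a reader who expects ForEach semantics would assume the action had already run after the first line.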

It is not pure, as it can call impure methods. I think by typical definitions, purity is a transitive closure - a function is pure only if all the functions it calls (directly or indirectly) are also pure, or if the effects of those functions are encapsulated (e.g. they only mutate a non-escaping local variable).

Yes, it's not pure, but that's kind of a moot point as it's not even a function.
As the method doesn't return anything, the only option for it to do anything at all is to either affect the objects that you are sending in, or affecting something unrelated (like writing to the console window).
Edit:
To answer your third question; yes, that is bad code, as it seems to be doing something that it doesn't. The method returns a collection so it seems to be pure, but as it just returns the collection that was sent in, it's actually not any more pure than the first version. To make any sense the method should take a Func<T,T> delegate to use as conversion, and return a collection of the converted items:
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Func<T,T> converter) {
foreach (T item in source) {
yield return converter(item);
}
}
It's of course still up to the converter function if the extension call is pure. If it doesn't make a copy of the input item but just changes it and returns it, the call is still not pure.

Indeed, because your lambda expression contains an assignment, the function is now by definition impure. Whether the assignment is related to one of the arguments or another object defined outside the current function is irrelevant... A function must have no side-effects whatsoever in order to be called pure. See Wikipedia for a more precise (though quite straightforward) definition, which details the two conditions a function must satisfy to be deemed pure (having no side-effects is one of them). I believe lambda expressions are typically meant to be used as pure functions (at least I would imagine they were originally studied as such from the mathematical perspective), though clearly C# isn't stringent about this, where purely functional languages are. So it's probably not bad practice, though it's definitely worth being aware that such a function is impure.

Why is there no ForEach extension method on IEnumerable?

Inspired by another question asking about the missing Zip function:
Why is there no ForEach extension method on the IEnumerable interface? Or anywhere? The only class that gets a ForEach method is List<>. Is there a reason why it's missing, maybe performance?

There is already a foreach statement included in the language that does the job most of the time.
I'd hate to see the following:
list.ForEach( item =>
{
item.DoSomething();
} );
Instead of:
foreach(Item item in list)
{
item.DoSomething();
}
The latter is clearer and easier to read in most situations, although maybe a bit longer to type.
However, I must admit I changed my stance on that issue; a ForEach() extension method would indeed be useful in some situations.
Here are the major differences between the statement and the method:
Type checking: foreach with an explicitly typed loop variable casts each element at runtime, whereas ForEach() is checked at compile time (Big Plus!)
The syntax to call a delegate is indeed much simpler: objects.ForEach(DoSomething);
ForEach() could be chained: although evilness/usefulness of such a feature is open to discussion.
Those are all great points made by many people here and I can see why people are missing the function. I wouldn't mind Microsoft adding a standard ForEach method in the next framework iteration.
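The type-checking point can be illustrated with a small sketch. A foreach statement with an explicitly typed loop variable compiles even when the element type is only object, and the cast fails at runtime. The names below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachCastSketch
{
    // Returns true if the hidden cast in foreach fails at runtime.
    public static bool ThrowsAtRuntime()
    {
        object[] mixed = { 1, "two", 3 };
        try
        {
            // Compiles: foreach inserts an explicit cast from object to int.
            foreach (int n in mixed) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true; // "two" cannot be unboxed to int
        }
        // By contrast, Array.ForEach(mixed, (int n) => { }) is rejected by the
        // compiler, because an Action<int> is not an Action<object>.
    }
}
```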

List<T>.ForEach was added before LINQ. If you add a ForEach extension method, it will never be called for List instances, because instance methods take precedence over extension methods. I think it was not added so as not to interfere with the existing one.
However, if you really miss this little nice function, you can roll out your own version
public static void ForEach<T>(
this IEnumerable<T> source,
Action<T> action)
{
foreach (T element in source)
action(element);
}
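The precedence rule mentioned above can be seen with a small sketch. The ExtensionCalls counter and the class name are assumptions made for illustration; the extension itself has the same shape as the one above.

```csharp
using System;
using System.Collections.Generic;

public static class PrecedenceSketch
{
    // Counts how often the extension method (not List<T>.ForEach) is invoked.
    public static int ExtensionCalls;

    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        ExtensionCalls++;
        foreach (T element in source)
            action(element);
    }

    public static int[] Observe()
    {
        ExtensionCalls = 0;
        var list = new List<int> { 1, 2, 3 };

        list.ForEach(x => { });          // binds to the instance method List<T>.ForEach
        int afterListCall = ExtensionCalls;

        IEnumerable<int> sequence = list;
        sequence.ForEach(x => { });      // static type is IEnumerable<int>: the extension runs
        int afterSequenceCall = ExtensionCalls;

        return new[] { afterListCall, afterSequenceCall };
    }
}
```

Observe() returns 0, 1: the same object takes two different code paths depending only on the static type of the variable it is called through.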

You could write this extension method:
// Possibly call this "Do"
public static IEnumerable<T> Apply<T>(this IEnumerable<T> source, Action<T> action)
{
foreach (var e in source)
{
action(e);
yield return e;
}
}
Pros
Allows chaining:
MySequence
.Apply(...)
.Apply(...)
.Apply(...);
Cons
It won't actually do anything until you do something to force iteration. For that reason, it shouldn't be called .ForEach(). You could write .ToList() at the end, or you could write this extension method, too:
// possibly call this "Realize"
public static IEnumerable<T> Done<T>(this IEnumerable<T> source)
{
foreach (var e in source)
{
// do nothing
;
}
return source;
}
This may be too significant a departure from the shipping C# libraries; readers who are not familiar with your extension methods won't know what to make of your code.

The discussion here gives the answer:
Actually, the specific discussion I witnessed did in fact hinge over functional purity. In an expression, there are frequently assumptions made about not having side-effects. Having ForEach is specifically inviting side-effects rather than just putting up with them. -- Keith Farmer (Partner)
Basically the decision was made to keep the extension methods functionally "pure". A ForEach would encourage side-effects when using the Enumerable extension methods, which was not the intent.

While I agree that it's better to use the built-in foreach construct in most cases, I find the use of this variation on the ForEach<> extension to be a little nicer than having to manage the index in a regular foreach myself:
public static int ForEach<T>(this IEnumerable<T> list, Action<int, T> action)
{
if (action == null) throw new ArgumentNullException("action");
var index = 0;
foreach (var elem in list)
action(index++, elem);
return index;
}
Example
var people = new[] { "Moe", "Curly", "Larry" };
people.ForEach((i, p) => Console.WriteLine("Person #{0} is {1}", i, p));
Would give you:
Person #0 is Moe
Person #1 is Curly
Person #2 is Larry

One workaround is to write .ToList().ForEach(x => ...).
pros
Easy to understand - reader only needs to know what ships with C#, not any additional extension methods.
Syntactic noise is very mild (only adds a little extraneous code).
Doesn't usually cost extra memory, since a native .ForEach() would have to realize the whole collection, anyway.
cons
Order of operations isn't ideal. I'd rather realize one element, then act on it, then repeat. This code realizes all elements first, then acts on them each in sequence.
If realizing the list throws an exception, you never get to act on a single element.
If the enumeration is infinite (like the natural numbers), you're out of luck.

I've always wondered that myself, which is why I always carry this with me:
public static void ForEach<T>(this IEnumerable<T> col, Action<T> action)
{
if (action == null)
{
throw new ArgumentNullException("action");
}
foreach (var item in col)
{
action(item);
}
}
Nice little extension method.

There have been a lot of comments saying that a ForEach extension method isn't appropriate because it doesn't return a value like the LINQ extension methods. While the observation is factually correct, the conclusion doesn't follow.
The LINQ extension methods do all return a value so they can be chained together:
collection.Where(i => i.Name == "hello").Select(i => i.FullName);
However, just because LINQ is implemented using extension methods does not mean that extension methods must be used in the same way and return a value. Writing an extension method to expose common functionality that does not return a value is a perfectly valid use.
The specific argument about ForEach is that, based on the constraints on extension methods (namely that an extension method will never override an instance method with the same signature), there may be a situation where the custom extension method is available on all classes that implement IEnumerable<T> except List<T>. This can cause confusion when the methods start to behave differently depending on whether the extension method or the instance method is being called.

You could use the (chainable, but lazily evaluated) Select, first performing your operation and then returning the element itself (or something else if you prefer):
IEnumerable<string> people = new List<string>(){"alica", "bob", "john", "pete"};
people.Select(p => { Console.WriteLine(p); return p; });
You will need to make sure it is still evaluated, either with Count() (the cheapest operation to enumerate afaik) or another operation you needed anyway.
I would love to see it brought in to the standard library though:
public static IEnumerable<T> WithLazySideEffect<T>(this IEnumerable<T> src, Action<T> action) {
return src.Select(i => { action(i); return i; });
}
The above code then becomes people.WithLazySideEffect(p => Console.WriteLine(p)) which is effectively equivalent to foreach, but lazy and chainable.
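
To make the "lazy" part concrete, here is a minimal sketch built around the WithLazySideEffect helper above. The demo class, the seen list, and the out parameter are assumptions added for illustration; the point is only that the action does not run until something enumerates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySideEffectExtensions
{
    // Same shape as the helper above: run the action, then pass the element through.
    public static IEnumerable<T> WithLazySideEffect<T>(this IEnumerable<T> src, Action<T> action)
    {
        return src.Select(i => { action(i); return i; });
    }
}

public static class LazySideEffectDemo
{
    // Returns the elements the action saw; countBeforeEnumeration reports
    // how many it had seen before the query was forced.
    public static List<string> Run(out int countBeforeEnumeration)
    {
        var seen = new List<string>();
        IEnumerable<string> people = new List<string> { "alica", "bob", "john", "pete" };

        // Building the query does not invoke the action.
        var query = people.WithLazySideEffect(p => seen.Add(p));
        countBeforeEnumeration = seen.Count;

        // Enumerating the query (here via ToList) runs the action once per element.
        query.ToList();
        return seen;
    }
}
```

Forgetting to enumerate the query is the usual pitfall with this pattern: the code compiles and runs, but the side effect silently never happens.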

Note that the MoreLINQ NuGet package provides the ForEach extension method you're looking for (as well as a Pipe method, which runs the delegate on each element and then yields that element). See:
https://www.nuget.org/packages/morelinq
https://code.google.com/p/morelinq/wiki/OperatorsOverview

@Coincoin
The real power of the ForEach extension method is the reusability of the Action<> without adding unnecessary methods to your code. Say you have 10 lists and want to perform the same logic on each of them, and a corresponding function doesn't fit into your class and isn't reused elsewhere. Instead of having ten for loops, or a generic function that is obviously a helper and doesn't belong, you can keep all of your logic in one place (the Action<>). So, dozens of lines get replaced with
Action<Blah> f = p => { /* foo */ };
List1.ForEach(f);
List2.ForEach(f);
etc...
The logic is in one place and you haven't polluted your class.

Most of the LINQ extension methods return results. ForEach does not fit into this pattern as it returns nothing.

If you have F# (which will be in the next version of .NET), you can use
Seq.iter doSomething myIEnumerable

Partially it's because the language designers disagree with it from a philosophical perspective.
Not having (and testing...) a feature is less work than having a feature.
It's not really shorter (it is in some cases where you pass an existing function, but that wouldn't be the primary use).
Its purpose is to have side effects, which isn't what LINQ is about.
Why have another way to do the same thing as a feature we've already got? (foreach keyword)
https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/

You can use Select when you want to return something.
If you don't, you can call ToList first, because you probably don't want to modify anything in the collection.

I wrote a blog post about it:
http://blogs.msdn.com/kirillosenkov/archive/2009/01/31/foreach.aspx
You can vote here if you'd like to see this method in .NET 4.0:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=279093

In 3.5, all the extension methods added to IEnumerable are there for LINQ support (notice that they are defined in the System.Linq.Enumerable class). In this post, I explain why foreach doesn't belong in LINQ:
Existing LINQ extension method similar to Parallel.For?

Is it just me, or has List<T>.ForEach pretty much been made obsolete by LINQ?
Originally there was
foreach(X x in Y)
where Y simply had to be IEnumerable (Pre 2.0), and implement a GetEnumerator().
If you look at the MSIL generated you can see that it is exactly the same as
IEnumerator<int> enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
int i = enumerator.Current;
Console.WriteLine(i);
}
(See http://alski.net/post/0a-for-foreach-forFirst-forLast0a-0a-.aspx for the MSIL)
Then in .NET 2.0 generics came along, and with them List<T>. ForEach has always felt to me like an implementation of the Visitor pattern (see Design Patterns by Gamma, Helm, Johnson, Vlissides).
Now of course in 3.5 we can instead use a Lambda to the same effect, for an example try
http://dotnet-developments.blogs.techtarget.com/2008/09/02/iterators-lambda-and-linq-oh-my/

I would like to expand on Aku's answer.
If you want to call a method for the sole purpose of its side effect, without iterating the whole enumerable first, you can use this:
private static IEnumerable<T> ForEach<T>(IEnumerable<T> xs, Action<T> f) {
foreach (var x in xs) {
f(x); yield return x;
}
}
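
Because this version yields each element right after acting on it, it also copes with sequences that never end, as long as the consumer stops pulling. A minimal sketch, assuming a hypothetical Naturals generator and made public here so it can be called from outside the class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyForEachDemo
{
    // Same iterator as above: act on the element, then yield it.
    public static IEnumerable<T> ForEach<T>(IEnumerable<T> xs, Action<T> f)
    {
        foreach (var x in xs)
        {
            f(x);
            yield return x;
        }
    }

    // Hypothetical unbounded source: 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    // Only the elements that Take(3) pulls are ever produced or acted on.
    public static List<int> Run()
    {
        var seen = new List<int>();
        ForEach(Naturals(), seen.Add).Take(3).ToList();
        return seen;
    }
}
```

A ToList() placed before the action would never return on Naturals(), which is the infinite-enumeration problem raised earlier in the thread.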

My version: an extension method that lets you use ForEach on IEnumerable<T>
public static class EnumerableExtension
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
source.All(x =>
{
action.Invoke(x);
return true;
});
}
}

No one has yet pointed out that ForEach<T> results in compile-time type checking, whereas the foreach keyword is checked at runtime.
Having done some refactoring where both methods were used in the code, I favor .ForEach, as I had to hunt down test failures / runtime failures to find the foreach problems.
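
A small sketch of what that difference looks like in practice. The ForEach extension below is a hypothetical one with the same shape as those posted earlier in the thread, and the mixed list is made up for illustration. When the loop variable is given an explicit type, the foreach keyword compiles in an explicit cast, so a mismatch only shows up when the code runs.

```csharp
using System;
using System.Collections.Generic;

public static class ForEachTypeCheckDemo
{
    // Hypothetical ForEach extension, same shape as the ones posted above.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
        }
    }

    // Returns true if the foreach keyword version fails at runtime.
    public static bool ForeachKeywordFailsAtRuntime()
    {
        IEnumerable<object> items = new List<object> { "a", 42 };
        int totalLength = 0;
        try
        {
            // Compiles: the compiler inserts a cast from object to string per element.
            foreach (string s in items)
            {
                totalLength += s.Length;
            }
            return false;
        }
        catch (InvalidCastException)
        {
            // Thrown when the cast reaches the boxed int 42.
            return true;
        }
    }

    // By contrast, the following line is rejected by the compiler, because no
    // single T fits both IEnumerable<object> and Action<string>:
    //     items.ForEach((string s) => Console.WriteLine(s.Length));
}
```

Declaring the loop variable with var avoids the hidden cast in the foreach keyword version, at which point the two forms are checked equally strictly.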

How to make [example] extension method more generic/functional/efficient? - c#

I would prefer (B) as it looks more flexible. One way of casting the output of the (B) method to an IList<T[]> is as simple as chaining .Select(x => x.ToArray()).ToList() to it, e.g., var foo = bar.Split(someEvery, someTake).Select(x => x.ToArray()).ToList();
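
For illustration, here is one way (B) might be written so that the source is read only once, followed by the conversion described above. This is a sketch under assumptions, not the asker's code: the SplitExtensions and SplitIterator names, the argument checks, and the queue of open windows are all choices made here. It aims to produce the same groups as (A): a new group starts at every index that is a multiple of every, and each group holds up to take elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(
        this IEnumerable<T> source, int every, int take)
    {
        // Validate eagerly; the iterator below only runs once enumerated.
        if (source == null) throw new ArgumentNullException("source");
        if (every < 1) throw new ArgumentOutOfRangeException("every");
        if (take < 1) throw new ArgumentOutOfRangeException("take");
        return SplitIterator(source, every, take);
    }

    private static IEnumerable<IEnumerable<T>> SplitIterator<T>(
        IEnumerable<T> source, int every, int take)
    {
        // Windows that have started but not yet collected `take` elements.
        var open = new Queue<List<T>>();
        long index = 0;
        foreach (var item in source)
        {
            if (index % every == 0)
            {
                open.Enqueue(new List<T>(take));
            }
            foreach (var window in open)
            {
                window.Add(item);
            }
            // The oldest window is always the first one to fill up.
            if (open.Count > 0 && open.Peek().Count == take)
            {
                yield return open.Dequeue();
            }
            index++;
        }
        // Emit any trailing, partially filled windows in order.
        while (open.Count > 0)
        {
            yield return open.Dequeue();
        }
    }
}
```

With this in place, new[] { 1, 2, 3, 4, 5, 6, 7 }.Split(3, 2).Select(x => x.ToArray()).ToList() gives a List<int[]> holding {1, 2}, {4, 5} and {7}, and that list can be assigned to an IList<int[]> variable.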

In .Net 4, you can just change the return type to IEnumerable<IEnumerable<T>> and it will work. Before .Net 4, you would have to cast the internal lists to IEnumerable first, by just calling .Cast<IEnumerable<T>>() on your result before returning.
