My question is: will LINQ in the following code read the flag value three times when materializing the numbers collection? I am trying to optimize my code; I want the flag check in the Where clause to be evaluated only once when flag == true.
List<int> list = new(){1, 2, 3};
bool flag = true;
bool IsNumberBig(int num)
{
return num > 100;
}
var numbers = list.Where(l => flag || IsNumberBig(l)).ToList();
I failed to find a related question. I would be thankful to see how I could check this myself.
The value of flag will be evaluated every time the lambda is called. Obviously, that's cheaper than evaluating IsNumberBig() (or some more complex method in there), but still not free.
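If you want to check this yourself, a minimal sketch is to route the flag read through a counting helper (ReadFlag below is a made-up stand-in for reading flag, not part of the original code):
int flagReads = 0;

bool ReadFlag()
{
    flagReads++; // count how often the predicate reads the flag
    return true;
}

bool IsNumberBig(int num) => num > 100;

var list = new List<int> { 1, 2, 3 };
var numbers = list.Where(l => ReadFlag() || IsNumberBig(l)).ToList();
Console.WriteLine(flagReads); // prints 3: the flag is read once per element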
To optimize this, you could write something like
List<int> numbers;
if (flag)
{
numbers = list; // note: this aliases the original list; use list.ToList() if a separate copy is needed
}
else
{
numbers = list.Where(IsNumberBig).ToList();
}
This way, no iteration is done at all if flag is true (which in your case would return all elements anyway).
I think it is important to note that LINQ is mostly syntactic sugar. It does not do optimization. The vast majority of optimizations are done by the compiler, or more specifically, the jitter.
One problem when discussing optimizations is that the jitter is allowed to perform any kind of optimization as long as the result is the same. But it also has to do its optimizations as fast as possible, so it rarely does everything it would be allowed to do. The result will also depend on the version of the compiler; the more recent ones have tiered compilation to better optimize frequently used loops.
Because of all this it can be difficult to guess what the compiler will and will not optimize, and the best approach is to just benchmark the code. Use BenchmarkDotNet with and without the check, and that way you will get a correct answer. It should also tell you whether the performance difference is anything to worry about.
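For example, a minimal BenchmarkDotNet sketch of that comparison could look like this (the list size and method names are placeholders, not part of the original question):
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class WhereBenchmarks
{
    private readonly List<int> list = Enumerable.Range(0, 10000).ToList();
    private readonly bool flag = true;

    private static bool IsNumberBig(int num) => num > 100;

    [Benchmark]
    public List<int> WithFlagCheck() => list.Where(l => flag || IsNumberBig(l)).ToList();

    [Benchmark]
    public List<int> WithoutFlagCheck() => list.Where(IsNumberBig).ToList();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<WhereBenchmarks>();
}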
Even though guessing what the compiler will do is difficult, there are a few things worth noting. Most optimizations are done within a method; the compiler will not try to rewrite method signatures. However, small methods tend to be inlined and can therefore be optimized as part of the calling method. So if all your code were inlined, the flag check would very likely be removed. However, one of the things that prevents inlining is indirect calls, like calling a method through an interface, or in this case, calling a delegate. Just about everything in LINQ is delegates and interfaces, and this tends to hamper performance. So in general, use LINQ for convenience, not for performance.
All that said, modern processors have pretty amazing branch predictors, so I would expect the effect of an easy-to-predict branch like this one to be fairly small. There are likely other things that have a larger effect on performance.
But the most important thing is to benchmark and/or profile the code instead of just guessing about performance. It is common for people to try to optimize completely the wrong thing, even experienced developers. If you want to get started, check out Measure app performance in Visual Studio and BenchmarkDotNet.
You could do this extension:
It basically wraps @PMF's answer in an extension method, so you can use it like the Where you already know. It just takes an additional condition parameter that switches the application of the predicate on/off. This comes with the advantage that you can chain it with other LINQ methods just like the plain old Where.
using System.Linq;
using System.Collections.Generic;
public static class LinqEx
{
public static IEnumerable<T> WhereIf<T>(this IEnumerable<T> source, bool condition, Func<T, bool> predicate)
{
return condition
? source.Where(predicate)
: source;
}
}
And then use it like:
var someList = new List<int> { 1, 2, 3, 101, 200 }; // a List<int> with some items
var shallFilter = true; // or false
var filteredList = someList.WhereIf(shallFilter, l => IsNumberBig(l)).ToList();
See it in action in this Fiddle.
Mind that this would make more sense in a DB-related (LINQ to SQL) setting, since here it is really a micro-optimization. For it to show an effect, you'd need many items and a very costly predicate.
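If you do end up in such a setting, you would presumably also want an IQueryable overload, so the predicate stays an expression tree that the query provider can translate; a rough sketch, under that assumption:
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableEx
{
    public static IQueryable<T> WhereIf<T>(this IQueryable<T> source, bool condition, Expression<Func<T, bool>> predicate)
    {
        return condition
            ? source.Where(predicate)
            : source;
    }
}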
One word on your code, too:
var numbers = list.Where(l => flag || IsNumberBig(l)).ToList();
If flag is true, flag || X will evaluate to true, regardless of X. X will not even be evaluated.
So, you basically implemented the opposite of your requirement.
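You can see the short-circuiting for yourself with a throwaway method that announces when it runs (Noisy is a made-up helper):
bool flag = true;

bool Noisy(int x)
{
    Console.WriteLine("Noisy was evaluated");
    return x > 100;
}

bool check = flag || Noisy(5); // prints nothing: Noisy is never called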
See also: Conditional logical OR operator ||
After asking my question on Code Review Stack Exchange, I was advised to use the following code. I noticed that in the process, the string[] lines parameter was changed to an IEnumerable<string>.
After looking at the IEnumerable interface for a bit, I did not find anything that suggested a performance improvement. So is this just for readability? Or is there actually a performance difference or a general advantage to using IEnumerable instead of an array?
private void ProcessSingleItem(String fileName, String oldId, String newId)
{
string[] lines = File.ReadAllLines(fileName);
File.WriteAllText(fileName, ProcessLines(lines, oldId, newId));
}
private String ProcessLines(IEnumerable<String> lines, String oldId, String newId)
{
StringBuilder sb = new StringBuilder(2048);
foreach (String line in lines)
{
sb.AppendLine(line.Replace(oldId, newId));
}
return sb.ToString();
}
So far all the answers have said that being more general in what you accept makes your helper method more useful. This is correct. However, there are other considerations as well.
Pro: Taking a sequence rather than an array communicates to the developer who is calling your code "I am not going to mutate the object you pass me". When I call a method that takes an array, how do I know that it isn't going to change the array?
Con: taking a more general type means that your method must be correct for any instance of that more general type. How do you know it is correct? Testing. So taking a more general type can mean a larger test burden. If you take an array then you only have a few cases: null arrays, empty arrays, covariant arrays and so on. If you take a sequence the number of cases to test is larger.
You mentioned performance. Remember, making micro-decisions based on feelings rather than empirical data is a terrible way to get good performance. Instead, set a performance goal, measure your progress against that goal, and use a profiler to find the slowest thing; attack that first. A foreach loop on an array compiles down to the equivalent for loop; on an IEnumerable the code is more complicated and possibly several microseconds slower. Is the success or failure of your application in the marketplace going to be determined by those microseconds? If so, then carefully measure the performance. If not, write the code the way you like it, and if you've introduced a regression that causes you to no longer meet your goal, your automated tests will tell you about it. You are running automated performance tests, right? If you care so much about performance, you ought to be.
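To make the loop claim concrete, a foreach over an array like the first loop below is compiled to roughly the second (a sketch, not the exact emitted code):
int[] items = { 1, 2, 3 };

foreach (int item in items)
{
    Console.WriteLine(item);
}

// ...compiles down to the equivalent of:
for (int i = 0; i < items.Length; i++)
{
    int item = items[i];
    Console.WriteLine(item);
}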
An array is an implementation of IEnumerable.
The only members used from the lines parameter in your method, are those defined in IEnumerable. So you can choose IEnumerable as a parameter type and be the least picky in what you accept, allowing any IEnumerable implementation to be supplied as parameter.
Consider this example:
ProcessLines(GetItems(), ...);
public IEnumerable<string> GetItems()
{
yield return "ItemAlwaysGetsIncluded";
if (!once_in_blue_moon) // some rarely-true bool condition
{
yield break;
}
yield return "ItemIncludedOnceInABlueMoon";
}
It's hard to tell what the performance impact is, since IEnumerable could be just about anything.
Have a look at the robustness principle: be liberal in what you accept. You want to have parameters as generic as possible so that the method can be used in more situations.
Now you can pass in any collection that implements IEnumerable, and you are no longer limited to arrays.
In this case it is more a design issue than a performance issue. Return values are a different case, though, as there are definitely some performance gains there.
In this case, you are just iterating, so IEnumerable is all that you need.
IEnumerable provides only minimal "iterable" functionality. You can traverse the sequence, but that's about it. This has disadvantages -- for example, it is very inefficient to count elements using IEnumerable, or to get the nth element -- but it has advantages too -- for example, an IEnumerable could be an endless sequence, like the sequence of primes.
Array is a fixed-size collection with random access (i.e. you can index into it)
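To make the endless-sequence point concrete, here is a minimal sketch (Naturals is a made-up example) of an iterator that never ends; it is safe to consume because the caller pulls only as many elements as it needs:
// an infinite sequence: elements are produced only as the caller pulls them
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
        yield return i;
}

var firstTen = Naturals().Take(10).ToList(); // Take stops the iteration, so this terminates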
In the code above it doesn't make any difference, as you already have a materialized array. IEnumerable would really help if you used something like this:
private IEnumerable<string> ReadLines(string fileName)
{
    using (var file = new System.IO.StreamReader(fileName)) // disposed when iteration ends
    {
        string line;
        while ((line = file.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
With code like that, only one line is in memory at a time, which helps you read much larger files with very little memory consumption.
Also, as a general rule, the type of a function's input parameter should be as widely acceptable as possible. Since IEnumerable is implemented by all the collections and collection-based interfaces, it is a good idea to declare the parameter as IEnumerable, so that you can pass a List or any other collection type.
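As a quick illustration with the ProcessLines method from the question (called from inside the same class, since it is private; the sample strings are made up), both of these calls compile, because string[] and List<string> each implement IEnumerable<string>:
string fromArray = ProcessLines(new[] { "a-old", "b-old" }, "old", "new");
string fromList = ProcessLines(new List<string> { "a-old", "b-old" }, "old", "new");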
If I have the following statement:
whatever.Select(x => collection.ToArray()[index]).ToList();
Is LINQ smart enough to perform the ToArray conversion only once (I'm not really aware of how this closure is transformed and evaluated)?
I understand that this code is bad, just interested.
No, it will be performed once for every item in whatever.
You can have a peek at the code for LINQBridge, especially the Select method (which ends up calling SelectYield).
The essence of SelectYield is a simple foreach loop:
var i = 0;
foreach (var item in source)
    yield return selector(item, i++);
Where selector is the lambda expression you pass in, in your case x => collection.ToArray()[index]. From here it is obvious that the whole lambda expression will be evaluated for every element in whatever.
Note that LINQBridge is a stand-alone reimplementation of LINQ to Objects and thus not necessarily identical (but to a very large extent it behaves exactly like LINQ to Objects, including side effects).
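If you want the conversion to happen only once, hoist it out of the lambda yourself; a minimal sketch using the names from the question:
var array = collection.ToArray(); // evaluated exactly once
var result = whatever.Select(x => array[index]).ToList();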
I am using some of the LINQ select stuff to create some collections, which return IEnumerable<T>.
In my case I need a List<T>, so I am passing the result to List<T>'s constructor to create one.
I am wondering about the overhead of doing this. The items in my collections are usually in the millions, so I need to consider this.
I assume that if the IEnumerable<T> contains value types, it's the worst case for performance.
Am I right? What about reference types? Either way, there is also the cost of calling List<T>.Add a million times, right?
Any way to solve this? Like, can I "overload" methods like LINQ's Select using extension methods?
No, there's no particular penalty for the element type being value types, assuming you're using IEnumerable<T> instead of IEnumerable. You won't get any boxing going on.
If you actually know the size of the result beforehand (which the result of Select probably won't) you might want to consider creating the list with that size of buffer, then using AddRange to add the values. Otherwise the list will have to resize its buffer every time it fills it.
For instance, instead of doing:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(query);
you might do:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(foo.Length);
queryList.AddRange(query);
You know that calling Select will produce a sequence of the same length as the original query source, but nothing in the execution environment has that information as far as I'm aware.
It would be best to avoid the need for a list. If you can keep your caller using IEnumerable<T>, you will save yourself some headaches.
LINQ's ToList() will take your enumerable, and just construct a new List<T> directly from it, using the List<T>(IEnumerable<T>) constructor. This will be the same as making the list yourself, performance wise (although LINQ does a null check, as well).
If you're adding the elements yourself, use the AddRange method instead of Add. ToList() is very similar to AddRange (since it uses the constructor which takes IEnumerable<T>), and it will typically be your best bet, performance-wise, in this case.
Generally speaking, a method returning IEnumerable doesn't have to evaluate any of the items before an item is actually needed. So, theoretically, when you return an IEnumerable, none of your items need to exist at that time.
So creating a list means that you will really need to evaluate the items, get them, and place them somewhere in memory (at least their references). There is nothing that can be done about this - if you really need to have a list.
A number of other responders have already provided ideas for how to improve the performance of copying an IEnumerable<T> into a List<T> - I don't think that much can be added on that front.
However, based on what you have described you need to do with the results, and the fact that you get rid of the list when you're done (which I presume means that the intermediate results are not interesting) - you may want to consider whether you really need to materialize a List<T>.
Rather than creating a List<T> and operating on the contents of that list, consider writing a lazy extension method for IEnumerable<T> that performs the same processing logic. I've done this myself in a number of cases, and writing such logic in C# is not so bad when using the yield return syntax supported by the compiler.
This approach works well if all you're trying to do is visit each item in the results and collect some information from it. Often, what you need to do is just visit each element in the collection on demand, do some processing with it, and then move on. This approach is generally more scalable and performant than creating a copy of the collection just to iterate over it.
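As a rough sketch of the idea (the names here are hypothetical, not from the question), such a lazy extension method might look like this:
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static IEnumerable<TResult> ProcessLazily<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> transform)
    {
        foreach (var item in source)
        {
            // per-item work happens here, on demand, one element at a time
            yield return transform(item);
        }
    }
}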
Now, this advice may not work for you for other reasons, but it's worth considering as an alternative to finding the most efficient way to materialize a very large list.
Don't pass an IEnumerable to the List constructor. IEnumerable has a ToList() method, which can't possibly do worse than that, and has nicer syntax (IMHO).
That said, that only changes the answer to your question to "it depends" - in particular, it depends on what the IEnumerable actually is behind the scenes. If it happens to be a List already, then ToList will effectively be free and will, of course, go much faster than if it were another type. It's still not super-fast.
The best way to solve this, of course, is to try to figure out how to do your processing on an IEnumerable rather than a List. That may not be possible.
Edit: Some people in the comments are debating whether or not ToList() will actually be any faster when called on a List than if not, and whether ToList() will be any faster than the list constructor. At this point, speculating is getting pointless, so here's some code:
using System;
using System.Linq;
using System.Collections.Generic;
public static class ToListTest
{
public static int Main(string[] args)
{
List<int> intlist = new List<int>();
for (int i = 0; i < 1000000; i++)
intlist.Add(i);
IEnumerable<int> intenum = intlist;
for (int i = 0; i < 1000; i++)
{
List<int> foo = intenum.ToList();
}
return 0;
}
}
Running this code with an IEnumerable that's really a List goes about 6-10 times faster than if I replace it with a LinkedList or Stack (on my pokey 2.4 GHz P4, using Mono 1.2.6). Conceivably this could be due to some unfortunate interaction between ToList() and the particular implementations of LinkedList or Stack's enumerations, but at least the point remains: speed will depend on the underlying type of the IEnumerable. That said, even with a List as the source, it still takes 6 seconds for me to make 1000 ToList() calls, so it's far from free.
The next question is whether ToList() is any more intelligent than the List constructor. The answer to that turns out to be no: the List constructor is just as fast as ToList(). In hindsight, Jon Skeet's reasoning makes sense - I was just forgetting that ToList() was an extension method. I still (much) prefer ToList() syntactically, but there's no performance reason to use it.
So the short version is that the best answer is still "don't convert to a List if you can avoid it". Barring that, actual performance will depend drastically on what the IEnumerable actually is, but at best it'll be sluggish, as opposed to glacial. I've amended my original answer to reflect this.
From reading the question and the various comments, I get the following requirement: for a collection of data, you need to run through that collection, filter out some objects, and then perform some transformation on the remaining objects. If that's the case, you can do something like this:
var result = from item in collection
where item.Id > 10 //or some more sensible condition
select Operation(item);
and if you need to perform more filtering and transformation, you can nest your LINQ queries like:
var result = from filteredItem in (from item in collection
where item.Id > 10 //or some more sensible condition
select Operation(item))
where filteredItem.SomePropertyAvailableAfterFirstTransformation == "new"
select SecondTransformation(filteredItem);
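If you prefer method syntax, the same nested query can be written as a flat chain (assuming the same Operation and SecondTransformation methods as above):
var result = collection
    .Where(item => item.Id > 10) // or some more sensible condition
    .Select(Operation)
    .Where(f => f.SomePropertyAvailableAfterFirstTransformation == "new")
    .Select(SecondTransformation);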
A lot of questions are being answered on Stack Overflow, with members specifying how to solve these real world/time problems using lambda expressions.
Are we overusing it, and are we considering the performance impact of using lambda expressions?
I found a few articles that explore the performance impact of lambdas vs. anonymous delegates vs. for/foreach loops, with different results:
Anonymous Delegates vs Lambda Expressions vs Function Calls Performance
Performance of foreach vs. List.ForEach
.NET/C# Loop Performance Test (FOR, FOREACH, LINQ, & Lambda).
DataTable.Select is faster than LINQ
What should be the evaluation criteria when choosing the appropriate solution, apart from the obvious reason that lambdas give more concise and readable code?
Even though I will focus on point one, I begin by giving my 2 cents on the whole issue of performance. Unless differences are big or usage is intensive, usually I don't bother about microseconds that, when added up, don't amount to any visible difference to the user. I emphasize that I only don't care about non-intensively called methods. Where I do have special performance considerations is in the way I design the application itself. I care about caching, about the use of threads, about clever ways to call methods (whether to make several calls or to try to make only one call), whether to pool connections or not, etc., etc. In fact I usually don't focus on raw performance, but on scalability. I don't care if it runs better by a tiny slice of a nanosecond for a single user, but I care a lot about being able to load the system with large numbers of simultaneous users without a noticeable impact.
Having said that, here goes my opinion about point 1. I love anonymous methods. They give me great flexibility and code elegance. The other great feature of anonymous methods is that they allow me to directly use local variables from the containing method (from a C# perspective, not from an IL perspective, of course). They spare me loads of code oftentimes. When do I use anonymous methods? Every single time the piece of code I need isn't needed elsewhere. If it is used in two different places, I don't like copy-paste as a reuse technique, so I'll use a plain ol' delegate. So, just like shoosh answered, it isn't good to have code duplication. In theory there are no performance differences, as anonymous methods are C# compiler tricks, not IL features.
Most of what I think about anonymous methods applies to lambda expressions, as the latter can be used as a compact syntax to represent anonymous methods. Let's assume the following method:
public static void DoSomethingMethod(string[] names, Func<string, bool> myExpression)
{
Console.WriteLine("Lambda used to represent an anonymous method");
foreach (var item in names)
{
if (myExpression(item))
Console.WriteLine("Found {0}", item);
}
}
It receives an array of strings and for each one of them, it will call the method passed to it. If that method returns true, it will say "Found...". You can call this method the following way:
string[] names = {"Alice", "Bob", "Charles"};
DoSomethingMethod(names, delegate(string p) { return p == "Alice"; });
But, you can also call it the following way:
DoSomethingMethod(names, p => p == "Alice");
There is no difference in IL between the two; the one using the lambda expression is just much more readable. Once again, there is no performance impact, as these are all C# compiler tricks (not JIT compiler tricks). Just as I didn't feel we were overusing anonymous methods, I don't feel we are overusing lambda expressions to represent anonymous methods. Of course, the same logic applies to repeated code: don't do lambdas, use regular delegates. There are other restrictions leading you back to anonymous methods or plain delegates, like out or ref argument passing.
The other nice thing about lambda expressions is that the exact same syntax doesn't have to represent an anonymous method. Lambda expressions can also represent... you guessed it, expressions. Take the following example:
public static void DoSomethingExpression(string[] names, System.Linq.Expressions.Expression<Func<string, bool>> myExpression)
{
Console.WriteLine("Lambda used to represent an expression");
BinaryExpression bExpr = myExpression.Body as BinaryExpression;
if (bExpr == null)
return;
Console.WriteLine("It is a binary expression");
Console.WriteLine("The node type is {0}", bExpr.NodeType.ToString());
Console.WriteLine("The left side is {0}", bExpr.Left.NodeType.ToString());
Console.WriteLine("The right side is {0}", bExpr.Right.NodeType.ToString());
if (bExpr.Right.NodeType == ExpressionType.Constant)
{
ConstantExpression right = (ConstantExpression)bExpr.Right;
Console.WriteLine("The value of the right side is {0}", right.Value.ToString());
}
}
Notice the slightly different signature. The second parameter receives an expression and not a delegate. The way to call this method would be:
DoSomethingExpression(names, p => p == "Alice");
Which is exactly the same as the call we made when creating an anonymous method with a lambda. The difference here is that we are not creating an anonymous method, but creating an expression tree. It is due to these expression trees that we can then translate lambda expressions to SQL, which is what LINQ to SQL does, for instance, instead of executing things in the engine for each clause like Where, Select, etc. The nice thing is that the calling syntax is the same whether you're creating an anonymous method or sending an expression.
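A small sketch of that difference: the same lambda text can become a delegate (compiled code) or an expression tree (data describing the code), and a tree can even be compiled back into a delegate:
using System;
using System.Linq.Expressions;

Func<string, bool> asDelegate = p => p == "Alice";         // compiled code
Expression<Func<string, bool>> asTree = p => p == "Alice"; // data describing the code

Func<string, bool> recompiled = asTree.Compile();          // turn the tree back into code
Console.WriteLine(asDelegate("Alice")); // True
Console.WriteLine(recompiled("Bob"));   // False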
My answer will not be popular.
I believe Lambda's are 99% always the better choice for three reasons.
First, there is ABSOLUTELY nothing wrong with assuming your developers are smart. Other answers have an underlying premise that every developer but you is stupid. Not so.
Second, lambdas (et al.) are a modern syntax, and tomorrow they will be more commonplace than they already are today. Your project's code should follow current and emerging conventions.
Third, writing code "the old-fashioned way" might seem easier to you, but it's not easier for the compiler. This is important: legacy approaches have little opportunity to be improved as the compiler is revised, while lambdas (et al.), which rely on the compiler to expand them, can benefit as the compiler deals with them better over time.
To sum up:
Developers can handle it
Everyone is doing it
There's future potential
Again, I know this will not be a popular answer. And believe me "Simple is Best" is my mantra, too. Maintenance is an important aspect to any source. I get it. But I think we are overshadowing reality with some cliché rules of thumb.
// Jerry
Code duplication.
If you find yourself writing the same anonymous function more than once, it shouldn't be one.
Well, when we are talking about delegate usage, there shouldn't be any difference between lambdas and anonymous methods -- they are the same, just with different syntax. And named methods (used as delegates) are also identical from the runtime's viewpoint. The difference, then, is between using delegates vs. inline code - i.e.
list.ForEach(s=>s.Foo());
// vs.
foreach(var s in list) { s.Foo(); }
(where I would expect the latter to be quicker)
And equally, if you are talking about anything other than in-memory objects, lambdas are one of your most powerful tools in terms of maintaining type checking (rather than parsing strings all the time).
Certainly, there are cases when a simple foreach with code will be faster than the LINQ version, as there will be fewer invokes to do, and invokes cost a small but measurable time. However, in many cases, the code is simply not the bottleneck, and the simpler code (especially for grouping, etc) is worth a lot more than a few nanoseconds.
Note also that in .NET 4.0 there are additional Expression nodes for things like loops, commas, etc. The language doesn't support them, but the runtime does. I mention this only for completeness: I'm certainly not saying you should use manual Expression construction where foreach would do!
I'd say that the performance differences are usually so small (in the case of loops, just look at the results of the 2nd article; btw, Jon Skeet has a similar article here) that you should almost never choose a solution for performance reasons alone, unless you are writing a piece of software where performance is absolutely the number one non-functional requirement and you really have to do micro-optimizations.
When to choose what? I guess it depends on the situation but also on the person. Just as an example, some people prefer List.ForEach over a normal foreach loop. I personally prefer the latter, as it is usually more readable, but who am I to argue against this?
Rules of thumb:
Write your code to be natural and readable.
Avoid code duplications (lambda expressions might require a little extra diligence).
Optimize only when there's a problem, and only with data to back up what that problem actually is.
Any time the lambda simply passes its arguments directly to another function. Don't create a lambda for function application.
Example:
var coll = new ObservableCollection<int>();
myInts.ForEach(x => coll.Add(x));
Is nicer as:
var coll = new ObservableCollection<int>();
myInts.ForEach(coll.Add);
The main exception is where C#'s type inference fails for whatever reason (and there are plenty of times that's true).
If you need recursion, don't use lambdas, or you'll end up getting very distracted!
Lambda expressions are cool. Over the older delegate syntax they have a few advantages: they can be converted to either an anonymous function or an expression tree, parameter types are inferred from the declaration, they are cleaner and more concise, etc. I see no real value in not using a lambda expression when you're in need of an anonymous function. One not-so-big advantage of the earlier style is that you can omit the parameter declaration entirely if the parameters are not used. Like:
Action<int> a = delegate { }; //takes one argument, but no argument specified
This is useful when you have to declare an empty delegate that does nothing, but it is not a strong reason enough to not use lambdas.
Lambdas let you write quick anonymous methods. That makes lambdas pointless everywhere anonymous methods are pointless, i.e. where named methods make more sense. Compared to named methods, anonymous methods can be disadvantageous (this is not a lambda-expression issue per se, but since these days lambdas widely represent anonymous methods, it is relevant):
because they tend to lead to logic duplication (they often do, since reuse is difficult)
when it is unnecessary to write one, like:
//this is unnecessary
Func<string, int> f = x => int.Parse(x);
//this is enough
Func<string, int> f = int.Parse;
since writing an anonymous iterator block is impossible (a named iterator method is the usual workaround; see the sketch after this list).
Func<IEnumerable<int>> f = () => { yield return 0; }; //impossible
since recursive lambdas require one more line of quirkiness, like
Func<int, int> f = null;
f = x => (x <= 1) ? 1 : x * f(x - 1);
and, well, reflection is kind of messier, but that is moot, isn't it?
Apart from point 3, the rest are not strong reasons not to use lambdas.
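For point 3, the usual workaround is a named iterator method, which can still be assigned to the delegate:
static IEnumerable<int> Zero()
{
    yield return 0;
}

Func<IEnumerable<int>> f = Zero; // a named iterator can go where the anonymous one cannot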
Also see this thread about what is disadvantageous about Func/Action delegates, since often they are used along with lambda expressions.