C# Lambda performance issues/possibilities/guidelines

I'm testing performance differences using various lambda expression syntaxes. If I have a simple method:
public IEnumerable<Item> GetItems(int point)
{
    return this.items.Where(i => i.IsApplicableFor(point));
}
then there's some variable lifting going on here related to the point parameter, because it's a free variable from the lambda's perspective. If I call this method a million times, would it be better to keep it as it is, or to change it in some way to improve its performance?
What options do I have, and which ones are actually feasible? As I understand it, I have to get rid of free variables so that the compiler won't have to create a closure class and instantiate it on every call to this method. This instantiation usually takes a significant amount of time compared to non-closure versions.
The thing is, I would like to come up with some sort of lambda-writing guidelines that generally work, because it seems I'm wasting time whenever I write a heavily hit lambda expression. I have to test it manually to make sure it will work, because I don't know what rules to follow.
Alternative method & example console application code
I've also written a different version of the same method that doesn't need any variable lifting (at least I think it doesn't, but you guys who understand this let me know if that's the case):
public IEnumerable<Item> GetItems(int point)
{
    Func<int, Func<Item, bool>> buildPredicate = p => i => i.IsApplicableFor(p);
    return this.items.Where(buildPredicate(point));
}
Check out Gist here. Just create a console application and copy the whole code into Program.cs file inside namespace block. You will see that the second example is much much slower even though it doesn't use free variables.
A contradictory example
The reason why I would like to construct some lambda best usage guidelines is that I've met this problem before and to my surprise that one turned out to be working faster when a predicate builder lambda expression was used.
Now explain that then. I'm completely lost here because it may as well turn out I won't be using lambdas at all when I know I have some heavy use method in my code. But I would like to avoid such situation and get to the bottom of it all.
Edit
Your suggestions don't seem to work
I've tried implementing a custom lookup class that internally works similar to what compiler does with a free variable lambda. But instead of having a closure class I've implemented instance members that simulate a similar scenario. This is the code:
private int Point { get; set; }
private bool IsItemValid(Item item)
{
    return item.IsApplicableFor(this.Point);
}
public IEnumerable<Item> GetItems(int point)
{
    this.Point = point;
    return this.items.Where(this.IsItemValid);
}
Interestingly enough, this runs just as slowly as the slow version. I don't know why, since it seems to do nothing more than the fast one. It reuses the same functionality, because these additional members are part of the same object instance. Anyway, I'm now extremely confused!
I've updated Gist source with this latest addition, so you can test for yourself.

What makes you think that the second version doesn't require any variable lifting? You're defining the Func with a Lambda expression, and that's going to require the same bits of compiler trickery that the first version requires.
Furthermore, you're creating a Func that returns a Func, which bends my brain a little bit and will almost certainly require re-evaluation with each call.
I would suggest that you compile this in release mode and then use ILDASM to examine the generated IL. That should give you some insight into what code is generated.
Another test that you should run, which will give you more insight, is to make the predicate call a separate function that uses a variable at class scope. Something like:
private DateTime dayToCompare;
private bool LocalIsDayWithinRange(TItem i)
{
    return i.IsDayWithinRange(dayToCompare);
}
public override IEnumerable<TItem> GetDayData(DateTime day)
{
    dayToCompare = day;
    return this.items.Where(i => LocalIsDayWithinRange(i));
}
That will tell you if hoisting the day variable is actually costing you anything.
Yes, this requires more code and I wouldn't suggest that you use it. As you pointed out in your response to a previous answer that suggested something similar, this creates what amounts to a closure using local variables. The point is that either you or the compiler has to do something like this in order to make things work. Beyond writing the pure iterative solution, there is no magic you can perform that will prevent the compiler from having to do this.
My point here is that "creating the closure" in my case is a simple variable assignment. If this is significantly faster than your version with the Lambda expression, then you know that there is some inefficiency in the code that the compiler creates for the closure.
I'm not sure where you're getting your information about having to eliminate the free variables, and the cost of the closure. Can you give me some references?

Your second method runs 8 times slower than the first for me. As @DanBryant says in the comments, this is to do with constructing and calling the delegate inside the method, not with variable lifting.
Your question is confusing as it reads to me like you expected the second sample to be faster than the first. I also read it as the first is somehow unacceptably slow due to 'variable lifting'. The second sample still has a free variable (point) but it adds additional overhead - I don't understand why you'd think it removes the free variable.
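For reference, the inline predicate in the first sample compiles to roughly the shape below. This is a hand-written approximation, not actual compiler output: the class name PointClosure, the Lookup wrapper and the body of Item.IsApplicableFor are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Min, Max;

    // Hypothetical body; the question does not show the real one.
    public bool IsApplicableFor(int point) { return point >= Min && point <= Max; }
}

// Hand-written stand-in for the closure class the compiler generates.
sealed class PointClosure
{
    public int point;   // the lifted (captured) variable lives here

    public bool Matches(Item i) { return i.IsApplicableFor(point); }
}

public class Lookup
{
    private readonly List<Item> items;

    public Lookup(List<Item> items) { this.items = items; }

    public IEnumerable<Item> GetItems(int point)
    {
        // One small object allocation per call...
        var closure = new PointClosure { point = point };
        // ...plus one delegate allocation per call.
        return items.Where(new Func<Item, bool>(closure.Matches));
    }
}

static class ClosureShapeDemo
{
    static void Main()
    {
        var lookup = new Lookup(new List<Item>
        {
            new Item { Min = 0, Max = 5 },
            new Item { Min = 10, Max = 20 }
        });
        Console.WriteLine(lookup.GetItems(3).Count());   // prints 1
    }
}
```

The builder version in the second sample has to do all of this too, and on top of that it creates and invokes the outer delegate on every call.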
As the code you have posted confirms, the first sample above (using a simple inline predicate) performs just 10% slower than a simple foreach loop. From your code:
foreach (TItem item in this.items)
{
    if (item.IsDayWithinRange(day))
    {
        yield return item;
    }
}
So, in summary:
The foreach loop is the simplest approach and is the "best case".
The inline predicate is slightly slower, due to some additional overhead.
Constructing and calling a Func that returns Func within each iteration is significantly slower than either.
I don't think any of this is surprising. The 'guideline' is to use an inline predicate - if it performs poorly, simplify by moving to a straight loop.

I profiled your benchmark for you and determined many things:
First of all, it spends half its time on the line return this.GetDayData(day).ToList(); calling ToList. If you remove that and instead manually iterate over the results, you can measure the relative differences between the methods.
Second, because IterationCount = 1000000 and RangeCount = 1, you are timing the initialization of the different methods rather than the amount of time it takes to execute them. This means your execution profile is dominated by creating the iterators, escaping variable records, and delegates, plus the hundreds of subsequent gen0 garbage collections that result from creating all that garbage.
Third, the "slow" method is really slow on x86, but about as fast as the "fast" method on x64. I believe this is due to how the different JITters create delegates. If you discount the delegate creation from the results, the "fast" and "slow" methods are identical in speed.
Fourth, if you actually invoke the iterators a significant number of times (on my computer, targeting x64, with RangeCount = 8), "slow" is actually faster than "foreach" and "fast" is faster than all of them.
In conclusion, the "lifting" aspect is negligible. Testing on my laptop shows that capturing a variable like you do requires an extra 10ns every time the lambda gets created (not every time it is invoked), and that includes the extra GC overhead. Furthermore, while creating the iterator in your "foreach" method is somewhat faster than creating the lambdas, actually invoking that iterator is slower than invoking the lambdas.
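To see the split between creation cost and invocation cost on your own machine, a rough sketch like the following can help. It is not the benchmark from the Gist; ClosureTiming and MakePredicate are hypothetical names, and the numbers it prints depend heavily on the JIT, the platform and the GC, so treat them as indicative only.

```csharp
using System;
using System.Diagnostics;

static class ClosureTiming
{
    // Captures 'point', so each call allocates a closure object and a delegate.
    public static Func<int, bool> MakePredicate(int point)
    {
        return i => i == point;
    }

    static void Main()
    {
        const int N = 1000000;
        Func<int, bool> last = null;

        // Phase 1: only create the delegates, never invoke them.
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < N; n++)
        {
            last = MakePredicate(n);
        }
        sw.Stop();
        Console.WriteLine("create: " + sw.Elapsed.TotalMilliseconds + " ms");

        // Phase 2: invoke one already-created delegate N times.
        int hits = 0;
        sw.Restart();
        for (int n = 0; n < N; n++)
        {
            if (last(n)) hits++;
        }
        sw.Stop();
        Console.WriteLine("invoke: " + sw.Elapsed.TotalMilliseconds + " ms (" + hits + " hit)");
    }
}
```

Run it in release mode, outside the debugger, and repeat it a few times before drawing conclusions.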
If the few extra nanoseconds required to create delegates is too much for your application, consider caching them. If you require parameters to those delegates (i.e. closures), consider creating your own closure classes such that you can create them once and then just change the properties when you need to reuse their delegates. Here's an example:
public class SuperFastLinqRangeLookup<TItem> : RangeLookupBase<TItem>
    where TItem : RangeItem
{
    public SuperFastLinqRangeLookup(DateTime start, DateTime end, IEnumerable<TItem> items)
        : base(start, end, items)
    {
        // create delegate only once
        predicate = i => i.IsDayWithinRange(day);
    }
    DateTime day;
    Func<TItem, bool> predicate;
    public override IEnumerable<TItem> GetDayData(DateTime day)
    {
        this.day = day; // set captured day to correct value
        return this.items.Where(predicate);
    }
}

When a LINQ expression that uses deferred execution executes within the same scope that encloses the free variables it references, the compiler should detect that and not create a closure over the lambda, because it's not needed.
The way to verify that would be by testing it using something like this:
public class Test
{
    public static void ExecuteLambdaInScope()
    {
        // here, the lambda executes only within the scope
        // of the referenced variable 'add'
        var items = Enumerable.Range(0, 100000).ToArray();
        int add = 10; // free variable referenced from lambda
        Func<int,int> f = x => x + add;
        // measure how long this takes:
        var array = items.Select( f ).ToArray();
    }
    static Func<int,int> GetExpression()
    {
        int add = 10;
        return x => x + add; // this needs a closure
    }
    static void ExecuteLambdaOutOfScope()
    {
        // here, the lambda executes outside the scope
        // of the referenced variable 'add'
        Func<int,int> f = GetExpression();
        var items = Enumerable.Range(0, 100000).ToArray();
        // measure how long this takes:
        var array = items.Select( f ).ToArray();
    }
}
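A quick complementary check, separate from the timing test, is to change the variable after the delegate has been created. If the lambda sees the new value, the variable is being shared with the lambda rather than copied, which suggests it lives in a closure object even when the lambda only runs in scope. A minimal sketch, with CaptureCheck and Run as hypothetical names:

```csharp
using System;

static class CaptureCheck
{
    public static int Run()
    {
        int add = 10;                      // free variable referenced from the lambda
        Func<int, int> f = x => x + add;
        add = 20;                          // modified after the delegate was created
        return f(1);                       // 21: the lambda reads the current value of 'add'
    }

    static void Main()
    {
        Console.WriteLine(Run());          // prints 21
    }
}
```

This says nothing about how expensive the capture is; for that you still need to measure, or inspect the IL.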

Related

Is it faster to get a property or to pass a value to a method

I have a static class with properties to store user's inputs:
public static class UserData
{
    public static double UserInput1 { get; set; }
}
And I have nested methods that need the user's inputs
public static double Foo()
{
    [...]
    var input1 = UserData.UserInput1;
    var bar = Bar();
    [...]
}
private static double Bar()
{
    var input1 = UserData.UserInput1;
    [...]
}
The positive thing is that I do not have to pass all user inputs to Foo(), then to Bar() (and to further nested methods within Bar()).
The negative thing is that I have to get UserData.UserInput1 and other user inputs very often. I could change the code to get the user inputs only once:
public static double Foo()
{
    [...]
    var input1 = UserData.UserInput1;
    var bar = Bar(input1);
    [...]
}
private static double Bar(double input1)
{
    [...]
}
Which one is faster?
The second one is faster than the first, because you avoid reading the static property from UserData.
Working with statics is not great for performance, because of the need to perform a lookup in the symbol table and track shared memory. Passing the input values as parameters avoids this and gives slightly better performance.
But both options are ok. It's more important to focus on code readability and maintainability rather than performance unless you are working on a critical performance issue.
Which one is faster?
Using static mutable state in this way will be way slower in the long run. Because you will spend a bunch of time trying to find and fix bugs. This time could be better spent doing things that will actually help performance, like profiling and optimizing code.
Try to make any method that computes something take the required input as parameters. Try to make input fields properties of the associated UI class. This should help keep the code simple and understandable.
Accessing a static property will be translated to an indirect memory access. Passing a parameter to a method might be free if the parameter is already in a register, or might involve a bit more work if it needs to be loaded, moved or passed on the stack. But we are talking about single-digit cycles here; optimization on this level should only be done in super tight loops that run many millions of times each second, and then you should typically ensure that all methods can be inlined, sidestepping the problem.
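To make the two shapes concrete, here is a minimal sketch of both variants side by side. UserData comes from the question; Calc, SumViaStatic, SumViaParameter and Sum are hypothetical names invented for the example. Both variants compute the same result, so any difference is purely in how the input reaches the loop.

```csharp
using System;

public static class UserData
{
    public static double UserInput1 { get; set; }
}

public static class Calc
{
    // Variant 1: every iteration reads the static property.
    public static double SumViaStatic(int n)
    {
        double total = 0;
        for (int k = 0; k < n; k++)
        {
            total += UserData.UserInput1;
        }
        return total;
    }

    // Variant 2: read the property once, then pass the value down as a parameter.
    public static double SumViaParameter(int n)
    {
        return Sum(UserData.UserInput1, n);
    }

    private static double Sum(double input1, int n)
    {
        double total = 0;
        for (int k = 0; k < n; k++)
        {
            total += input1;
        }
        return total;
    }

    static void Main()
    {
        UserData.UserInput1 = 1.5;
        Console.WriteLine(SumViaStatic(4));      // prints 6
        Console.WriteLine(SumViaParameter(4));   // prints 6
    }
}
```

The second variant also makes the dependency on the input visible in the method signature, which is the maintainability point made above.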
If you're worried about such micro-optimizations (which you generally wouldn't need), consider using inlining.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.methodimploptions?view=net-7.0
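A minimal sketch of how the attribute is applied (InliningSketch and Scale are hypothetical names). Note that the attribute is only a request to the JIT; it does not guarantee that the method is actually inlined.

```csharp
using System;
using System.Runtime.CompilerServices;

static class InliningSketch
{
    // Asks the JIT to inline this method at its call sites when possible.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static double Scale(double value, double factor)
    {
        return value * factor;
    }

    static void Main()
    {
        Console.WriteLine(Scale(2.0, 3.0));   // prints 6
    }
}
```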
PS: using one over the other, or using AggressiveInlining, will not save you anywhere close to the 1ms you are hoping for, under non-extreme/farfetched scenarios.

C# Lambda expression performance when using instance variables

I just read a JetBrains article about boosting performance when using lambda expressions: Unusual Ways of Boosting Up App Performance. Lambdas and LINQs.
I was wondering: does using an instance variable cause the same kind of decrease in performance?
The article said that the following kind of code, which references a variable from the enclosing method scope inside a lambda expression, would slow performance:
public IEnumerable<Something> DoSomething(IEnumerable<Something> myIEnumerable){
    DateTime yesterDay = DateTime.Now.AddDays(-1);
    return myIEnumerable.Where(obj => obj.Expired>yesterDay);
}
But what if I access instance variables? Would it have the same kind of negative effect?
class SomeClass{
    private DateTime _yesterDay;
    public void Foo(IEnumerable<Something> myIEnumerable){
        _yesterDay = DateTime.Now.AddDays(-1);
        IEnumerable<Something> someThings = DoSomething(myIEnumerable);
        //...
    }
    public IEnumerable<Something> DoSomething(IEnumerable<Something> myIEnumerable){
        return myIEnumerable.Where(obj => obj.Expired>_yesterDay);
    }
}
Thanks
The JetBrains article is right that instantiating a delegate on each call has a cost, and that it is only required when the captured context is not constant.
A closure is required to capture an instance (or a local variable) and cannot be cached natively. If you use only delegate arguments, constants and static members, no closure is required.
On the other hand, the cost only matters in massively called code (a hot path), such as a big batch job or foundation code (an architect's concern).
In most cases, changing the design to avoid this overhead looks like premature optimization that can really cost you in maintenance.
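The three cases can be compared directly by checking whether two calls hand back the same delegate object. This is only a sketch: Filters and its members are hypothetical names, and whether the non-capturing lambda is reused is a compiler implementation detail, so the program prints the results rather than assuming them.

```csharp
using System;

class Filters
{
    private int threshold = 5;

    // Uses only its argument and a constant: no closure is needed.
    public static Func<int, bool> NoCapture()
    {
        return x => x > 5;
    }

    // Captures a local (the parameter 'limit'): a closure object is created per call.
    public static Func<int, bool> CaptureLocal(int limit)
    {
        return x => x > limit;
    }

    // Reads an instance field: the delegate is bound to 'this'.
    public Func<int, bool> CaptureThis()
    {
        return x => x > threshold;
    }

    static void Main()
    {
        var f = new Filters();
        Console.WriteLine(ReferenceEquals(NoCapture(), NoCapture()));
        Console.WriteLine(ReferenceEquals(CaptureLocal(5), CaptureLocal(5)));
        Console.WriteLine(ReferenceEquals(f.CaptureThis(), f.CaptureThis()));
    }
}
```

A lambda that reads an instance field does not need a separate closure class, since the instance itself plays that role, but a delegate object still has to be created to bind it.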

Could locking an enumerable potentially cause multiple enumeration?

I think ReSharper is lying to me.
I have this extension method that (hopefully) returns an xor of two enumerations:
public static IEnumerable<T> Xor<T>(this IEnumerable<T> first, IEnumerable<T> second)
{
    lock (first)
    {
        lock (second)
        {
            var firstAsList = first.ToList();
            var secondAsList = second.ToList();
            return firstAsList.Except(secondAsList).Union(secondAsList.Except(firstAsList));
        }
    }
}
ReSharper thinks I'm performing a multiple enumeration of an IEnumerable, as you can see, on both the arguments. If I remove the locks, then it's satisfied that I'm not.
Is ReSharper right or wrong? I believe it's wrong.
edit: I do realize that I'm enumerating the lists multiple times, but ReSharper is saying I'm enumerating over the original arguments multiple times, which I don't think is true. I'm enumerating both arguments once into a list so I may then perform the actual set manipulation, but as I see it, I'm not actually iterating over the arguments passed multiple times.
For example, if the passed arguments are actually query results, my belief is this method won't cause a storm of queries to be executed by the set manipulation. I do understand what ReSharper means by warning of multiple enumeration: if the enumerables passed are heavy to generate, then if they're enumerated multiple times, then performing multiple enumerations on them will be much slower.
Also, removing the locks definitely makes ReSharper happier:
You are indeed enumerating both of the lists multiple times. You are not enumerating the enumerables passed as parameters multiple times. Both lists are enumerated once for each call to Except. The call to Union is not enumerating either sequence an additional time, but rather is enumerating the results of the two calls to Except.
Of course, iterating a List multiple times in a context like this isn't really a problem; there aren't negative consequences to iterating an unchanging list multiple times.
The lock statements have nothing whatsoever to do with enumeration of the sequences. Locking on an IEnumerable does not iterate it. Of course, locking on two objects like this, specifically two objects that are not limited in scope to this section of code, is very dangerous. It's quite possible to end up deadlocking the program with locks used in this manner if code elsewhere (such as another invocation of this method) ends up taking locks on the same objects in the opposite order.
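If the locks are dropped, one possible way to write the method so that each argument is visibly enumerated exactly once is to use HashSet<T>.SymmetricExceptWith. This is a sketch, not the asker's code: it assumes the default equality comparer is acceptable, and the order of the results may differ from the Except/Union version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static IEnumerable<T> Xor<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        var result = new HashSet<T>(first);   // enumerates 'first' once
        result.SymmetricExceptWith(second);   // enumerates 'second' once
        return result;                        // elements in exactly one of the two inputs
    }
}

static class XorDemo
{
    static void Main()
    {
        var xor = new[] { 1, 2, 3 }.Xor(new[] { 3, 4 });
        Console.WriteLine(string.Join(",", xor.OrderBy(x => x)));   // prints 1,2,4
    }
}
```

If thread safety really is required, it is usually better handled by the code that owns the collections than by locking on the arguments inside an extension method.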
This is a bit of a funny one.
First things first: as you've correctly identified, R# is raising this inspection not against the multiple usages of the Lists - there is of course nothing to worry about in multiply enumerating a List - but against (what R# sees as multiple usages of) the IEnumerable arguments. I'm presuming you already know why this would be potentially bad, so I'll skip that.
Now to the question of whether R# is right to complain here. To quote the C# spec,
A lock statement of the form
lock (x) ...
where x is an expression of a reference-type, is precisely equivalent to
System.Threading.Monitor.Enter(x);
try {
    ...
}
finally {
    System.Threading.Monitor.Exit(x);
}
except that x is only evaluated once.
(I've put in this emphasis because I like this wording; it avoids debates (that I'm definitely not qualified to enter) about whether this is "syntactic sugar" or not.)
Taking a minimal example which produces this R# inspection:
private static void Method(IEnumerable<int> enumerable)
{
    lock (enumerable)
    {
        var list = enumerable.ToList();
    }
}
and replacing it by what I think is the precisely equivalent version as mandated by the spec:
private static void Method(IEnumerable<int> enumerable)
{
    var x = enumerable;
    System.Threading.Monitor.Enter(x);
    try
    {
        var list = enumerable.ToList();
    }
    finally
    {
        System.Threading.Monitor.Exit(x);
    }
}
also produces the inspection.
The question then is: is R# right to produce this inspection? And this is where I think we get into a grey area. When I pass the following enumerable to either of these methods:
static IEnumerable<int> MyEnumerable()
{
    Console.WriteLine("Enumerable enumerated");
    yield return 1;
    yield return 2;
}
it is not multiply enumerated, which would suggest that R# is wrong to warn here; however, I can't actually find anything in documentation that guarantees this behaviour of either lock or Monitor.Enter. So for me it's not quite as clear-cut as this R# bug I reported, where use of GetType flagged this inspection; but nonetheless I'd guess you're safe.
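The same experiment can be made self-checking by counting how often the iterator body starts running instead of reading console output. A minimal sketch, with LockEnumerationCheck, Starts and Run as hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LockEnumerationCheck
{
    // How many times the iterator body has started executing.
    public static int Starts;

    static IEnumerable<int> MyEnumerable()
    {
        Starts++;
        yield return 1;
        yield return 2;
    }

    public static int Run()
    {
        Starts = 0;
        IEnumerable<int> enumerable = MyEnumerable();   // deferred: nothing has run yet
        lock (enumerable)                               // takes the monitor, does not enumerate
        {
            var list = enumerable.ToList();             // the only enumeration
        }
        return Starts;
    }

    static void Main()
    {
        Console.WriteLine(Run());   // prints 1
    }
}
```

This only shows the observed behaviour for a compiler-generated iterator; it does not settle whether that behaviour is guaranteed by the documentation.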
If you raise this on the R# bug tracker, you can get JetBrains' finest looking at a) whether this behaviour is indeed guaranteed, and b) whether R# can be adjusted to either not warn, or provide a justification for warning.
That said, of course, using locking here probably isn't actually achieving what you want to achieve, as stated in other answers and comments...

Using properties vs. methods for calculating changing values

Is there a convention for whether or not to use a property to calculate a value on call? For instance if my class contains a list of integers and I have a property Average, the average will possibly change when an integer is added/removed/modified from the list, does doing something like this:
private int? _ave = null;
public int Average
{
    get
    {
        if (_ave == null )
        {
            double accum = 0;
            foreach (int i in myList)
            {
                accum += i;
            }
            // explicit cast: the quotient is a double, the cache is an int?
            _ave = (int)(accum / myList.Count);
            return (int)_ave;
        }
        else
        {
            return (int)_ave;
        }
    }
}
where _ave is set to null if myList is modified in a way that may change the average...
Have any conventional advantage/disadvantage over a method call to average?
I am basically just wondering what the conventions are for this, as I am creating a class that has specific properties that may only be calculated once. I like the idea of the classes that access these properties to be able to access the property vs. a method (as it seems more readable IMO, to treat something like average as a property rather than a method), but I can see where this might get convoluted, especially in making sure that _ave is set to null appropriately.
The conventions are:
If the call is going to take significantly more time than simply reading a field and copying the value in it, make it a method. Properties should be fast.
If the member represents an action or an ability of the class, make it a method.
If the call to the getter mutates state, make it a method. Properties are invoked automatically in the debugger, and it is extremely confusing to have the debugger introducing mutations in your program as you debug it.
If the call is not robust in the face of being called at unusual times, then make it a method. Properties need to continue to work when used in constructors and finalizers, for example. Again, think about the debugger; if you are debugging a constructor then it should be OK for you to examine a property in the debugger even if it has not actually been initialized yet.
If the call can fail then make it a method. Properties should not throw exceptions.
In your specific case, it is borderline. You are performing a potentially lengthy operation the first time and then caching the result, so the amortized time is likely to be very fast even if the worst-case time is slow. You are mutating state, but again, in quite a non-destructive way. It seems like you could characterize it as a property of a set rather than an "ability" of the set. I would personally be inclined to make this a method but I would not push back very hard if you had a good reason to make it a property.
Regarding your specific implementation: I would be much more inclined to use a 64 bit integer as the accumulator rather than a 64 bit double; the double only has 53 bits of integer precision compared to the 64 bits of a long.
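Put together, the method-based variant with a long accumulator could look something like the sketch below. The class name IntCollection and the members Add and GetAverage are hypothetical; the empty-list check is an added assumption, since the original code does not say what should happen in that case.

```csharp
using System;
using System.Collections.Generic;

public class IntCollection
{
    private readonly List<int> myList = new List<int>();
    private int? cachedAverage;

    public void Add(int value)
    {
        myList.Add(value);
        cachedAverage = null;   // any modification invalidates the cached value
    }

    // A method rather than a property: the first call after a change walks the whole list.
    public int GetAverage()
    {
        if (myList.Count == 0)
        {
            throw new InvalidOperationException("The list is empty.");
        }
        if (cachedAverage == null)
        {
            long accum = 0;     // 64-bit integer accumulator instead of a double
            foreach (int i in myList)
            {
                accum += i;
            }
            cachedAverage = (int)(accum / myList.Count);   // truncates toward zero
        }
        return cachedAverage.Value;
    }
}

static class AverageDemo
{
    static void Main()
    {
        var numbers = new IntCollection();
        foreach (int n in new[] { 1, 2, 3, 4 }) numbers.Add(n);
        Console.WriteLine(numbers.GetAverage());   // prints 2
        numbers.Add(10);
        Console.WriteLine(numbers.GetAverage());   // prints 4
    }
}
```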
Microsoft's recommendation for using methods:
Use a method
If calling it has side effects
If it returns different values on each call
If it takes a long time to call
If the operation requires parameters (except indexers)
Use a property if the calculated value is an attribute of the object.
In your case I think a property with implicit lazy calculation would be a good choice.
Yes there is... a get accessor should not in any way modify the state of the object. The returned value could be calculated of course, and you might have a ton of code in there. But simply accessing a value should not affect the state of the containing instance at all.
In this particular case, why not calculate everything upon construction of the class instance instead? Or provide a dedicated method to force the class to do so.
Now I suppose there might be very specific situations where that sort of behavior is OK. This might be one of those. But without seeing the rest of the code (and the way it is used), it's impossible to tell.

Does a lambda create a new instance everytime it is invoked?

I'm curious to know whether a Lambda (when used as delegate) will create a new instance every time it is invoked, or whether the compiler will figure out a way to instantiate the delegate only once and pass in that instance.
More specifically, I want to create an API for an XNA game where I can use a lambda to pass in a custom callback. Since this will be called in the Update method (which is called many times per second), it would be pretty bad if it newed up an instance every time to pass in the delegate.
InputManager.GamePads.ButtonPressed(Buttons.A, s => s.MoveToScreen<NextScreen>());
Yes, it will cache them when it can:
using System;
class Program {
    static void Main(string[] args) {
        var i1 = test(10);
        var i2 = test(20);
        System.Console.WriteLine(object.ReferenceEquals(i1, i2));
    }
    static Func<int, int> test(int x) {
        Func<int, int> inc = y => y + 1;
        Console.WriteLine(inc(x));
        return inc;
    }
}
It creates a static field, and if it's null, populates it with a new delegate, otherwise returns the existing delegate.
Outputs 11, 21, True.
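Written out by hand, the caching pattern described above looks roughly like this. It is an approximation for illustration, not the code the compiler actually emits; CachedDelegate, Increment and Get are hypothetical names.

```csharp
using System;

static class CachedDelegate
{
    // Plays the role of the compiler-generated static cache field.
    private static Func<int, int> cached;

    private static int Increment(int y)
    {
        return y + 1;
    }

    public static Func<int, int> Get()
    {
        // Create the delegate on first use, then keep handing out the same instance.
        if (cached == null)
        {
            cached = new Func<int, int>(Increment);
        }
        return cached;
    }

    static void Main()
    {
        Console.WriteLine(Get()(10));                       // prints 11
        Console.WriteLine(ReferenceEquals(Get(), Get()));   // prints True
    }
}
```

The same pattern can be applied manually when the compiler cannot do it for you, for example by storing a delegate in a field and reusing it across Update calls.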
I was interested in your question because I had just assumed that this kind of thing would always generate a new object and hence should be avoided in code which is called frequently.
I do something similar, so I thought I would use ildasm to find out what exactly is going on behind the scenes. In my case it turned out that a new object was getting created each time the delegate was called. I won't post my code because it is fairly complex and not very easy to understand out of context. This conflicts with the answer provided by MichaelGG, I suspect because in his example he makes use of static functions. I would suggest you try it for yourself before designing everything one way and later finding out that you have a problem. ildasm is the way to go (http://msdn.microsoft.com/en-us/library/f7dy01k1.aspx); look out for any "newobj" lines, you don't want those.
It's also worth using CLR Profiler to find out if your lambda functions are allocating memory (https://github.com/MicrosoftArchive/clrprofiler). It says it's for framework 2.0 but it also works for 3.5, and it's the latest version that is available.
