I have a C# method which accepts a Predicate<Foo> and returns a list of matching items...
public static List<Foo> FindAll( Predicate<Foo> filter )
{
...
}
The filter will often be one of a common set...
public static class FooPredicates
{
public static readonly Predicate<Foo> IsEligible = ( foo => ...)
...
}
...but may be an anonymous delegate.
I'd now like to have this method cache its results in the ASP.NET cache, so repeated calls with the same delegate just return the cached result. For this, I need to create a cache key from the delegate. Will Delegate.GetHashCode() produce sensible results for this purpose? Is there some other member of Delegate that I should look at? Would you do this another way entirely?
To perform your caching task, you can follow the other suggestions and create a Dictionary<Predicate<Foo>,List<Foo>> (static for global, or member field otherwise) that caches the results. Before actually executing the Predicate<Foo>, you would need to check if the result already exists in the dictionary.
The general name for this deterministic function caching is called Memoization - and its awesome :)
Ever since C# 3.0 added lambda's and the swag of Func/Action delegates, adding Memoization to C# is quite easy.
Wes Dyer has a great post that brings the concept to C# with some great examples.
If you want me to show you how to do this, let me know...otherwise, Wes' post should be adequate.
In answer to your query about delegate hash codes. If two delegates are the same, d1.GetHashCode() should equal d2.GetHashCode(), but I'm not 100% about this. You can check this quickly by giving Memoization a go, and adding a WriteLine into your FindAll method. If this ends up not being true, another option is to use Linq.Expression<Predicate<Foo>> as a parameter. If the expressions are not closures, then expressions that do the same thing should be equal.
Let me know how this goes, I'm interested to know the answer about delegate.Equals.
Delegate equality looks at each invocation in the invocation list, testing for equality of method to be invoked, and target of method.
The method is a simple piece of the cache key, but the target of the method (the instance to call it on - assuming an instance method) could be impossible to cache in a serializable way. In particular, for anonymous functions which capture state, it will be an instance of a nested class created to capture that state.
If this is all in memory, just keeping the delegate itself as the hash key will be okay - although it may mean that some objects which clients would expect to be garbage collected hang around. If you need to serialize this to a database, it gets hairier.
Could you make your method accept a cache key (e.g. a string) as well? (That's assuming an in memory cache is inadequate.)
Keeping the cached results in a Dictionary<Predicate<Foo>,List<Foo>> is awkward for me because I want the ASP.NET cache to handle expiry for me rather than caching all results forever, but it's otherwise a good solution. I think I'll end up going with Will's Dictionary<Predicate<Foo>,string> to cache a string that I can use in the ASP.NET cache key.
Some initial tests suggest that delegate equality does the "right thing" as others have said, but Delegate.GetHashCode is pathologically unhelpful. Reflector reveals
public override int GetHashCode()
{
return base.GetType().GetHashCode();
}
So any Predicate<Foo> returns the same result.
My remaining issue was how equality works for anonymous delegates. What does "same method called on the same target" mean then? It seems that as long as the delegate was defined in the same place, references are equal. Delegates with the same body defined in different places are not.
static Predicate<int> Test()
{
Predicate<int> test = delegate(int i) { return false; };
return test;
}
static void Main()
{
Predicate<int> test1 = Test();
Predicate<int> test2 = Test();
Console.WriteLine(test1.Equals( test2 )); // True
test1 = delegate(int i) { return false; };
test2 = delegate(int i) { return false; };
Console.WriteLine(test1.Equals( test2 )); // False
}
This should be OK for my needs. Calls with the predefined predicates will be cached. Multiple calls to one method that calls FindAll with an anonymous method should get cached results. Two methods calling FindAll with apparently the same anonymous method won't share cached results, but this should be fairly rare.
Unless you're sure Delegate's implementation of GetHashCode is deterministic and doesn't result in any collisions I wouldn't trust it.
Here's two ideas. First, store the results of the delegates within a Predicate/List dictionary, using the predicate as the key, and then store the entire dictionary of results under a single key in the cache. Bad thing is that you lose all your cached results if the cache item is lost.
An alternative would be to create an extension method for Predicate, GetKey(), that uses an object/string dictionary to store and retrieve all keys for all Predicates. You index into the dictionary with the delegate and return its key, creating one if you don't find it. This way you're assured that you are getting the correct key per delegate and there aren't any collisions. A naiive one would be type name + Guid.
The same instance of an object will always return the same hashcode (requirement of GetHashCode() in .Net). If your predicates are inside a static list and you are not redefining them each time, I can't see a problem in using them as keys.
Related
I'm curious to understand what's happening behind the scenes in a situation like this:
public static void OuterMethod() {
// some random code
var a = 42;
var b = "meaning of life";
Func<string, object> factory = (aString) {
// do something with a and b
return "Hello World";
};
// some more random code
factory("My string");
}
I'm particularly interested in cases where OuterMethod is called very often. In my case it's the MVC request pipeline where OuterMethod is called once for each request.
Am I incurring a lot overhead by having to build factory each time the method is called? I could easily move the Func outside of OuterMethod into its own static method however in my actual scenario because it is defined inside I have access to a lot of variables I need to do my calculation that I would otherwise need to include in the signature of a method defined outside. Maybe this is just a micro optimization but I would like to better understand how the compiler treats these kinds of statements.
The actual lambda is going to result in a new named method (you just don't know what that name is) being created at compile time (the exact semantics will vary based on some of the specifics).
The only work being done on each invocation of the method is the creation of a new delegate object that has its own pointer to the same named method. If constructing that one object instance is really too much for you (hint: its not) then you could save that work by extracting the delegate out of the method.
This must be a duplicate but i haven't found it. I've found this question which is related since it answers why it's recommended to use a method group instead of a lambda.
But how do i use an existing method group instead of a lambda if the method is not in the current class and the method is not static?
Say i have a list of ints which i want to convert to strings, i can use List.ConvertAll, but i need to pass a Converter<int, string> to it:
List<int> ints = new List<int> { 1 };
List<string> strings = ints.ConvertAll<string>(i => i.ToString());
This works, but it creates an unnecessary anonymous method with the lambda. So if Int32.ToString would be static and would take an int i could write:
List<string> strings = ints.ConvertAll<string>(Int32.ToString);
But that doesn't compile - of course. So how can i use a method group anyway?
If i'd create an instance method like this
string FooInt(int foo)
{
return foo.ToString();
}
i could use strings = ints.ConvertAll<string>(FooInt);, but that is not what i want. I don't want to create a new method just to be able to use an existing.
There is an static method in the framework, that can be used to convert any integrated data type into a string, namely Convert.ToString:
List<int> ints = new List<int> { 1 };
List<string> strings = ints.ConvertAll<string>(Convert.ToString);
Since the signature of Convert.ToString is also known, you can even eliminate the explicit target type parameter:
var strings = ints.ConvertAll(Convert.ToString);
This works. However, I'd also prefer the lambda-expression, even if ReSharper tells you something different. ReSharper sometimes optimizes too much imho. It prevents developers from thinking about their code, especially in the aspect of readability.
Update
Based on Tim's comment, I will try to explain the difference between lambda and static method group calls in this particular case. Therefor, I first took a look into the mscorlib disassembly to figure out, how int-to-string conversion exactly works. The Int32.ToString method calls an external method within the Number-class of the System namespace:
[__DynamicallyInvokable, TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries"), SecuritySafeCritical]
public string ToString(IFormatProvider provider)
{
return Number.FormatInt32(this, null, NumberFormatInfo.GetInstance(provider));
}
The static Convert.ToString member does nothing else than calling ToString on the parameter:
[__DynamicallyInvokable]
public static string ToString(int value)
{
return value.ToString(CultureInfo.CurrentCulture);
}
Technically there would be no difference, if you'd write your own static member or extension, like you did in your question. So what's the difference between those two lines?
ints.ConvertAll<string>(i => i.ToString());
ints.ConvertAll(Convert.ToString);
Also - technically - there is no difference. The first example create's an anonymous method, that returns a string and accepts an integer. Using the integer's instance, it calls it's member ToString. The second one does the same, with the exception that the method is not anonymous, but an integrated member of the framework.
The only difference is that the second line is shorter and saves the compiler a few operations.
But why can't you call the non-static ToString directly?
Let's take a look into the ConvertAll-method of List:
public List<TOutput> ConvertAll<TOutput>(Converter<T, TOutput> converter)
{
if (converter == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
}
List<TOutput> list = new List<TOutput>(this._size);
for (int i = 0; i < this._size; i++)
{
list._items[i] = converter(this._items[i]);
}
list._size = this._size;
return list;
}
The list iteraterates over each item, calls the converter with the item as an argument and copys the result into a new list which it returns in the end.
So the only relation here is your converter that get's called explicitly. If you could pass Int32.ToString to the method, the compiler would have to decide to call this._items[i].ToString() within the loop. In this specific case it would work, but that's "too much intelligence" for the compiler. The type system does not support such code conversions. Instead the converter is an object, describing a method that can be called from the scope of the callee. Either this is an existing static method, like Convert.ToString, or an anonymous expression, like your lambda.
What causes the differences in your benchmark results?
That's hard to guess. I can imagine two factors:
Evaluating lambdas may result in runtime-overhead.
Framework calls may be optimized.
The last point especially means, that the JITer is able to inline the call which results in a better performance. However, those are just assumptions of mine. If anyone could clarify this, I'd appreciate it! :)
You hit the nail on the head yourself:
This works, but it creates an unnecessary anonymous method with the
lambda.
You can't do what you're asking for because there is no appropriate method group that you can use so the anonymous method is necessary. It works in that other case because the implicit range variable is passed to the delegate created by the method group. In your case, you need the method to be called on the range variable. It's a completely different scenario.
I have a method which takes in a constructor arguments as
Expression<func<T>>
thus declared as
object[] _constructorArgs = () => new MyClass("StringParam", 56, "ThirdParam");
I would instantiate this object as;
Activator.CreateInstance(typeof(MyClass), _constructorArgs);
Which works fine, but it's a bit slow.
Is there a way I can gather the types Constructor based on the contents of _constructorArgs so I would be able to call
ConstructorInfo.Invoke(_constructorArgs);
?
I know the Invoke is possible this way, it's just about finding the correct constructor based on the parameters in _constructorArgs.
Edit - For clarity
Apologies, I was very tired when I first asked this and should have given it more thought.
What I am doing is the following;
public object Create<T>(Expression<Func<T>> constructor)
{
//I didn't include this step in the original code
var constructorArguments =
(((NewExpression)constructor.Body).Arguments.Select(
argument => Expression.Lambda(argument).Compile().DynamicInvoke())).ToArray();
object[] _args = constructorArguments;
return Activator.CreateInstance(typeof(T), _args);
}
However, if I do the following instead;
ConstructorInfo c = type.GetConstructors().FirstOrDefault();
//Get the types constructor
return c.Invoke(_args);
I get better performance, I'm talking the first taking about 2800 milliseconds over a million iterations, using Invoke bringing that down to about 1000 milliseconds, thus 2.8 times faster.
This will work great if the first constructor always matches the arguments given, but this won't always be the case.
I want to know how I can get the correct ConstructorInfo based on arguments given.
If a million new objects per second isn't fast enough for you, you're going to have to go deeper. You need to start caching things. The simplest thing to cache is the constructor itself, so that you don't have to search for the correct constructor all the time. However...
Why are you doing this? Why don't you simply call the lambda outright? You've got all the code to instantiate the class, and then you throw it away and use Activator.CreateInstance? Why? Even if you do that, you don't need to search for the constructor - NewExpression.Constructor has the ConstructorInfo you need. Just do
((NewExpression)constructor.Body).Constructor.Invoke(_args)
and you're done, no searching needed. All the metadata is already there in the expression tree.
Please explain why you can't simply do return constructor(); (with caching if possible / needed - it's handy to pass things as lambda parameters, since you can then easily cache the method itself).
Activator.CreateInstance() under the hood calls GetConstructors() and iterates over them to find the matching ones. This would explain the difference in performance - and if you roll your own implementation chances are you will end up with the same or worse performance.
You could simplify the process by comparing types using parameterType.IsAssignableFrom(argType), and return the first match - you may end up using a different constructor than Activator.CreateInstance() though because it uses the best match, not the first match:
class DerivedClass : BaseClass { }
class Test
{
public Test(BaseClass c)
{
}
public Test(DerivedClass c)
{
}
}
// Uses the most specific constructor, Test(DerivedClass):
Activator.CreateInstance(typeof(Test), new DerivedClass());
Might be a bit late, but I found a nice implementation with caching, which claims to be 70 times faster thant Activator.CreateInstance;
this.Item = (T)Activator.CreateInstance(typeof(T), new Object[] { x, y }, null); // Classical approach
this.Item = Constructor<Func<int,int,T>>.Ctor(x,y); // Dynamic constructor approach
The complete implementation can be found here:
http://www.cyberforum.ru/blogs/32756/blog2078.html
Returning multiple things from a method, involves either:
returning an object with properties OR
using the out keyword to simply modify incoming parameters
Is there a benefit to using one system or the other? I have been using objects, but just discovered the out keyword, so wondering if I should bother refactoring.
You shouldn't bother refactoring just to utilize out parameters. Returning a class or struct would be preferred as long as structure is reusable.
A common use for out parameters which I would suggest using is to return a status for a call with that is possible to fail. An example being int.TryParse.
It has the possibility of failing, so returning a bool makes it easy to determing whether or not you should use the out parameter.
Another possible solution to returning multiple values from a method would be to use a Tuple. They can return n number of results. E.g.
public Tuple<bool, bool, string> MyMethod()
{
return new Tuple<bool, bool, string>(false, true, "yep");
}
In general, if the object that you are returning is not used anywhere else outside of the return value of your method or a group of similar methods, it is a good indication that you should refactor. When you need to create a special class simply to be used as a return value of a method, it means that you are working around C#'s inability to return multiple values from a method, so the out keyword may be a very good option for you.
On the other hand, if you use the multi-part return value in other places, such as storing them in collections or passing as arguments to other methods, there's probably no need to refactor, because the return object is meaningful.
Compare these two methods:
interface DictionaryReturn<T> {
T Value {get;}
bool Success {get;}
}
...
class Dictionary<K,V> {
...
public DictionaryReturn<V> TryGetValue(K key) {
...
}
}
or
class Dictionary<K,V> {
...
public bool TryGetValue(K key, out V res) {
...
}
}
The first case introduces a special DictionaryReturn<T> class that provides the value and an indicator that the value was found in the dictionary. There is rarely, if ever, a reason to store or use DictionaryReturn<T> objects outside the call to TryGetValue, so the second option is better. Not surprisingly, it is the second option that the designers of the .NET collections library have implemented.
I prefer to use Object with properties. If you use out keyword, you need to define it in other line. It is not as clear as return Object;
The reason to use out keyword is to ensure that code inside the method always sets a value to the out parameter. It's a compile time check that what you intended to do in the function, you did do.
Consider the following example code:
static void Main(string[] args)
{
bool same = CreateDelegate(1) == CreateDelegate(1);
}
private static Action CreateDelegate(int x)
{
return delegate { int z = x; };
}
You would imagine that the two delegate instances would compare to be equal, just as they would when using the good old named method approach (new Action(MyMethod)). They do not compare to be equal because the .NET Framework provides a hidden closure instance per delegate instance. Since those two delegate instances each have their Target properties set to their individual hidden instance, they do not compare. One possible solution is for the generated IL for an anonymous method to store the current instance (this pointer) in the target of the delegate. This will allow the delegates to compare correctly, and also helps from a debugger standpoint since you will see your class being the target, instead of a hidden class.
You can read more about this issue in the bug I submitted to Microsoft. The bug report also gives an example of why we are using this functionality, and why we feel it should be changed. If you feel this to be an issue as well, please help support it by providing rating and validation.
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=489518
Can you see any possible reason why the functionality should not be changed? Do you feel this was the best course of action to get the issue resolved, or do you recommend that I should take a different route?
I'm not so inclined to think this is a "bug". It appears moreover that you're assuming some behaviour in the CLR that simply does not exist.
The important thing to understand here is that you are returning a new anonymous method (and initialising a new closure class) each time you call the CreateDelegate method. It seems that you are experting the delegate keyword to use some sort of pool for anonymous methods internally. The CLR certainly does not do this. A delegate to the anonymous method (as with a lambda expression) is created in memory each time you call the method, and since the equality operator does of course compare references in this situation, it is the expected result to return false.
Although your suggested behaviour may have some benefits in certain contexts, it would probably be quite complicated to implement, and would more likely lead to unpredictable scenarios. I think the current behaviour of generating a new anonymous method and delegate on each call is the right one, and I suspect this is the feedback you will get on Microsoft Connect as well.
If you are quite insistent on having the behaviour you described in your question, there is always the option of memoizing your CreateDelegate function, which would insure that the same delegate is returned each time for the same parameters. Indeed, because this is so easy to implement, it is probably one of the several reasons why Microsoft did not consider implementing it in the CLR.
EDIT: Old answer left for historical value below the line...
The CLR would have to work out the cases in which the hidden classes could be considered equal, taking into account anything that could be done with the captured variables.
In this particular case, the captured variable (x) isn't changed either within the delegate or in the capturing context - but I'd rather the language didn't require this sort of complexity of analysis. The more complicated the language is, the harder it is to understand. It would have to distinguish between this case and the one below, where the captured variable's value is changed on each invocation - there, it makes a great deal of difference which delegate you call; they are in no way equal.
I think it's entirely sensible that this already-complex situation (closures are frequently misunderstood) doesn't try to be too "clever" and work out potential equality.
IMO, you should definitely take a different route. These are conceptually independent instances of Action. Faking it by coercing the delegate targets is a horrible hack IMO.
The problem is that you're capturing the value of x in a generated class. The two x variables are independent, so they're unequal delegates. Here's an example demonstrating the independence:
using System;
class Test
{
static void Main(string[] args)
{
Action first = CreateDelegate(1);
Action second = CreateDelegate(1);
first();
first();
first();
first();
second();
second();
}
private static Action CreateDelegate(int x)
{
return delegate
{
Console.WriteLine(x);
x++;
};
}
}
Output:
1
2
3
4
1
2
EDIT: To look at it another way, your original program was the equivalent of:
using System;
class Test
{
static void Main(string[] args)
{
bool same = CreateDelegate(1) == CreateDelegate(1);
}
private static Action CreateDelegate(int x)
{
return new NestedClass(x).ActionMethod;
}
private class Nested
{
private int x;
internal Nested(int x)
{
this.x = x;
}
internal ActionMethod()
{
int z = x;
}
}
}
As you can tell, two separate instances of Nested will be created, and they will be the targets for the two delegates. They are unequal, so the delegates are unequal too.
I don't know about the C# specific details of this problem but I worked on the VB.Net equivalent feature which has the same behavior.
The bottom line is this behavior is "By Design" for the following reasons
The first is that in this scenario a closure is unavoidable. You have used a piece of local data within an anonymous method and hence a closure is necessary to capture the state. Every call to this method must create a new closure for a number of reasons. Therefore each delegate will point to an instance method on that closure.
Under the hood a anonymous method / expression is represented by a System.MulticastDelegate derived instance in code. If you look at the Equals method of this class you will notice 2 important details
It is sealed so there is no way for a derived delegate to change the equals behavior
Part of the Equals method does a reference comparison on the objects
This makes it impossible for 2 lambda expressions which are attached to different closures to compare as equals.
I can't think of a situation where I've ever needed to do that. If I need to compare delegates I always use named delegates, otherwise something like this would be possible:
MyObject.MyEvent += delegate { return x + y; };
MyObject.MyEvent -= delegate { return x + y; };
This example isn't great for demonstrating the issue, but I would imagine that there could be a situation where allowing this could break existing code that was designed with the expectation that this is not allowed.
I'm sure there are internal implementation details that also make this a bad idea, but I don't know exactly how anonymous methods are implemented internally.
This behaviour makes sense because otherwise anonymous methods would get mixed up (if they had the same name, given the same body).
You could change your code to this:
static void Main(){
bool same = CreateDelegate(1) == CreateDelegate(1);
}
static Action<int> action = (x) => { int z = x; };
private static Action<int> CreateDelegate(int x){
return action;
}
Or, preferably, since that's a bad way to use it (plus you were comparing the result, and Action doesn't have a return value ... use Func<...> if you want to return a value):
static void Main(){
var action1 = action;
var action2 = action;
bool same = action1 == action2; // TRUE, of course
}
static Action<int> action = (x) => { int z = x; };