As a LINQ beginner I wonder why nobody mentioned in Implementing IEqualityComparer<T> for comparing arbitrary properties of any class (including anonymous) that the query actually has to be executed to get the results. In other words, just calling
IEnumerable<Person> people = ... ; // some database call here
var distinctPeople = people.Distinct(new PropertyComparer<Person>("FirstName"));
will not trigger execution of the Equals(T x, T y) and GetHashCode(T obj) methods in the PropertyComparer.
The message "Expanding the Results View will enumerate the IEnumerable" (in the debugger) gave me the hint. Now, could I proceed with something like foreach (var dp in distinctPeople) to get the results?
This has nothing at all to do with the IEqualityComparer. It is entirely down to the method you're providing it to, in this case Distinct. Distinct, along with all of the LINQ methods that return an IEnumerable, defers execution as much as possible, only performing the work needed to compute the results when those results are actually requested.
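For example (a sketch, reusing the PropertyComparer<Person> from the question), nothing touches the comparer until the sequence is enumerated:
var distinctPeople = people.Distinct(new PropertyComparer<Person>("FirstName"));
// Nothing has executed yet - the query has only been defined.

foreach (var person in distinctPeople)
{
    // Enumeration forces Distinct to run, which calls GetHashCode on each element
    // (and Equals when hash codes collide) on the PropertyComparer.
    Console.WriteLine(person.FirstName);
}
So yes: a foreach over distinctPeople (or a call like ToList()/ToArray()) is exactly what triggers the comparer.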
I ran across some code in an older project I am working on that I've never seen before, and it has me confused about its intent.
updatables.Select(r =>
{
    // some operations are done here for each element in the list
    return true;
}).ToArray();
It seems like Select is being used to iterate the updatables collection. It also seems the ToArray call isn't doing anything.
My question is, what does calling return true in the Select statement accomplish, if anything?
This looks very much like a hack to emulate ForEach:
The ToArray() call is added to ensure that updatables is iterated to completion,
return true is added to satisfy the compiler, which does not accept an Action<T> but does accept a Func<T, bool> in LINQ's Select.
I would strongly recommend against writing code like this, because it is a lot less readable than an equivalent foreach loop.
Select takes a Func<T, TResult> - which means it won't accept an Action<T>. In other words, a lambda which does not return anything will result in a compilation error when passed to Select, so the author bypassed that "limitation" by having it return a dummy value.
The intent behind this code is likely to run a foreach loop on the collection using the LINQ syntax. However, the way it's done in this code is a bad practice, as LINQ methods are expected to be pure - that is, not modify any sort of state outside of the expression.
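For comparison, a minimal equivalent loop (with DoWork standing in, hypothetically, for the original per-element operations) says the same thing without the dummy return value:
foreach (var r in updatables)
{
    DoWork(r); // hypothetical stand-in for "some operations are done here"
}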
I'm using a delegate to hold a few methods that test a value and return a true/false result. After learning that a call to a delegate will only return the result of the last method in the delegate, I'm unsure of how to proceed.
I'd like to receive either the list of results from all the method calls in the delegate, or to know whether any of the calls returned true.
First I tried enumerating over the delegate with a foreach, which didn't work. I had to pull the methods out beforehand, like so:
System.Delegate[] methods = int_testers.GetInvocationList();
// Each method in int_testers returns true when a condition is met by the input value
Then I enumerate with DynamicInvoke on each member of methods:
foreach (var item in methods) {
    if ((bool)item.DynamicInvoke(4))
        return true;
}
However, I've read that DynamicInvoke is much slower (an order of magnitude or more) than Invoke, which is a trade-off I'm not willing to make.
The alternative I've found so far is to keep a list of Func<int,bool> and enumerate over those:
List<Func<int,bool>> methods = ....; // Add the methods into the list
foreach (var method in methods) {
    if (method(4)) {
        return true;
    }
}
While this works, it seems like an issue delegates were made to solve. So, finally, is there a way to get a list of results from a delegate without simulating a delegate by hand?
This is essentially using the results of a map function in functional terms but I don't have enough C# experience to bring that idea nicely to what I'm doing.
I looked into LINQ a little for this and it seems it could work with the second method I've outlined though I can't seem to use LINQ with delegates in this case.
You could use Predicate<T> instead of a delegate, and LINQ's Any() instead of a loop:
var methods = new List<Predicate<int>>();
// add methods to list
return methods.Any(x => x(4));
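If you would rather keep a single multicast delegate, here is a sketch (assuming int_testers is declared as a Func<int, bool>, with IsEven and IsPositive as hypothetical methods of that shape) that avoids DynamicInvoke by casting the invocation list back to its typed form:
Func<int, bool> int_testers = IsEven;
int_testers += IsPositive; // combine the testers into one multicast delegate

// Typed invocation instead of DynamicInvoke, via LINQ over the invocation list
bool any = int_testers.GetInvocationList()
                      .Cast<Func<int, bool>>()
                      .Any(f => f(4));

// Or the full list of results - the "map" mentioned in the question
List<bool> results = int_testers.GetInvocationList()
                                .Cast<Func<int, bool>>()
                                .Select(f => f(4))
                                .ToList();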
Suppose we have a class Foo which has an Int32 field Bar, and you want to sort a collection of Foo objects by the Bar value. One way is to implement IComparable's CompareTo() method, but it can also be done using Language Integrated Query (LINQ) like this:
List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar);
Now sortedFoos holds the foos collection, sorted. But if you use a System.Diagnostics.Stopwatch to measure the time OrderBy() takes to sort the collection, it is always 0 milliseconds. Yet whenever you print the sortedFoos collection it's obviously sorted. How is that possible? It takes literally no time to sort the collection, yet after the method executes the collection is sorted? Can somebody explain to me how that works? And one more thing: suppose that after sorting the foos collection I add another element to it. Whenever I print out the collection, the element I added should be at the end, right? Wrong! The foos collection will be sorted as if the element I added had been part of it all along, even though I added it after sorting. I don't quite understand how any of that works, so can anybody make it clear for me?
Almost all LINQ methods use lazy evaluation - they don't immediately do anything, but they set up the query to do the right thing when you ask for the data.
OrderBy follows this model too - although it's less lazy than methods like Where and Select. When you ask for the first result from the result of OrderBy, it will read all the data from the source, sort all of it, and then return the first element. Compare this to Select for example, where asking for the first element from the projection only asks for the first element from the source.
If you're interested in how LINQ to Objects works behind the scenes, you might want to read my Edulinq blog series - a complete reimplementation of LINQ to Objects, with a blog post describing each method's behaviour and implementation.
(In my own implementation of OrderBy, I actually only sort lazily - I use a quick sort and sort "just enough" to return the next element. This can make things like largeCollection.OrderBy(...).First() a lot quicker.)
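To see where the time actually goes, here is a sketch (reusing the Foo/Bar setup from the question):
var sw = System.Diagnostics.Stopwatch.StartNew();
var sortedFoos = foos.OrderBy(f => f.Bar); // only builds the query
sw.Stop();                                 // ~0 ms: nothing has been sorted yet

sw.Restart();
var first = sortedFoos.First();            // enumeration forces the sort to run
sw.Stop();                                 // the real sorting time shows up here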
LINQ believes in deferred execution. This means the expression will only be evaluated when you start iterating over or accessing the result.
The OrderBy extension uses the default IComparer for the type it is working on, unless an alternative is passed via the appropriate overload.
The sorting work is deferred until the IOrderedEnumerable<T> returned by your statement is first accessed. If you place the Stopwatch around that first access, you'll see how long sorting takes.
This makes a lot of sense, since your statement could be formed from multiple calls that return IOrderedEnumerables. As the ordering calls are chained, they fluently extend the result, allowing the finally returned IOrderedEnumerable to perform the sorting in the most expedient way possible. This is probably achieved by chaining all the IComparer calls and sorting once. A greedy implementation would have to wastefully sort multiple times.
For instance, consider
class MadeUp
{
    public int A;
    public DateTime B;
    public string C;
    public Guid D;
}

var verySorted = madeUps.OrderBy(m => m.A)
                        .ThenBy(m => m.B)
                        .ThenByDescending(m => m.C)
                        .ThenBy(m => m.D);
If verySorted were evaluated greedily, then every property in the sequence would be evaluated and the sequence would be reordered 4 times. Because the LINQ implementation of IOrderedEnumerable defers sorting until enumeration, it is able to optimise the process.
The IComparers for A, B, C and D can be combined into a composite delegate, something like this simplified representation,
// simplified composite comparison of two elements x and y
int result = comparerA(x.A, y.A);
if (result == 0)
{
    result = comparerB(x.B, y.B);
    if (result == 0)
    {
        result = comparerC(x.C, y.C);
        if (result == 0)
        {
            result = comparerD(x.D, y.D);
        }
    }
}
return result;
the composite delegate is then used to sort the sequence once.
You have to add ToList() to get a new collection. If you don't, the OrderBy will only be executed when you start iterating over sortedFoos.
List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar).ToList();
When I use a standard extension method on a List<T>, such as Where(...), the result is always an IEnumerable<T>, and when I want to perform a list operation such as ForEach(), I need to either cast (not pretty) or use the ToList() extension method, which (maybe) creates a new List that consumes more memory (is that right?):
List<string> myList = new List<string>() { /* some data */ };
myList.Where(p => p.Length > 5).ToList().ForEach(...);
or
(myList.Where(p => p.Length > 5) as List<string>).ForEach(...); // (Edit: this cast won't work)
Which is better code or is there a third way?
Edit:
ForEach is just an example; replace it with BinarySearch:
myList.Where(p => p.Length > 5).ToList().BinarySearch(...)
The as is definitely not a good approach, and I'd be surprised if it works.
In terms of what is "best", I would propose foreach instead of ForEach:
foreach(var item in myList.Where(p=>p.Length>5)) {
... // do something with item
}
If you desperately want to use list methods, perhaps:
myList.FindAll(p=>p.Length>5).ForEach(...);
or indeed
var result = myList.FindAll(p=>p.Length>5).BinarySearch(...);
but note that this does (unlike the first) require an additional copy of the data, which could be a pain if there are 100,000 items in myList with length above 5.
The reason that LINQ returns IEnumerable<T> is that this (LINQ-to-Objects) is designed to be composable and streaming, which is not possible if you go to a list. For example, a combination of a few where / select etc should not strictly need to create lots of intermediate lists (and indeed, LINQ doesn't).
This is even more important when you consider that not all sequences are bounded; there are infinite sequences, for example:
static IEnumerable<int> GetForever() {
while(true) yield return 42;
}
var thisWorks = GetForever().Take(10).ToList();
as until the ToList it is composing iterators, not generating an intermediate list. There are a few buffered operations, though, like OrderBy, which need to read all the data first. Most LINQ operations are streaming.
One of the design goals for LINQ is to allow composable queries on any supported data type, which is achieved by having return-types specified using generic interfaces rather than concrete classes (such as IEnumerable<T> as you noted). This allows the nuts and bolts to be implemented as needed, either as a concrete class (e.g. WhereEnumerableIterator<T> or hoisted into a SQL query) or using the convenient yield keyword.
Additionally, another design philosophy of LINQ is one of deferred execution. Basically, until you actually use the query, no real work has been done. This allows potentially expensive (or infinite as Mark notes) operations to be completed only exactly as needed.
If List<T>.Where returned another List<T> it would potentially limit composition and would certainly hinder deferred execution (not to mention generate excess memory).
So, looking back at your example, the best way to use the result of the Where operator depends on what you want to do with it!
// This assumes myList has 20,000 entries
// if .Where returned a new list we'd potentially double our memory!
var largeStrings = myList.Where(ss => ss.Length > 100);
foreach (var item in largeStrings)
{
    someContainer.Add(item);
}

// or, if someContainer accepts an IEnumerable<T>:
someContainer.AddRange(myList.Where(ss => ss.Length > 100));
If you want to make a simple foreach over a list, you can do it like this:
foreach (var item in myList.Where([Where clause]))
{
// Do something with each item.
}
You can't cast (as) the IEnumerable<string> to a List<string>. An IEnumerable evaluates items only when you access them. Invoking ToList<string>() will enumerate all items in the collection and return a new List, which is a bit memory-inefficient as well as unnecessary. If you want a ForEach extension method on any collection, it's better to write one that works directly on IEnumerable<T>:
public static void ForEach<T>(this IEnumerable<T> enumerableList, Action<T> action)
{
    foreach (T item in enumerableList)
    {
        action(item);
    }
}
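With that extension in place, the original one-liner works without the extra ToList() copy, for example:
myList.Where(p => p.Length > 5).ForEach(Console.WriteLine);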
I have a C# method which accepts a Predicate<Foo> and returns a list of matching items...
public static List<Foo> FindAll( Predicate<Foo> filter )
{
...
}
The filter will often be one of a common set...
public static class FooPredicates
{
public static readonly Predicate<Foo> IsEligible = ( foo => ...)
...
}
...but may be an anonymous delegate.
I'd now like to have this method cache its results in the ASP.NET cache, so repeated calls with the same delegate just return the cached result. For this, I need to create a cache key from the delegate. Will Delegate.GetHashCode() produce sensible results for this purpose? Is there some other member of Delegate that I should look at? Would you do this another way entirely?
To perform your caching task, you can follow the other suggestions and create a Dictionary<Predicate<Foo>,List<Foo>> (static for global use, or a member field otherwise) that caches the results. Before actually executing the Predicate<Foo>, you would check whether the result already exists in the dictionary.
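A minimal sketch of that dictionary approach (FindAllCached is a made-up name, FindAll is the expensive call from the question, and thread safety is ignored):
private static readonly Dictionary<Predicate<Foo>, List<Foo>> _cache =
    new Dictionary<Predicate<Foo>, List<Foo>>();

public static List<Foo> FindAllCached(Predicate<Foo> filter)
{
    List<Foo> result;
    if (!_cache.TryGetValue(filter, out result))
    {
        result = FindAll(filter); // the uncached call being memoized
        _cache[filter] = result;
    }
    return result;
}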
The general name for this kind of deterministic function caching is memoization - and it's awesome :)
Ever since C# 3.0 added lambdas and the family of Func/Action delegates, adding memoization to C# has been quite easy.
Wes Dyer has a great post that brings the concept to C# with some great examples.
If you want me to show you how to do this, let me know...otherwise, Wes' post should be adequate.
In answer to your query about delegate hash codes: if two delegates are the same, d1.GetHashCode() should equal d2.GetHashCode(), but I'm not 100% sure about this. You can check it quickly by giving memoization a go and adding a WriteLine into your FindAll method. If this turns out not to be true, another option is to use Linq.Expression<Predicate<Foo>> as a parameter. If the expressions are not closures, then expressions that do the same thing should be equal.
Let me know how this goes, I'm interested to know the answer about delegate.Equals.
Delegate equality looks at each invocation in the invocation list, testing for equality of method to be invoked, and target of method.
The method is a simple piece of the cache key, but the target of the method (the instance to call it on - assuming an instance method) could be impossible to cache in a serializable way. In particular, for anonymous functions which capture state, it will be an instance of a nested class created to capture that state.
If this is all in memory, just keeping the delegate itself as the hash key will be okay - although it may mean that some objects which clients would expect to be garbage collected hang around. If you need to serialize this to a database, it gets hairier.
Could you make your method accept a cache key (e.g. a string) as well? (That's assuming an in-memory cache is inadequate.)
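A sketch of that overload, assuming the ASP.NET cache from the question (System.Web's HttpRuntime.Cache) and the existing FindAll(filter) doing the real work:
public static List<Foo> FindAll(string cacheKey, Predicate<Foo> filter)
{
    var cached = HttpRuntime.Cache[cacheKey] as List<Foo>;
    if (cached != null)
        return cached;

    List<Foo> result = FindAll(filter);          // the uncached lookup
    HttpRuntime.Cache.Insert(cacheKey, result);  // expiry policy can be supplied via other overloads
    return result;
}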
Keeping the cached results in a Dictionary<Predicate<Foo>,List<Foo>> is awkward for me because I want the ASP.NET cache to handle expiry for me rather than caching all results forever, but it's otherwise a good solution. I think I'll end up going with Will's Dictionary<Predicate<Foo>,string> to cache a string that I can use in the ASP.NET cache key.
Some initial tests suggest that delegate equality does the "right thing" as others have said, but Delegate.GetHashCode is pathologically unhelpful. Reflector reveals
public override int GetHashCode()
{
    return base.GetType().GetHashCode();
}
So every Predicate<Foo> returns the same hash code.
My remaining issue was how equality works for anonymous delegates. What does "same method called on the same target" mean then? It seems that as long as the delegate was defined in the same place, references are equal. Delegates with the same body defined in different places are not.
static Predicate<int> Test()
{
    Predicate<int> test = delegate(int i) { return false; };
    return test;
}

static void Main()
{
    Predicate<int> test1 = Test();
    Predicate<int> test2 = Test();
    Console.WriteLine(test1.Equals(test2)); // True

    test1 = delegate(int i) { return false; };
    test2 = delegate(int i) { return false; };
    Console.WriteLine(test1.Equals(test2)); // False
}
This should be OK for my needs. Calls with the predefined predicates will be cached. Multiple calls to one method that calls FindAll with an anonymous method should get cached results. Two methods calling FindAll with apparently the same anonymous method won't share cached results, but this should be fairly rare.
Unless you're sure Delegate's implementation of GetHashCode is deterministic and doesn't result in any collisions, I wouldn't trust it.
Here are two ideas. First, store the results of the delegates in a Predicate/List dictionary, using the predicate as the key, and then store the entire dictionary of results under a single key in the cache. The bad thing is that you lose all your cached results if that cache item is evicted.
An alternative would be to create an extension method for Predicate, GetKey(), that uses an object/string dictionary to store and retrieve a key for each Predicate. You index into the dictionary with the delegate and return its key, creating one if you don't find it. This way you're assured that you are getting the correct key per delegate and there aren't any collisions. A naive key would be type name + Guid.
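A rough sketch of that GetKey() idea (names are made up and it isn't thread-safe as written):
public static class PredicateKeys
{
    private static readonly Dictionary<object, string> Keys =
        new Dictionary<object, string>();

    public static string GetKey<T>(this Predicate<T> predicate)
    {
        string key;
        if (!Keys.TryGetValue(predicate, out key))
        {
            // delegate equality (same method + same target) decides lookups;
            // new delegates get a fresh "type name + Guid" key
            key = typeof(T).Name + "_" + Guid.NewGuid();
            Keys[predicate] = key;
        }
        return key;
    }
}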
The same instance of an object will always return the same hash code (a requirement of GetHashCode() in .NET). If your predicates are inside a static list and you are not redefining them each time, I can't see a problem in using them as keys.