I have the presumably common problem of having elements that I wish to place in 2 (or more) lists. However sometimes I want to find an element that could be in one of the lists. Now there is more than one way of doing this eg using linq or appending, but all seem to involve the unnecessary creation of an extra list containing all the elements of the separate lists and hence waste processing time.
So I was considering creating my own generic FindinLists class which would take 2 lists as its constructor parameters would provide a Find() and an Exists() methods. The Find and Exists methods would only need to search the second or subsequent lists if the item was not found in the first list. The FindInLists class could be instantiated in the getter of a ( no setter)property. A second constructor for the FindInLists class could take an array of lists as its parameter.
Is this useful or is there already a way to search multiple lists without incurring the wasteful overhead of the creation of a super list?
You could use the LINQ Concat function.
var query = list1.Concat(list2).Where(x => x.Category=="my category");
Linq already has this functionality by virtue of the FirstOrDefault method. It uses deferred execution so will stream from any input and will short circuit the return when a matching element is found.
var matched = list1.Concat(list2).FirstOrDefault(e => element.Equals(e));
Update
BaseType matched = list1.Concat(list2).Concat(list3).FirstOrDefault(e => element.Equals(e));
I believe IEnumerable<T>.Concat() is what you need. It doesn't create an extra list, it only iterates through the given pair of collections when queried
Concat() uses deferred execution, so at the time it's called it only creates an iterator which stores the reference to both concatenated IEnumerables. At the time the resulting collection is enumerated, it iterates through first and then through the second.
Here's the decompiled code for the iterator - no rocket science going on there:
private static IEnumerable<TSource> ConcatIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second)
{
foreach (TSource iteratorVariable0 in first)
{
yield return iteratorVariable0;
}
foreach (TSource iteratorVariable1 in second)
{
yield return iteratorVariable1;
}
}
When looking to the docs for Concat(), I've stumbled across another alternative I didn't know - SelectMany. Given a collection of collections it allows you to work with the children of all parent collections at once like this:
IEnumerable<string> concatenated = new[] { firstColl, secondColl }
.SelectMany(item => item);
you can do something like this:
var list1 = new List<int>{1,2,3,4,5,6,7};
var list2 = new List<int>{0,-3,-4,2};
int elementToPush = 4;//value to find among available lists
var exist = list1.Exists(i=>i==elementToPush) || list2.Exists(j=>j==elementToPush);
If at least one collection required element exists, result is false, otherwise it's true.
One row and no external storage creation.
Hope this helps.
You could probably just create a List of lists and then use linq on that list. It is still creating a new List but it is a list of references rather than duplicating the contents of all the lists.
List<string> a = new List<string>{"apple", "aardvark"};
List<string> b = new List<string>{"banana", "bananananana", "bat"};
List<string> c = new List<string>{"cat", "canary"};
List<string> d = new List<string>{"dog", "decision"};
List<List<string>> super = new List<List<string>> {a,b,c,d};
super.Any(x=>x.Contains("apple"));
the Any call should return after the first list returns true so as requested will not process later lists if it finds it in an earlier list.
Edit: Having written this I prefer the answers using Concat but I leave this here as an alternative if you want something that might be more aesthetically pleasing. ;-)
Related
I was trying to clone an List in my code, because I needed to output that List to some other code, but the original reference was going to be cleared later on. So I had the idea of using the Select extension method to create a new reference to an IEnumerable of the same elements, for example:
List<int> ogList = new List<int> {1, 2, 3};
IEnumerable<int> enumerable = ogList.Select(s => s);
Now after doing ogList.Clear(), I was surprised to see that my new enumerable was also empty.
So I started fiddling around in LINQPad, and saw that even if my Select returned different objects entirely, the behaviour was the same.
List<int> ogList = new List<int> {1, 2, 3};
IEnumerable<int> enumerable = ogList.Select(s => 5); // Doesn't return the original int
enumerable.Count().Dump(); // Count is 3
ogList.Clear();
enumerable.Count().Dump(); // Count is 0!
Note that in LINQPad, the Dump()s are equivalent to Console.WriteLine().
Now probably my need to clone the list in the first place was due to bad design, and even if I didn't want to rethink the design I could easily clone it properly. But this got me thinking about what the Select extension method actually does.
According to the documentation for Select:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
So then I tried adding this code before clearing:
foreach (int i in enumerable)
{
i.Dump();
}
The result was still the same.
Finally, I tried one last thing to figure out if the reference in my new enumerable was the same as the old one. Instead of clearing the original List, I did:
ogList.Add(4);
Then I printed out the contents of my enumerable (the "cloned" one), expecting to see '4' appended to the end of it. Instead, I got:
5
5
5
5 // Huh?
Now I have no choice but to admit that I have no idea how the Select extension method works behind the scenes. What's going on?
List/List<T> are for all intents and purposes fancy resizable arrays. They own and hold the data for value types such as your ints or references to the data for reference types in memory and they always know how many items they have.
IEnumerable/IEnumerable<T> are different beasts. They provide a different service/contract. An IEnumerable is fictional, it does not exist. It can create data out of thin air, with no physical backing. Their only promise is that they have a public method called GetEnumerator() that returns an IEnumerator/IEnumerator<T>. The promise that an IEnumerator makes is simple:
some item could be available or not at a time when you decide you need it. This is achieved through a simple method that the IEnumerator interface has: bool MoveNext() - which returns false when the enumeration is completed or true if there was in fact a new item that needed to be returned. You can read the data through a property that the IEnumerator interface has, conveniently called Current.
To get back to your observations/question: as far as the IEnumerable in your example is concerned, it does not even think about the data unless your code tells it to fetch some data.
When you are writing:
List<int> ogList = new List<int> {1, 2, 3};
IEnumerable<int> enumerable = ogList.Select(s => s);
You are saying: Listen here IEnumerable, I might come to you asking for some items at some point in the future. I'll tell you when I will need them, for now sit still and do nothing. With Select(s => s) you are conceptually defining an identity projection of int to int.
A very rough simplified, non-real-life implementation of the select you've written is:
IEnumerable<T> Select(this IEnumerable<int> source, Func<int,T> transformer) something like
{
foreach (var i in source) //create an enumerator for source and starts enumeration
{
yield return transformer(i); //yield here == return an item and wait for orders
}
}
(this explains why you got a 5 when expecting a for, your transform was s => 5)
For value types, such as the ints in your case: If you want to clone the list, clone the whole list or part of it for future enumeration by using the result of an enumeration materialized through a List. This way you create a list that is a clone of the original list, entirely detached from its original list:
IEnumerable<int> cloneOfEnumerable = ogList.Select(s => s).ToList();
Later edit: Of course ogList.Select(s => s) is equivalent to ogList. I'm leaving the projection here, as it was in the question.
What you are creating here is: a list from the result of an enumerable, further consumed through the IEnumerable<int> interface. Considering what I've said above about the nature of IList vs IEnumerable, I would prefer to write/read:
IList<int> cloneOfEnumerable = ogList.ToList();
CAUTION: Be careful with reference types. IList/List make no promise of keeping the objects "safe", they can mutate to null for all IList cares. Keyword if you ever need it: deep cloning.
CAUTION: Beware of infinite or non-rewindable IEnumerables
Provided answers explain why you are not obtaining a cloned list (due to deferred execution of some LINQ extension methods).
However, keep in mind that list.Select(e => e).ToList() will get a real clone only when dealing with value types such as int.
If you have a list of reference types you will receive a cloned list of references to existent objects. In this case you should consider one of the solutions provided here for deep-cloning or my favorite from here (which might be limited by object inner structure).
You have to be aware that an object that implements IEnumerable does not have to be a collection itself. It is an object that makes it possible to get an object that implements IEnumerator. Once you have the enumerator you can ask for the first element and for the next element until there are no more next elements.
Every LINQ function that returns an IEnumerable is not the sequence itself, it only enables you to ask for the enumerator. If you want a sequence, you'll have to use ToList.
There are several other LINQ functions that do not return an IEnumerable, but for instance a Dictionary, or only one element (FirstOrDefault(), Max(), Single(), Any(). These functions will get the enumerator from the IEnumerable and start enumerating until they have the result. Any will only have to check if you can start enumerating. Max will enumerate over all elements and remember the largest one. etc.
You'll have to be aware: as long as your LINQ statement is an IEnumerable of something, your source sequence is not accessed yet. If you change your source sequence before you start enumerating, the enumeration is over your changed source sequence.
If you don't want this, you'll have to do the enumeration before you change your source. Usually this will be ToList, but this can be any of the non-deferred function: Max(), Any(), FirstOrDefault(), etc.
List<TSource> sourceItems = ...
var myEnumerable = sourceItems
.Where(sourceItem => ...)
.GroupBy(sourceItem => ...)
.Select(group => ...);
// note: myEnumerable is an IEnumerable, it is not a sequence yet.
var list1 = sourceItems.ToList(); // Enumerate over the sequence
var first = sourceItems.FirstOrDefault(); // Enumerate and stop after the first
// now change the source, and to the same things again
sourceItems.Clear();
var list1 = sourceItems.ToList(); // returns empty list
var first = sourceItems.FirstOrDefault(); // return null: there is no first element
So every LINQ function that does not return IEnumerable, will start enumerating over sourceItems as the sequence is at the moment that you start enumerating. The IEnumerable is not the sequence itself.
This is an enumerable.
var enumerable = ogList.Select(s => s);
If you iterate through this enumerable, LINQ will in turn iterate over the original resultset. Each and every time. If you do anything to the original enumerable, the results will also be reflected in your LINQ calls.
If you need to freeze the data, store it in a list instead:
var enumerable = ogList.Select(s => s).ToList();
Now you've made a copy. Iterating over this list will not touch the original enumerable.
I need to filter a large list of complex (20+ properties) objects into multiple sub lists. To create the sub-lists, I have a list of filter specifications. Requirements are: a) An item is not allowed to be part of two sub lists and b) it must be possible to get hold of all undivided items after processing has finished.
Currently I use the following algorithm:
List item
Put the objects to be filtered in a Generic List
For each filter specification:
Create a Where expression (Expression>)
Apply the expression using Linq > Where to the list of objects
Get the resulting IEnumerable of selected objects and store them in a list, together with the description of the filter
Remove the items found from the source list using Linq > Except to create a new list to continue working with and to prevent an object from being put in more than one sub list
Check whether there a still (undivided) objects in the working list
My initial list of objects can be over 400.000 objects and I've noticed that both the filtering, as well as reducing the working list takes some time. So I would like to know:
Filtering to create the sub-lists takes place on a maximum of 7 properties of my object. Is there a way to improve performance of a Linq > Where selection?
Is there a way to prevent items from being selected into multiple sub-lists, without reducing the working collection by using Except or RemoveAll (possible improvement)?
Thanks in advance!
If you can not leverage any indexes in the incoming list you are trying to classify then you are better off just iterating through the whole list only once and classifying the items as you go. This way you avoid unnecessary remove and except operations that are seriously hurting the performance with pointless iterations and equality comparisons.
I was thinking about something a long the lines of:
public static IDictionary<string, List<T>> Classify<T>(this IEnumerable<T> items, IDictionary<string, Predicate<T>> predicates, out List<T> defaultBucket)
{
var classifiedItems = new Dictionary<string, List<T>>(predicates.Count);
defaultBucket = new List<T>();
foreach (var predicate in predicates)
{
classifiedItems.Add(predicate.Key, new List<T>());
}
foreach (var item in items)
{
var matched = false;
foreach (var predicate in predicates)
{
if (predicate.Value(item))
{
matched = true;
classifiedItems[predicate.Key].Add(item);
break;
}
}
if (!matched)
{
defaultBucket.Add(item);
}
}
return classifiedItems;
}
Any given predicate can be as complex as you need it to be. Only condition is that it takes in a T and returns a bool. If that is not enough, nothing is preventing you from implementing your own MyPredicate<???> with whatever signature you need.
EDIT: Edited the code to handle a "default bucket" where items that don't comply with any of the specified predicates go.
I have a generic list that is being used inside a method that's being called 4 times. This method writes a table in a PDF with the values of this generic list.
My problem is that I need to reverse this generic list inside the method, but I'm calling the method 4 times so the list is being reversed every time I call the method and I don't want that... what can I do? Is there a way to reverse the list without mutating the original?
This is inside the method:
t.List.Reverse();
foreach (string t1 in t.List)
{
//Some code
}
The "easy" option would be to just iterate the list in reverse order without actually changing the list itself instead of trying to reverse it the first time and know to do nothing the other times:
foreach (string t1 in t.List.AsEnumerable().Reverse())
{
//Some code
}
By using the LINQ Reverse method instead of the List Reverse, we can iterate it backwards without mutating the list. The AsEnumerable needs to be there to prevent the List Reverse method from being used.
You're asking how to create a reversed copy of the list, without mutating the original.
LINQ can do that:
foreach (var t1 in Enumerable.Reverse(t.List))
You can do one of two things. Either take out reversing the list from the method and reverse the list once before calling the method four times, or do:
List<My_Type> new_list = new List<Int32>(t.List);
new_list.Reverse();
That takes a copy of the list before reversing it so you don't touch the original list.
I would recomend the first approach because right now you are calling Reverse four times instead of just once.
All the given options until now internally copy all elements to another list in reverse order, explicit as new_list.Reverse() or implicit t.List.AsEnumerable().Reverse(). I prefer something that has not this cost as the extension method below.
public class ListExtetions {
public static IEnumerable<T> GetReverseEnumerator(this List<T> list) {
for (int i = list.Count - 1; i >= 0; i--)
return yeild list[i];
}
}
And could be used as
foreach (var a in list.GetReverseEnumerator()) {
}
Suppose we have a class Foo which has an Int32 Field Bar and you want to sort the collection of the Foo objects by the Bar Value. One way is to implement IComparable's CompareTo() method, but it can also be done using Language Integrated Query (LINQ) like this
List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar);
now in sortedFoos we have foos collection which is sorted. But if you use System.Diagnostics.StopWatch object to measure the time it took OrderBy() to sort the collection is always 0 milliseconds. But whenever you print sortedFoos collection it's obviously sorted. How is that possible. It takes literary no time to sort collection but after method execution the collection is sorted ? Can somebody explain to me how that works ? And also one thing. Suppose after sorting foos collection I added another element to it. now whenever I print out the collection the the element I added should be in the end right ? WRONG ! The foos collection will be sorted like, the element I added was the part of foos collection even if I add that element to foos after sorting. I don't quit understand how any of that work so can anybody make it clear for me ?!
Almost all LINQ methods use lazy evaluation - they don't immediately do anything, but they set up the query to do the right thing when you ask for the data.
OrderBy follows this model too - although it's less lazy than methods like Where and Select. When you ask for the first result from the result of OrderBy, it will read all the data from the source, sort all of it, and then return the first element. Compare this to Select for example, where asking for the first element from the projection only asks for the first element from the source.
If you're interested in how LINQ to Objects works behind the scenes, you might want to read my Edulinq blog series - a complete reimplementation of LINQ to Objects, with a blog post describing each method's behaviour and implementation.
(In my own implementation of OrderBy, I actually only sort lazily - I use a quick sort and sort "just enough" to return the next element. This can make things like largeCollection.OrderBy(...).First() a lot quicker.)
LINQ believe in deferred execution. This means the expression will only be evaluated when you started iterating or accessing the result.
The OrderBy extension uses the deault IComparer for the type it is working on, unless an alternative is passed via the appropriate overload.
The sorting work is defered until the IOrderedEnumerable<T> returned by your statement is first accessed. If you place the Stopwatch around that first access, you'll see how long sorting takes.
This makes a lot of sense, since your statement could be formed from multiple calls that return IOrderedEnumerables. As the ordering calls are chained, they fluently extend the result, allowing the finally returned IOrderedEnumerable to perform the sorting in the most expedient way possible. This is probably achieved by chaining all the IComparer calls and sorting once. A greedy implementation would have to wastefully sort multiple times.
For instance, consider
class MadeUp
{
public int A;
public DateTime B;
public string C;
public Guid D;
}
var verySorted = madeUps.OrderBy(m => m.A)
.ThenBy(m => m.B)
.ThenByDescending(m => m.C)
.ThenBy(m => m.D);
If verySorted was evaluated greedily, then every property in the sequence would be evaluated and the sequence would be reordered 4 times. Because the linq implementation of IOrderedEnumerable deferes sorting until enumeration, it is able to optimise the process.
The IComparers for A, B, C and D can be combined into a composite delegate, something like this simplified representation,
int result
result = comparerA(A, B);
if (result == 0)
{
result = comparerB(A, B);
if (result == 0)
{
result = comparerC(A, B);
if (result == 0)
{
result = comparerD(A, B);
}
}
}
return result;
the composite delegate is then used to sort the sequence once.
You have to add the ToList() to get a new collection. If you do't the OrderBy will be called when you start iterating on sortedFoos.
List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar).ToList();
When i use a standard Extension Method on a List such as
Where(...)
the result is always IEnumerable, and when
you decide to do a list operation such as Foreach()
we need to Cast(not pretty) or use a ToList() extension method that
(maybe) uses a new List that consumes more memory (is that right?):
List<string> myList=new List<string>(){//some data};
(Edit: this cast on't Work)
myList.Where(p=>p.Length>5).Tolist().Foreach(...);
or
(myList.Where(p=>p.Length>5) as List<string>).Foreach(...);
Which is better code or is there a third way?
Edit:
Foreach is a sample, Replace that with BinarySerach
myList.Where(p=>p.Length>5).Tolist().Binarysearch(...)
The as is definitely not a good approach, and I'd be surprised if it works.
In terms of what is "best", I would propose foreach instead of ForEach:
foreach(var item in myList.Where(p=>p.Length>5)) {
... // do something with item
}
If you desperately want to use list methods, perhaps:
myList.FindAll(p=>p.Length>5).ForEach(...);
or indeed
var result = myList.FindAll(p=>p.Length>5).BinarySearch(...);
but note that this does (unlike the first) require an additional copy of the data, which could be a pain if there are 100,000 items in myList with length above 5.
The reason that LINQ returns IEnumerable<T> is that this (LINQ-to-Objects) is designed to be composable and streaming, which is not possible if you go to a list. For example, a combination of a few where / select etc should not strictly need to create lots of intermediate lists (and indeed, LINQ doesn't).
This is even more important when you consider that not all sequences are bounded; there are infinite sequences, for example:
static IEnumerable<int> GetForever() {
while(true) yield return 42;
}
var thisWorks = GetForever().Take(10).ToList();
as until the ToList it is composing iterators, not generating an intermediate list. There are a few buffered operations, though, like OrderBy, which need to read all the data first. Most LINQ operations are streaming.
One of the design goals for LINQ is to allow composable queries on any supported data type, which is achieved by having return-types specified using generic interfaces rather than concrete classes (such as IEnumerable<T> as you noted). This allows the nuts and bolts to be implemented as needed, either as a concrete class (e.g. WhereEnumerableIterator<T> or hoisted into a SQL query) or using the convenient yield keyword.
Additionally, another design philosophy of LINQ is one of deferred execution. Basically, until you actually use the query, no real work has been done. This allows potentially expensive (or infinite as Mark notes) operations to be completed only exactly as needed.
If List<T>.Where returned another List<T> it would potentially limit composition and would certainly hinder deferred execution (not to mention generate excess memory).
So, looking back at your example, the best way to use the result of the Where operator depends on what you want to do with it!
// This assumes myList has 20,000 entries
// if .Where returned a new list we'd potentially double our memory!
var largeStrings = myList.Where(ss => ss.Length > 100);
foreach (var item in largeStrings)
{
someContainer.Add(item);
}
// or if we supported an IEnumerable<T>
someContainer.AddRange(myList.Where(ss => ss.Length > 100));
If you want to make a simple foreach over a list, you can do like this:
foreach (var item in myList.Where([Where clause]))
{
// Do something with each item.
}
You can't cast (as) IEnumerable<string> to List<string>. IEnumerable evaluates items when you access those. Invoking ToList<string>() will enumerate all items in the collection and returns a new List, which is a bit of memory inefficiency and as well as unnecessary. If you are willing to use ForEach extension method to any collection its better to write a new ForEach extension method that will work on any collection.
public static void ForEach<T>(this IEnumerable<T> enumerableList, Action<T> action)
{
foreach(T item in enumerableList)
{
action(item);
}
}