This question already has answers here:
C# LINQ First() faster than ToArray()[0]?
(10 answers)
Closed 8 years ago.
I am wondering what happens under the hood of list.first() and list[0] and which performs better.
For example which is faster?
for(int i = 0; i < 999999999999... i++)
{
str.Split(';').First() vs. str.Split(';')[0]
list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]
}
Sorry In case of a duplicate question
Which performs better? The array accessor, since that doesn't need a method to be put on the stack and doesn't have to execute the First method to eventually get to the array accessor.
As the Reference Source of Enumerable shows, First() is actually:
IList<TSource> list = source as IList<TSource>;
if (list != null) {
if (list.Count > 0) return list[0];
}
So it doesn't do anything else, it just takes more steps to get there.
For your second part (list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]):
Where returns an IEnumerable, which isn't IList so it doesn't go for the first part of the First method but the second part:
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return e.Current;
}
This will traverse each item one by one until it gets the desired index. In this case 0, so it will get there very soon. The other one calling ToList will always be less efficient since it has to create a new object and put all items in there in order to get the first one. The first call is definitely faster.
Simple compaction of performance http://pastebin.com/bScgyDaM
str.Split(';').First(); : 529103
str.Split(';')[0]; : 246753
list.Where(x => x == "a").First(); : 98590
list.Where(x => x == "a").ToList()[0]; : 230858
First vs [0]
if you have simple array faster is [0] because it only calculating adders in memory.
but if you combine with others LINQ command faster is First(). for example Where().First() searching until he finds first element. Where().ToList()[0] finds all elements then convert to list and do a simple calculation.
another thing is that Where() is an deferred method. A query that contains only deferred methods is not executed until the items in the result are enumerated.
so you can
list.Where( x => x>12);
list.add(10);
list.add(13);
foreach (int item in list)
{
Console.WriteLine(item);
}
13 will attach to result but 10 no because 10 and 13 were first added to the list later list was searched.
If you want to know more about Linq you can read that book Pro LINQ by Joseph Rattz and Adam Freeman http://www.apress.com/9781430226536
There is no significant difference between these, you will get pretty much the same results:
str.Split(';').First() vs. str.Split(';')[0]
For your second comparison, here you are asking only the first element
list.Where(x => x > 1).First()
So as soon as WhereIterator returns an item it's done. But in second you are putting all results into list then getting the first item using indexer , therefore it will be slower.
list.Where(x => x > 1).First() vs. list.Where(x => x > 1).ToList()[0]
The First() should be faster when applied to Enumerable because of
deferred execution. In your case, the result will be returned as soon
as one item of your list has been found that match the criteria
Where(x => x > 1).
In the second example, your initial list has to be fully enumerated,
ALL items matching the criteria will be put in a temporary list, of which you get the first item with the array accessor.
str.Split(';').First() vs. str.Split(';')[0]
In that case the method Split() already returns an array. The array accessor might be marginally faster, but the performance gain will be negligible in most cases.
Related
This question already has answers here:
Using LINQ to remove elements from a List<T>
(14 answers)
c# Remove items in a custom list, based on another List<int>
(4 answers)
Filter List By items in Array (LINQ)
(1 answer)
Closed 1 year ago.
today I use this to get a list of persons that is not in list A but in list B. It works but seem to take a very long time to get the result. Is there a faster way that do the same?
var missingPeople = listofPersons.Where(p => allUsedPersons.All(p2 => p2.Id != p.Id)).ToList();
Your current implementation has O( n * m ) time complexity.
Where n is the cardinality of listofPersons.
Where m is the cardinality of allUsedPersons.
So if you have 500 listofPersons and 200 allUsedPersons your code will take 100,000 checks. That is bad.
This is because Linq's Where will run for every item in listofPersons, and inside your Where you have allUsedPersons.All, which will run the p2.Id != p.Id check for every item in allUsedPersons.
Instead, use a HashSet<T> to build a set of known values in O(n) time - which then lets you perform exists checks in O(1) time.
So if you have 500 listofPersons and 200 allUsedPersons my code below will take only 500 checks.
100,000 vs 500: spot the difference.
HashSet<Int32> allPeopleIds = listofPersons.Select( p => p.Id ).ToHashSet();
List<Person> missingPeople = allUsedPersons
.Where( p => !allPeopleIds.Contains( p.Id ) )
.ToList();
In relational-algebra (or is it relational-calculus?) what you're doing is known as an anti-join and Linq supports it via the Except method, however you would need to define a custom-comparator as Linq doesn't yet have an ExceptBy method (but MoreLinq does, though).
Another option is to provide a custom, reusable IEqualityComparer<Person>:
public class PersonIdComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
return x?.Id == y?.Id;
}
public int GetHashCode(Person obj)
{
return obj?.Id ?? int.MinValue;
}
}
You can use it for many LINQ methods. In this case you should use it for Except:
var missingPeople = listofPersons.Except(allUsedPersons, new PersonIdComparer()).ToList();
Except is quite efficient since it uses a collection similar to HashSet.
I think below method can help you for improve performance:
List<int> listOfIds = allUsedPersons.select(c => c.Id).ToList();
var missingPeople = listofPersons.Where(r=>!listOfIds.Contains(r.Id));
Putting something like allUsedPersons.All() within predicate will unnecessarily multiply the number of iterations. Prepare the list of required field(here p.id) beforehand and use it inside predicate.
Assuming allUsedPersons is list A, below would be faster
var usedPersonsId = allUsedPersons.Select(p => p.id).ToList();
var missingPeople = listofPersons.Where(p => !usedPersonsId.Contains(p.Id)).ToList();
I am new to LINQ and discovered yesterday that you can have multiple where clauses such as:
var items = from object in objectList
where object.value1 < 100
where object.value2 > 10
select object;
Or you can write:
var items = from object in objectList
where object.value1 < 100
&& object.value2 > 10
select object;
What is the difference between the two?
The first one will be translated into:
objectList.Where(o => o.value1 < 100).Where(o=> o.value2 > 10)
while the second one will be translated in:
objectList.Where(o => o.value1 < 100 && o.value2 > 10)
So, in the first one, you will have a first filtered sequence that is filtered again (first sequence contains all the objects with value < 100, the second one containing all the objects with value > 10 from the first sequence), in while the second one you will do the same comparisons in the same labda expression. This is valid fro Linq to objects, for other providers it depends how the expression is translated.
The marked answer gets it a little bit inaccurate.
As #Philippe said, the first one will be translated into:
objectList.Where(o => o.value1 < 100).Where(o=> o.value2 > 10)
while the second one will be translated in:
objectList.Where(o => o.value1 < 100 && o.value2 > 10)
But Linq has a little optimization for chained Where calls.
If you inspect Linq's source code you will see the following:
class WhereEnumerableIterator<TSource> : Iterator<TSource>
{
public override IEnumerable<TSource> Where(Func<TSource, bool> predicate)
{
return new WhereEnumerableIterator<TSource>(source,
CombinePredicates(this.predicate, predicate));
}
}
What CombinePredicates does is is combining the two predicates with && between them:
static Func<TSource, bool> CombinePredicates<TSource>(Func<TSource, bool> predicate1,
Func<TSource, bool> predicate2)
{
return x => predicate1(x) && predicate2(x);
}
So objectList.Where(X).Where(Y) is equivalent to objectList.Where(X && Y) except for the creation time of the query (Which is extremely short anyway) and the invocation of two predicates.
Bottom line is that it does not filter or iterate the collection two times - but one composite time.
The first one translates to:
objectList.Where(o => o.value1 < 100)
.Where(o => o.value2 > 10);
while the latter gets you:
objectList.Where(o => o.value1 < 100 && o.value2 > 10);
It's functionally the same, and while the second one would spare a method call, the difference in performance is negligible. Use what's more readable for you.
That is, if you're using LINQ to Objects. If you're using a provider, it depends on how it's implemented (if the predicate is not factored in the resulting query, the result can be sub-optimal).
I've just profile it.
No difference in SQL code
At the most basic level, you get two Where operations instead of one. Using Reflector is the best way to examine what comes out the other end of a query expression.
Whether they get optimised down to the same thing depends on the actual LINQ provider - it needs to take the entire tree and convert it to another syntax. For LINQ To Objects, it doesn't.
C# in Depth is good to give you an understanding of this topic.
How about this for an answer: with && you can't guarantee that both expressions with be evaluated (if the first condition is false then the second might not be evaluated). With the two where clauses then you can. No idea whether it's true but it sounds good to me!
All other things being equal, I would choose the condition1 && condition2 version, for the sake of code readability.
Given the following code:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue);
outValue = 3;
//enumerating over someEnumerable here shows ints from 0 to 99
I am able to see a "snapshot" of the out parameter for each iteration. Why does this work correctly instead of me seeing 100 3's (deferred execution) or 100 99's (access to modified closure)?
First you define a query, strings that knows how to generate a sequence of strings, when queried. Each time a value is asked for it will generate a new number and convert it to a string.
Then you declare a variable, outValue, and assign 0 to it.
Then you define a new query, someEnumerable, that knows how to, when asked for a value, get the next value from the query strings, try to parse the value and, if the value can be parsed, yields the value of outValue. Once again, we have defined a query that can do this, we have not actually done any of this.
You then set outValue to 3.
Then you ask someEnumerable for it's first value, you are asking the implementation of Select for its value. To compute that value it will ask the Where for its first value. The Where will ask strings. (We'll skip a few steps now.) The Where will get a 0. It will call the predicate on 0, specifically calling int.TryParse. A side effect of this is that outValue will be set to 0. TryParse returns true, so the item is yielded. Select then maps that value (the string 0) into a new value using its selector. The selector ignores the value and yields the value of outValue at that point in time, which is 0. Our foreach loop now does whatever with 0.
Now we ask someEnumerable for its second value, on the next iteration of the loop. It asks Select for a value, Select asks Where,Where asks strings, strings yields "1", Where calls the predicate, setting outValue to 1 as a side effect, Select yields the current value of outValue, which is 1. The foreach loop now does whatever with 1.
So the key point here is that due to the way in which Where and Select defer execution, performing their work only immediately when the values are needed, the side effect of the Where predicate ends up being called immediately before each projection in the Select. If you didn't defer execution, and instead performed all of the TryParse calls before any of the projections in Select, then you would see 100 for each value. We can actually simulate this easily enough. We can materialize the results of the Where into a collection, and then see the results of the Select be 100 repeated over and over:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.ToList()//eagerly evaluate the query up to this point
.Select(s => outValue);
Having said all of that, the query that you have is not particularly good design. Whenever possible you should avoid queries that have side effects (such as your Where). The fact that the query both causes side effects, and observes the side effects that it creates, makes following all of this rather hard. The preferable design would be to rely on purely functional methods that aren't causing side effects. In this context the simplest way to do that is to create a method that tries to parse a string and returns an int?:
public static int? TryParse(string rawValue)
{
int output;
if (int.TryParse(rawValue, out output))
return output;
else
return null;
}
This allows us to write:
var someEnumerable = from s in strings
let n = TryParse(s)
where n != null
select n.Value;
Here there are no observable side effects in the query, nor is the query observing any external side effects. It makes the whole query far easier to reason about.
Because when you enumerate the value changes one at a time and changes the value of the variable on the fly. Due to the nature of LINQ the select for the first iteration is executed before the where for the second iteration. Basically this variable turns into a foreach loop variable of a kind.
This is what deferred execution buys us. Previous methods do not have to execute fully before the next method in the chain starts. One value moves through all the methods before the second goes in. This is very useful with methods like First or Take which stop the iteration early. Exceptions to the rule are methods that need to aggregate or sort like OrderBy (they need to look at all elements before finding out which is first). If you add an OrderBy before the Select the behavior will probably break.
Of course I wouldn't depend on this behavior in production code.
I don't understand what is odd for you?
If you write a loop on this enumerable like this
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}
Because LINQ enumerates each where and select lazyly and yield each value, if you add ToArray
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
Than in the loop you will see 99 s
Edit
The below code will print 99 s
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
//outValue = 3;
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}
Recently, I ran into a strange problem where I had a method generate an IEnumerable collection of objects. This method contained four yield return statements that returned four objects. I assigned the result to a variable results using the var keyword.
var result = GenerateCollection().ToList();
This effectively meant: List<MyType> result = GenerateCollection().
I made a simple for loop over the elements of this collection. What surprised me is that the collection was reenumerated for each call to the list (for each result[i]). Later I used the result collection in a LINQ query which had some bad results performance-wise due to the continual reenumeration of the collection.
I solved the problem by casting to array instead of the list.
What this makes me wonder now is when are the collections enumerated? Which method calls make collections reenumerate?
EDIT: The GenerateCollection() method looked similarly to this:
public static IEnumerable<MyType> GenerateCollection()
{
var array = data.AsParallel(); //data is a simple collection of sublists of strings
yield return new MyType("a", array.Where(x => x.Sublist.Count(y => y == 'a') == 0));
yield return new MyType("b", array.Where(x => x.Sublist.Count(y => y == 'b') == 0));
yield return new MyType("c", array.Where(x => x.Sublist.Count(y => y == 'c') == 0));
yield return new MyType("d", array.Where(x => x.Sublist.Count(y => y == 'd') == 0));
}
You are yielding objects which have queries inside - it's not some sequence of array values - its iterator objects which are not executed when you are passing them to constructor of MyType. When you create list of MyType objects
var result = GenerateCollection().ToList();
all MyType instances are yielded and saved into list, but if you haven't executed iterators in MyType constructor, then queries are not executed. And even more - they will be executed each time again, if you'll call some operator which executes query, e.g.
result[i].ArrayIterator.Count(); // first execution
foreach(var item in result[i].ArrayIterator) // second execution
// ...
You can fix it if you'll pass result of query execution to MyType constructor:
yield return new MyType("a", array.Where(x => !x.Sublist.Contains('a')).ToList())
Now you are passing list of items instead of iterator (you can use ToArray()) also. Query is executed when you are yielding MyType instance, and it will not be executed again.
array.Where(x => x.Sublist.Count(y => y == 'a') == 0)
This piece of code will be enumerated every time you access it in MyType. Use ToList or ToArray to ensure it is enumerated only once in place where the code is written.
collections which are based on deferred execution gets enumerated as soon as you use them For-example IEnumerable,IQueryable etc. and collections which are based on immediate execution gets enumerated as soon as they are created for example LIST.
Assuming I have the following string array:
string[] str = new string[] {"max", "min", "avg", "max", "avg", "min"}
Is it possbile to use LINQ to get a list of indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].
.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new {i, s})
.Where(t => t.s == "avg")
.Select(t => t.i)
.ToList()
The result will be a list containing 2 and 4.
Documentation here
You can do it like this:
str.Select((v,i) => new {Index = i, Value = v}) // Pair up values and indexes
.Where(p => p.Value == "avg") // Do the filtering
.Select(p => p.Index); // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.
You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
.Where(x => x.s== "avg")
.Select(x => x.index)
.ToList();
If you just want to find the first/last index, you have also the builtin methods List.IndexOf and List.LastIndexOf:
int firstIndex = str.IndexOf("avg");
int lastIndex = str.LastIndexOf("avg");
(or you can use this overload that take a start index to specify the start position)
First off, your code doesn't actually iterate over the list twice, it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Count)
.Where(i => str[i] == "avg")
.ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "whenever someone asks me for an item I'll ask my input sequence for an item, if the function say it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them what happens is ToList asks for the first item, it goes to Where to as it for it's first item, Where goes to Select and asks it for it's first item, Select goes to the list to ask it for its first item. The list then provides it's first item. Select then transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item and runs it's function which determine's that it's true and so spits out 0 to ToList, which adds it to the list. That whole thing then happens 9 more times. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first to you, it's actually pretty easy for the computer to do all of this. It's not actually as performance intensive as it may seem at first.
While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
if (source == null)
throw new ArgumentNullException("source");
int i = 0;
foreach (T item in source)
{
if (object.Equals(itemToFind, item))
{
yield return i;
}
i++;
}
}
You need a combined select and where operator, comparing to accepted answer this will be cheaper, since won't require intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (var s in source)
{
checked{ ++index; }
if (filter(s))
yield return selector(s, index);
}
}