Linq FirstOrDefault evaluates predicate each iteration? - c#

If I had a statement such as:
var item = Core.Collections.Items.FirstOrDefault(itm => itm.UserID == bytereader.readInt());
Does this code read an integer from my stream each iteration, or does it read the integer once, store it, then use its value throughout the lookup?

Consider this code:
static void Main(string[] args)
{
new[] { 1, 2, 3, 4 }.FirstOrDefault(j => j == Get());
Console.ReadLine();
}
static int i = 5;
static int Get()
{
Console.WriteLine("GET:" + i);
return i--;
}
It shows, that it will call the method the number of times it needs to meet the first element matching the condition. The output will be:
GET:5
GET:4
GET:3

I don't know without checking but would expect it to read it each time.
But this is very easily remedied with the following version of your code.
byte val = bytereader.readInt();
var item = Core.Collections.Items.FirstOrDefault(itm => itm.UserID == val);
Myself, I would automatically take this approach anyway just to remove any doubt. Might be a good habit to form as there is no reason to read it for each item.

It's actually quite obvious that the call is performed for each item - FirstOrDefault() takes an delegate as argument. This fact is a bit obscured by using a lambda method but in the end the method only sees a delegate that it can call for each item to check the predicate. In order to evaluate the right hand side only once some magic mechanism would have to understand and rewrite the method and (sometimes sadly) there is no real magic inside compilers and runtimes.

Related

spread syntax in c#? [duplicate]

I know this is a basic question, but I couldn't find an answer.
Why use it? if you write a function or a method that's using it, when you remove it the code will still work perfectly, 100% as without it. E.g:
With params:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
sum += item + 2;
return sum;
}
Without params:
static public int addTwoEach(int[] args)
{
int sum = 0;
foreach (var item in args)
sum += item + 2;
return sum;
}
With params you can call your method like this:
addTwoEach(1, 2, 3, 4, 5);
Without params, you can’t.
Additionally, you can call the method with an array as a parameter in both cases:
addTwoEach(new int[] { 1, 2, 3, 4, 5 });
That is, params allows you to use a shortcut when calling the method.
Unrelated, you can drastically shorten your method:
public static int addTwoEach(params int[] args)
{
return args.Sum() + 2 * args.Length;
}
Using params allows you to call the function with no arguments. Without params:
static public int addTwoEach(int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
addtwoEach(); // throws an error
Compare with params:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
addtwoEach(); // returns 0
Generally, you can use params when the number of arguments can vary from 0 to infinity, and use an array when numbers of arguments vary from 1 to infinity.
It allows you to add as many base type parameters in your call as you like.
addTwoEach(10, 2, 4, 6)
whereas with the second form you have to use an array as parameter
addTwoEach(new int[] {10,2,4,6})
One danger with params Keyword is, if after Calls to the Method have been coded,
someone accidentally / intentionally removes one/more required Parameters from the Method Signature,
and
one/more required Parameters immediately prior to the params Parameter prior to the Signature change were Type-Compatible with the params Parameter,
those Calls will continue to compile with one/more Expressions previously intended for required Parameters being treated as the optional params Parameter. I just ran into the worst possible case of this: the params Parameter was of Type object[].
This is noteworthy because developers are used to the compiler slapping their wrists with the much, much more common scenario where Parameters are removed from a Method with all required Parameters (because the # of Parameters expected would change).
For me, it's not worth the shortcut. (Type)[] without params will work with 0 to infinity # of Parameters without needing Overrides. Worst case is you'll have to add a , new (Type) [] {} to Calls where it doesn't apply.
Btw, imho, the safest (and most readable practice) is to:
pass via Named Parameters (which we can now do even in C# ~2 decades after we could in VB ;P), because:
1.1. it's the only way that guarantees prevention of unintended values passed to Parameters after Parameter order, Compatible-Type and/or count change after Calls have been coded,
1.2. it reduces those chances after a Parameter meaning change, because the likely new identifier name reflecting the new meaning is right next to the value being passed to it,
1.3. it avoids having to count commas and jump back & forth from Call to Signature to see what Expression is being passed for what Parameter, and
1.3.1. By the way, this reason alone should be plenty (in terms of avoiding frequent error-prone violations of the DRY Principle just to read the code not to mention also modify it), but this reason can be exponentially more important if there are one/more Expressions being Passed that themselves contain commas, i.e. Multi-Dimensional Array Refs or Multi-Parameter Function Calls. In that case, you couldn't even use (which even if you could, would still be adding an extra step per Parameter per Method Call) a Find All Occurrences in a Selection feature in your editor to automate the comma-counting for you.
1.4. if you must use Optional Parameters (params or not), it allows you to search for Calls where a particular Optional Parameter is Passed (and therefore, most likely is not or at least has the possibility of being not the Default Value),
(NOTE: Reasons 1.2. and 1.3. can ease and reduce chances of error even on coding the initial Calls not to mention when Calls have to be read and/or changed.))
and
do so ONE - PARAMETER - PER - LINE for better readability (because:
2.1. it's less cluttered, and
2.2. it avoids having to scroll right & back left (and having to do so PER - LINE, since most mortals can't read the left part of multiple lines, scroll right and read the right part)).
2.3. it's consistent with the "Best Practice" we've already evolved into for Assignment Statements, because every Parameter Passed is in essence an Assignment Statement (assigning a Value or Reference to a Local Variable). Just like those who follow the latest "Best Practice" in Coding Style wouldn't dream of coding multiple Assignment Statements per line, we probably shouldn't (and won't once "Best Practice" catches up to my "genius" ;P ) do so when Passing Parameters.
NOTES:
Passing in Variables whose names mirror the Parameters' doesn't help when:
1.1. you're passing in Literal Constants (i.e. a simple 0/1, false/true or null that even "'Best Practices'" may not require you use a Named Constant for and their purpose can't be easily inferred from the Method name),
1.2. the Method is significantly lower-level / more generic than the Caller such that you would not want / be able to name your Variables the same/similar to the Parameters (or vice versa), or
1.3. you're re-ordering / replacing Parameters in the Signature that may result in prior Calls still Compiling because the Types happen to still be compatible.
Having an auto-wrap feature like VS does only eliminates ONE (#2.2) of the 8 reasons I gave above. Prior to as late as VS 2015, it did NOT auto-indent (!?! Really, MS?!?) which increases severity of reason #2.2.
VS should have an option that generates Method Call snippets with Named Parameters (one per line of course ;P) and a compiler option that requires Named Parameters (similar in concept to Option Explicit in VB which, btw, the requirement of was prolly once thought equally as outrageous but now is prolly required by "'Best Practices'"). In fact, "back in my day" ;), in 1991 just months into my career, even before I was using (or had even seen) a language with Named Parameters, I had the anti-sheeple / "just cuz you can, don't mean you should" / don't blindly "cut the ends of the roast" sense enough to simulate it (using in-line comments) without having seen anyone do so. Not having to use Named Parameters (as well other syntax that save "'precious'" source code keystrokes) is a relic of the Punch Card era when most of these syntaxes started. There's no excuse for that with modern hardware and IDE's and much more complex software where readability is much, Much, MUCH more important. "Code is read much more often than is written". As long as you're not duplicating non-auto-updated code, every keystroke saved is likely to cost exponentially more when someone (even yourself) is trying to read it later.
No need to create overload methods, just use one single method with params as shown below
// Call params method with one to four integer constant parameters.
//
int sum0 = addTwoEach();
int sum1 = addTwoEach(1);
int sum2 = addTwoEach(1, 2);
int sum3 = addTwoEach(3, 3, 3);
int sum4 = addTwoEach(2, 2, 2, 2);
params also allows you to call the method with a single argument.
private static int Foo(params int[] args) {
int retVal = 0;
Array.ForEach(args, (i) => retVal += i);
return retVal;
}
i.e. Foo(1); instead of Foo(new int[] { 1 });. Can be useful for shorthand in scenarios where you might need to pass in a single value rather than an entire array. It still is handled the same way in the method, but gives some candy for calling this way.
Adding params keyword itself shows that you can pass multiple number of parameters while calling that method which is not possible without using it. To be more specific:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
When you will call above method you can call it by any of the following ways:
addTwoEach()
addTwoEach(1)
addTwoEach(new int[]{ 1, 2, 3, 4 })
But when you will remove params keyword only third way of the above given ways will work fine. For 1st and 2nd case you will get an error.
One more important thing needs to be highlighted. It's better to use params because it is better for performance. When you call a method with params argument and passed to it nothing:
public void ExampleMethod(params string[] args)
{
// do some stuff
}
call:
ExampleMethod();
Then a new versions of the .Net Framework do this (from .Net Framework 4.6):
ExampleMethod(Array.Empty<string>());
This Array.Empty object can be reused by framework later, so there are no needs to do redundant allocations. These allocations will occur when you call this method like this:
ExampleMethod(new string[] {});
Might sound stupid,
But Params doesn't allow multidimensional array.
Whereas you can pass a multidimensional array to a function.
Another example
public IEnumerable<string> Tokenize(params string[] words)
{
...
}
var items = Tokenize(product.Name, product.FullName, product.Xyz)
It enhances code brevity. Why be lengthy when you can be concise?
using System;
namespace testingParams
{
class Program
{
private void lengthy(int[] myArr)
{
foreach (var item in myArr)
{
//...
}
}
private void concise(params int[] myArr) {
foreach (var item in myArr)
{
//...
}
}
static void Main(string[] args)
{
Program p = new Program();
//Why be lengthy...:
int[] myArr = new int[] { 1, 2, 3, 4, 5 };
p.lengthy(myArr);
//When you can be concise...:
p.concise(1, 2, 3, 4, 5);
}
}
}
If you remove the keyword params, the caller code will not work as desired.

C#: What's the difference between the params keyword and giving a default value [duplicate]

I know this is a basic question, but I couldn't find an answer.
Why use it? if you write a function or a method that's using it, when you remove it the code will still work perfectly, 100% as without it. E.g:
With params:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
sum += item + 2;
return sum;
}
Without params:
static public int addTwoEach(int[] args)
{
int sum = 0;
foreach (var item in args)
sum += item + 2;
return sum;
}
With params you can call your method like this:
addTwoEach(1, 2, 3, 4, 5);
Without params, you can’t.
Additionally, you can call the method with an array as a parameter in both cases:
addTwoEach(new int[] { 1, 2, 3, 4, 5 });
That is, params allows you to use a shortcut when calling the method.
Unrelated, you can drastically shorten your method:
public static int addTwoEach(params int[] args)
{
return args.Sum() + 2 * args.Length;
}
Using params allows you to call the function with no arguments. Without params:
static public int addTwoEach(int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
addtwoEach(); // throws an error
Compare with params:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
addtwoEach(); // returns 0
Generally, you can use params when the number of arguments can vary from 0 to infinity, and use an array when numbers of arguments vary from 1 to infinity.
It allows you to add as many base type parameters in your call as you like.
addTwoEach(10, 2, 4, 6)
whereas with the second form you have to use an array as parameter
addTwoEach(new int[] {10,2,4,6})
One danger with params Keyword is, if after Calls to the Method have been coded,
someone accidentally / intentionally removes one/more required Parameters from the Method Signature,
and
one/more required Parameters immediately prior to the params Parameter prior to the Signature change were Type-Compatible with the params Parameter,
those Calls will continue to compile with one/more Expressions previously intended for required Parameters being treated as the optional params Parameter. I just ran into the worst possible case of this: the params Parameter was of Type object[].
This is noteworthy because developers are used to the compiler slapping their wrists with the much, much more common scenario where Parameters are removed from a Method with all required Parameters (because the # of Parameters expected would change).
For me, it's not worth the shortcut. (Type)[] without params will work with 0 to infinity # of Parameters without needing Overrides. Worst case is you'll have to add a , new (Type) [] {} to Calls where it doesn't apply.
Btw, imho, the safest (and most readable practice) is to:
pass via Named Parameters (which we can now do even in C# ~2 decades after we could in VB ;P), because:
1.1. it's the only way that guarantees prevention of unintended values passed to Parameters after Parameter order, Compatible-Type and/or count change after Calls have been coded,
1.2. it reduces those chances after a Parameter meaning change, because the likely new identifier name reflecting the new meaning is right next to the value being passed to it,
1.3. it avoids having to count commas and jump back & forth from Call to Signature to see what Expression is being passed for what Parameter, and
1.3.1. By the way, this reason alone should be plenty (in terms of avoiding frequent error-prone violations of the DRY Principle just to read the code not to mention also modify it), but this reason can be exponentially more important if there are one/more Expressions being Passed that themselves contain commas, i.e. Multi-Dimensional Array Refs or Multi-Parameter Function Calls. In that case, you couldn't even use (which even if you could, would still be adding an extra step per Parameter per Method Call) a Find All Occurrences in a Selection feature in your editor to automate the comma-counting for you.
1.4. if you must use Optional Parameters (params or not), it allows you to search for Calls where a particular Optional Parameter is Passed (and therefore, most likely is not or at least has the possibility of being not the Default Value),
(NOTE: Reasons 1.2. and 1.3. can ease and reduce chances of error even on coding the initial Calls not to mention when Calls have to be read and/or changed.))
and
do so ONE - PARAMETER - PER - LINE for better readability (because:
2.1. it's less cluttered, and
2.2. it avoids having to scroll right & back left (and having to do so PER - LINE, since most mortals can't read the left part of multiple lines, scroll right and read the right part)).
2.3. it's consistent with the "Best Practice" we've already evolved into for Assignment Statements, because every Parameter Passed is in essence an Assignment Statement (assigning a Value or Reference to a Local Variable). Just like those who follow the latest "Best Practice" in Coding Style wouldn't dream of coding multiple Assignment Statements per line, we probably shouldn't (and won't once "Best Practice" catches up to my "genius" ;P ) do so when Passing Parameters.
NOTES:
Passing in Variables whose names mirror the Parameters' doesn't help when:
1.1. you're passing in Literal Constants (i.e. a simple 0/1, false/true or null that even "'Best Practices'" may not require you use a Named Constant for and their purpose can't be easily inferred from the Method name),
1.2. the Method is significantly lower-level / more generic than the Caller such that you would not want / be able to name your Variables the same/similar to the Parameters (or vice versa), or
1.3. you're re-ordering / replacing Parameters in the Signature that may result in prior Calls still Compiling because the Types happen to still be compatible.
Having an auto-wrap feature like VS does only eliminates ONE (#2.2) of the 8 reasons I gave above. Prior to as late as VS 2015, it did NOT auto-indent (!?! Really, MS?!?) which increases severity of reason #2.2.
VS should have an option that generates Method Call snippets with Named Parameters (one per line of course ;P) and a compiler option that requires Named Parameters (similar in concept to Option Explicit in VB which, btw, the requirement of was prolly once thought equally as outrageous but now is prolly required by "'Best Practices'"). In fact, "back in my day" ;), in 1991 just months into my career, even before I was using (or had even seen) a language with Named Parameters, I had the anti-sheeple / "just cuz you can, don't mean you should" / don't blindly "cut the ends of the roast" sense enough to simulate it (using in-line comments) without having seen anyone do so. Not having to use Named Parameters (as well other syntax that save "'precious'" source code keystrokes) is a relic of the Punch Card era when most of these syntaxes started. There's no excuse for that with modern hardware and IDE's and much more complex software where readability is much, Much, MUCH more important. "Code is read much more often than is written". As long as you're not duplicating non-auto-updated code, every keystroke saved is likely to cost exponentially more when someone (even yourself) is trying to read it later.
No need to create overload methods, just use one single method with params as shown below
// Call params method with one to four integer constant parameters.
//
int sum0 = addTwoEach();
int sum1 = addTwoEach(1);
int sum2 = addTwoEach(1, 2);
int sum3 = addTwoEach(3, 3, 3);
int sum4 = addTwoEach(2, 2, 2, 2);
params also allows you to call the method with a single argument.
private static int Foo(params int[] args) {
int retVal = 0;
Array.ForEach(args, (i) => retVal += i);
return retVal;
}
i.e. Foo(1); instead of Foo(new int[] { 1 });. Can be useful for shorthand in scenarios where you might need to pass in a single value rather than an entire array. It still is handled the same way in the method, but gives some candy for calling this way.
Adding params keyword itself shows that you can pass multiple number of parameters while calling that method which is not possible without using it. To be more specific:
static public int addTwoEach(params int[] args)
{
int sum = 0;
foreach (var item in args)
{
sum += item + 2;
}
return sum;
}
When you will call above method you can call it by any of the following ways:
addTwoEach()
addTwoEach(1)
addTwoEach(new int[]{ 1, 2, 3, 4 })
But when you will remove params keyword only third way of the above given ways will work fine. For 1st and 2nd case you will get an error.
One more important thing needs to be highlighted. It's better to use params because it is better for performance. When you call a method with params argument and passed to it nothing:
public void ExampleMethod(params string[] args)
{
// do some stuff
}
call:
ExampleMethod();
Then a new versions of the .Net Framework do this (from .Net Framework 4.6):
ExampleMethod(Array.Empty<string>());
This Array.Empty object can be reused by framework later, so there are no needs to do redundant allocations. These allocations will occur when you call this method like this:
ExampleMethod(new string[] {});
Might sound stupid,
But Params doesn't allow multidimensional array.
Whereas you can pass a multidimensional array to a function.
Another example
public IEnumerable<string> Tokenize(params string[] words)
{
...
}
var items = Tokenize(product.Name, product.FullName, product.Xyz)
It enhances code brevity. Why be lengthy when you can be concise?
using System;
namespace testingParams
{
class Program
{
private void lengthy(int[] myArr)
{
foreach (var item in myArr)
{
//...
}
}
private void concise(params int[] myArr) {
foreach (var item in myArr)
{
//...
}
}
static void Main(string[] args)
{
Program p = new Program();
//Why be lengthy...:
int[] myArr = new int[] { 1, 2, 3, 4, 5 };
p.lengthy(myArr);
//When you can be concise...:
p.concise(1, 2, 3, 4, 5);
}
}
}
If you remove the keyword params, the caller code will not work as desired.

Enumerating over lambdas does not bind the scope correctly?

consider the following C# program:
using System;
using System.Linq;
using System.Collections.Generic;
public class Test
{
static IEnumerable<Action> Get()
{
for (int i = 0; i < 2; i++)
{
int capture = i;
yield return () => Console.WriteLine(capture.ToString());
}
}
public static void Main(string[] args)
{
foreach (var a in Get()) a();
foreach (var a in Get().ToList()) a();
}
}
When executed under Mono compiler (e.g. Mono 2.10.2.0 - paste into here), it writes the following output:
0
1
1
1
This seems totally unlogical to me. When directly iterating the yield function, the scope of the for-loop is "correctly" (to my understanding) used. But when I store the result in a list first, the scope is always the last action?!
Can I assume that this is a bug in the Mono compiler, or did I hit a mysterious corner case of C#'s lambda and yield-stuff?
BTW: When using Visual Studio compiler (and either MS.NET or mono to execute), the result is the expected 0 1 0 1
I'll give you the reason why it was 0 1 1 1:
foreach (var a in Get()) a();
Here you go into Get and it starts iterating:
i = 0 => return Console.WriteLine(i);
The yield returns with the function and executes the function, printing 0 to the screen, then returns to the Get() method and continues.
i = 1 => return Console.WriteLine(i);
The yield returns with the function and executes the function, printing 1 to the screen, then returns to the Get() method and continues (only to find that it has to stop).
But now, you're not iterating over each item when it happens, you're building a list and then iterating over that list.
foreach (var a in Get().ToList()) a();
What you are doing isn't like above, Get().ToList() returns a List or Array (not sure wich one). So now this happens:
i = 0 => return Console.WriteLine(i);
And in you Main() function, you get the following in memory:
var i = 0;
var list = new List
{
Console.WriteLine(i)
}
You go back into the Get() function:
i = 1 => return Console.WriteLine(i);
Which returns to your Main()
var i = 1;
var list = new List
{
Console.WriteLine(i),
Console.WriteLine(i)
}
And then does
foreach (var a in list) a();
Which will print out 1 1
It seems like it was ignoring that you made sure you encapsulated the value before returning the function.
#Armaron - The .ToList() extension returns List of type T as ToArray() returns T[] as the naming convention implies, but I think you are on the right track with your response.
This sounds like an issuse with the compiler. I agree with Servy that it is probably a bug, however, have you tried the following?
public class Test
{
private static int capture = 0;
static IEnumerable<Action> Get()
{
for (int i = 0; i < 2; i++)
{
capture++;
yield return () => Console.WriteLine(capture.ToString());
}
}
}
Additionally you may want to try the static approach, perhaps this will perform a more accurate conversion as your function is static.
List<T> list = Enumerable.ToList(Get());
When calling ToList() it seems as though it is not performing a single iteration for each value but rather:
return new List<T>(Get());
The second for each in your code does not make sense to me in implementation as to why it would ever be necessary or beneficial unless you require additional actions to be added/removed to the List object. The first makes perfect sense since all you are doing is iterating through the object and performing the associated action. My understanding is that an integer within the scope of the static IEnumerbale object is being calculated during conversion by performing the entire iteration and the action is preserving the int as a static int due to scope. Also, keep in mind that IEnumerable is merely an interface that is implemented by List which implements IList, and may contain logic for the conversion built in.
That being said I am interested to see/hear your findings as this is an interesting post. I will definitely upvote the question. Please ask questions if anything I said needs clarification or if something is false say so, although I am confident in my usage of the yield keyword of IEnumerable but this is a unique issue.

Is List.Find<T> considered dangerous? What's a better way to do List<T>.Find(Predicate<T>)?

I used to think that List<T> is considered dangerous. My point is that, I think default(T) is not a safe return value! Many other people think so too Consider the following:
List<int> evens = new List<int> { 0, 2, 4, 6, , 8};
var evenGreaterThan10 = evens.Find(c=> c > 10);
// evenGreaterThan10 = 0 #WTF
default(T) for value types is 0, hence 0 is goona be returned is the above code segment!
I didn't like this, so I added an extension method called TryFind that returns a boolean and accepts an output parameter besides the Predicate, something similar to the famous TryParse approach.
Edit:
Here's my TryFind extension method:
public static bool TryFind<T>(this List<T> list, Predicate<T> predicate, out T output)
{
int index = list.FindIndex(predicate);
if (index != -1)
{
output = list[index];
return true;
}
output = default(T);
return false;
}
What's a your way to do Find on generic Lists?
I don't. I do .Where()
evens.Where(n => n > 10); // returns empty collection
evens.Where(n => n > 10).First(); // throws exception
evens.Where(n => n > 10).FirstOrDefault(); // returns 0
The first case just returns a collection, so I can simply check if the count is greater than 0 to know if there are any matches.
When using the second, I wrap in a try/catch block that handles InvalidOperationException specfically to handle the case of an empty collection, and just rethrow (bubble) all other exceptions. This is the one I use the least, simply because I don't like writing try/catch statements if I can avoid it.
In the third case I'm OK with the code returning the default value (0) when no match exists - after all, I did explicitly say "get me the first or default" value. Thus, I can read the code and understand why it happens if I ever have a problem with it.
Update:
For .NET 2.0 users, I would not recommend the hack that has been suggested in comments. It violates the license agreement for .NET, and it will in no way be supported by anyone.
Instead, I see two ways to go:
Upgrade. Most of the stuff that runs on 2.0 will run (more or less) unchanged on 3.5. And there is so much in 3.5 (not just LINQ) that is really worth the effort of upgrading to have it available. Since there is a new CLR runtime version for 4.0, there are more breaking changes between 2.0 and 4.0 than between 2.0 and 3.5, but if possible I'd recommend upgrading all the way to 4.0. There's really no good reason to be sitting around writing new code in a version of a framework that has had 3 major releases (yes, I count both 3.0, 3.5 and 4.0 as major...) since the one you're using.
Find a work-around to the Find problem. I'd recommend either just using FindIndex as you do, since the -1 that is returned when nothing is found is never ambiguous, or implementing something with FindIndex as you did. I don't like the out syntax, but before I write an implementation that doesn't use it, I need some input on what you want returned when nothing was found.
Until then, the TryFind can be considered OK, since it aligns with previous functionality in .NET, for example Integer.TryParse. And you do get a decent way to handle nothing found doing
List<Something> stuff = GetListOfStuff();
Something thing;
if (stuff.TryFind(t => t.IsCool, thing)) {
// do stuff that's good. thing is the stuff you're looking for.
}
else
{
// let the user know that the world sucks.
}
Instead of Find you could use FindIndex. It returns the index of the element found or -1 if no element was found.
http://msdn.microsoft.com/en-us/library/x1xzf2ca%28v=VS.80%29.aspx
It is ok if you know there are no default(T) values in your list or if the default(T) return value can't be the result.
You could implement your own easily.
public static T Find<T>(this List<T> list, Predicate<T> match, out bool found)
{
found = false;
for (int i = 0; i < list.Count; i++)
{
if (match(list[i]))
{
found = true;
return list[i];
}
}
return default(T);
}
and in code:
bool found;
a.Find(x => x > 5, out found);
Other options:
evens.First(predicate);//throws exception
evens.FindAll(predicate);//returns a list of results => use .Count.
It depends on what version of framework you can use.
If there is a problem with the default value of T, use Nullable to get a more meaningful default value :
List<int?> evens = new List<int?> { 0, 2, 4, 6, 8 };
var greaterThan10 = evens.Find(c => c > 10);
if (greaterThan10 != null)
{
// ...
}
This also requires no additional call of Exists() first.
Doing a call to Exists() first will help with the problem:
int? zz = null;
if (evens.Exists(c => c > 10))
zz = evens.Find(c => c > 10);
if (zz.HasValue)
{
// .... etc ....
}
slightly more long-winded, but does the job.
The same answer I gave on Reddit.
Enumerable.FirstOrDefault( predicate )
Inspired by Jaroslav Jandek above I would make an extension that returns bool true if match and passes the object in out if match:
static class Extension
{
public static bool TryFind<T>(this List<T> list, Predicate<T> match, out T founditem)
{
for (int i = 0; i < list.Count; i++)
{
if (match(list[i]))
{
founditem = list[i];
return true;
}
}
founditem = default(T);
return false;
}
}
Then it can used on a List object 'a' and called like this using 'out var' syntax:
if (a.TryFind(x => x > 5, out var founditem)){
//enter code here to use 'founditem'
};
Just for fun
evens.Where(c => c > 10)
.Select(c => (int?)c)
.DefaultIfEmpty(null)
.First();

FindAll vs Where extension-method

I just want know if a "FindAll" will be faster than a "Where" extentionMethod and why?
Example :
myList.FindAll(item=> item.category == 5);
or
myList.Where(item=> item.category == 5);
Which is better ?
Well, FindAll copies the matching elements to a new list, whereas Where just returns a lazily evaluated sequence - no copying is required.
I'd therefore expect Where to be slightly faster than FindAll even when the resulting sequence is fully evaluated - and of course the lazy evaluation strategy of Where means that if you only look at (say) the first match, it won't need to check the remainder of the list. (As Matthew points out, there's work in maintaining the state machine for Where. However, this will only have a fixed memory cost - whereas constructing a new list may require multiple array allocations etc.)
Basically, FindAll(predicate) is closer to Where(predicate).ToList() than to just Where(predicate).
Just to react a bit more to Matthew's answer, I don't think he's tested it quite thoroughly enough. His predicate happens to pick half the items. Here's a short but complete program which tests the same list but with three different predicates - one picks no items, one picks all the items, and one picks half of them. In each case I run the test fifty times to get longer timing.
I'm using Count() to make sure that the Where result is fully evaluated. The results show that collecting around half the results, the two are neck and neck. Collecting no results, FindAll wins. Collecting all the results, Where wins. I find this intriguing: all of the solutions become slower as more and more matches are found: FindAll has more copying to do, and Where has to return the matched values instead of just looping within the MoveNext() implementation. However, FindAll gets slower faster than Where does, so loses its early lead. Very interesting.
Results:
FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891
(Compiled with /o+ /debug- and run from the command line, .NET 3.5.)
Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
class Test
{
static List<int> ints = Enumerable.Range(0, 10000000).ToList();
static void Main(string[] args)
{
Benchmark("All", i => i >= 0); // Match all
Benchmark("Half", i => i % 2 == 0); // Match half
Benchmark("None", i => i < 0); // Match none
}
static void Benchmark(string name, Predicate<int> predicate)
{
// We could just use new Func<int, bool>(predicate) but that
// would create one delegate wrapping another.
Func<int, bool> func = (Func<int, bool>)
Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
predicate.Method);
Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
Benchmark("Where: " + name, () => ints.Where(func).Count());
}
static void Benchmark(string name, Action action)
{
GC.Collect();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 50; i++)
{
action();
}
sw.Stop();
Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
}
}
How about we test instead of guess? Shame to see the wrong answer get out.
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284
Release
sw1: 532194
sw2: 1016524
*/
Edit:
Even if I turn the above code from
var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();
... to ...
var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();
I get these results
/*
Debug
sw1: 1250409
sw2: 1267016
Release
sw1: 539536
sw2: 600361
*/
Edit 2.0...
If you want a list of the subset of the current list the fastest method if the FindAll(). The reason for this is simple. The FindAll instance method uses the indexer on the current List instead of the enumerator state machine. The Where() extension method is an external call to a different class that uses the enumerator. If you step from each node in the list to the next node you will have to call the MoveNext() method under the covers. As you can see from the above examples it is even faster to use the index entries to create a new list (that is pointing to the original items, so memory bloat will be minimal) to even just get a count of the filtered items.
Now if you are going to early abort from the Enumerator the Where() method could be faster. Of course if you move the early abort logic to the predicate of the FindAll() method you will again be using the indexer instead of the enumerator.
Now there are other reasons to use the Where() statement (such as the other linq methods, foreach blocks and many more) but the question was is the FindAll() faster than Where(). And unless you don't execute the Where() the answer seems to be yes. (When comparing apples to apples)
I am not say don't use LINQ or the .Where() method. They make for code that is much simpler to read. The question was about performance and not about how easy you can read and understand the code. By fast the fastest way to do this work would be to use a for block stepping each index and doing any logic as you want (even early exits). The reason LINQ is so great is becasue of the complex expression trees and transformation you can get with them. But using the iterator from the .Where() method has to go though tons of code to find it's way to a in memory statemachine that is just getting the next index out of the List. It should also be noted that this .FindAll() method is only useful on objects that implmented it (such as Array and List.)
Yet more...
for (int x = 0; x < 20; x++)
{
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0).Count;
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
sw2.Stop();
var sw4 = Stopwatch.StartNew();
var cntForeach = 0;
foreach (var item in ints)
if (item % 2 == 0)
cntForeach++;
sw4.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}
/* averaged results
sw1 575446.8
sw2 605954.05
sw3 394506.4
/*
Well, at least you can try to measure it.
The static Where method is implemented using an iterator bloc (yield keyword), which basically means that the execution will be deferred. If you only compare the calls to theses two methods, the first one will be slower, since it immediately implies that the whole collection will be iterated.
But if you include the complete iteration of the results you get, things can be a bit different. I'm pretty sure the yield solution is slower, due to the generated state machine mechanism it implies. (see #Matthew anwser)
I can give some clue, but not sure which one faster.
FindAll() is executed right away.
Where() is defferred executed.
The advantage of where is the deferred execution. See the difference if you'd have the following functionality
BigSequence.FindAll( x => DoIt(x) ).First();
BigSequence.Where( x => DoIt(x) ).First();
FindAll has covered the complete sequene, while Where in most sequences will stop enumerating as soon as one element is found.
The same effects will be one using Any(), Take(), Skip(), etc. I'm not sure, but I guess you'll have huge advantages in all functions that have deferred execution

Categories