In which situations should I use LINQ to Objects?
Obviously I can do everything without LINQ. So in which operations does LINQ actually help me write shorter and/or more readable code?
This question was triggered by this one.
I find LINQ to Objects useful all over the place. The problem it solves is pretty general:
You have a collection of some data items
You want another collection, formed from the original collection, but after some sort of transformation or filtering. This might be sorting, projection, applying a predicate, grouping, etc.
That's a situation I come across pretty often. There are an awful lot of areas of programming which basically involve transforming one collection (or stream of data) into another. In those cases the code using LINQ is almost always shorter and more readable. I'd like to point out that LINQ shouldn't be regarded as being synonymous with query expressions - if only a single operator is required, the normal "dot notation" (using extension methods) can often be shorter and more readable.
One of the reasons I particularly like LINQ to Objects is that it is so general - whereas LINQ to SQL is likely to only get involved in your data layer (or pretty much become the data layer), LINQ to Objects is applicable in every layer, and in all kinds of applications.
Just as an example, here's a line in my MiniBench benchmarking framework, converting a TestSuite (which is basically a named collection of tests) into a ResultSuite (a named collection of results):
return new ResultSuite(name,
    tests.Select(test => test.Run(input, expectedOutput)));
Then again if a ResultSuite needs to be scaled against some particular "standard" result:
return new ResultSuite(name,
    results.Select(x => x.ScaleToStandard(standard, mode)));
It wouldn't be hard to write this code without LINQ, but LINQ just makes it clearer and lets you concentrate on the real "logic" instead of the details of iterating through loops and adding results to lists etc.
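For comparison, here is a rough sketch of what the same conversion might look like without LINQ (it assumes a TestResult element type and a ResultSuite constructor that accepts a collection, which the original snippet only implies):

var results = new List<TestResult>();
foreach (var test in tests)
{
    // Run each test and collect its result by hand.
    results.Add(test.Run(input, expectedOutput));
}
return new ResultSuite(name, results);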
Even when LINQ itself isn't applicable, some of the features which were largely included for the sake of LINQ (e.g. implicitly typed local variables, lambda expressions, extension methods) can be very useful.
The answer "practically everywhere" comes to mind. A better question would be when not to use it.
LINQ is great for the "slippery slope". Think of what's involved in many common operations:
Where. Just write a foreach loop and an "if"
Select. Create an empty list of the target type, loop through the originals, convert each one and add it to the results.
OrderBy. Just add it to a list and call .Sort(). Or implement a bubble sort ;)
ThenBy (from order by PropertyA, then by PropertyB). Quite a bit harder. A custom comparer and Sort should do the trick.
GroupBy. Create a Dictionary<key, List<value>> and loop through all items. If no key exists, create it, then add items to the appropriate list.
In each of those cases, the procedural way takes more code than the LINQ way. In the case of "if" it's a couple of lines more; in the case of GroupBy or OrderBy/ThenBy it's a lot more.
Now take an all too common scenario of combining them together. You're suddenly looking at a 10-20 line method which could be solved with 3-4 lines in LINQ. And the LINQ version is guaranteed to be easier to read (once you are familiar with LINQ).
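To make the combined case concrete, here is a rough sketch of one such scenario in both styles (User, IsActive, Country and Name are made-up names, used only for illustration):

// Procedural: filter active users, group them by country, sort each group by name.
var byCountry = new Dictionary<string, List<User>>();
foreach (var user in users)
{
    if (!user.IsActive)
        continue;
    List<User> group;
    if (!byCountry.TryGetValue(user.Country, out group))
    {
        group = new List<User>();
        byCountry[user.Country] = group;
    }
    group.Add(user);
}
foreach (var group in byCountry.Values)
{
    group.Sort((a, b) => string.Compare(a.Name, b.Name, StringComparison.Ordinal));
}

// LINQ: roughly the same intent in a few lines.
var groups = users.Where(u => u.IsActive)
                  .OrderBy(u => u.Name)
                  .GroupBy(u => u.Country);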
So when do you use LINQ? My answer: whenever you see "foreach" :)
LINQ is pretty useful in a few scenarios:
You want to use typed "business entities", instead of data tables, to more naturally access your data (and aren't already using something like NHibernate or LLBLGenPro)
You want to query non-relational data using a SQL-like syntax (this is really handy when querying lists and such)
You don't like lots of inline SQL or stored procedures
LINQ comes into play when you start doing complex filtering on complex data types. For example, say you're given a list of People objects and you need to gather a list of all the doctors within that list. With LINQ, you can compress the following code into a single LINQ statement:
(pseudo-code)
doctors = []
for person in people:
    if person is doctor:
        doctors.append(person)
(sorry, my C# is rusty, type checking syntax is probably incorrect, but you get the idea)
var doctors = from person in people where person is Doctor select person;
Edit: After I answered I see a change to say "LINQ to Objects". Oh well.
If by LINQ we refer to all the new types in System.Linq, as well as new compiler features, then it'll have quite a bit of benefit -- it is effectively adding functional programming to these languages. Here's the progression I've seen a few times (although this is mainly C# -- VB is limited in the current version).
The obvious start is that anything related to list processing gets vastly easier. A lot of loops can just go away. What benefit do you get? You'll start programming more declaratively, which will lead to fewer bugs. Things start to "just work" when switching to this style. (The LINQ query syntax I don't find too useful, unless the queries are very complicated with lots of intermediate values. In these cases, the syntax will sort out all the issues you'd otherwise have to pass tuples around for.)
Next, language support (in C#, and in the next version of VB) for anonymous methods allows you to write a lot more constructs in a much shorter way. For instance, handling an async callback can be defined inside the method that initiates it. Using a closure here will result in you not having to bundle up state into an opaque object parameter and casting it out later on.
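As a hedged illustration of that point (the file name and buffer handling here are made up; Stream.BeginRead/EndRead is the classic pre-async/await callback API): the callback is declared inline and simply captures the locals it needs, instead of smuggling them through the opaque state parameter:

using System;
using System.IO;

class AsyncCallbackDemo
{
    static void Main()
    {
        var stream = File.OpenRead("data.bin");   // hypothetical input file
        var buffer = new byte[4096];

        // The lambda closes over 'stream' and 'buffer' directly - no opaque
        // state object to bundle up, no casting it back out in the callback.
        stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
        {
            int bytesRead = stream.EndRead(asyncResult);
            Console.WriteLine("Read {0} bytes", bytesRead);
            stream.Dispose();
        }, null);

        Console.ReadLine(); // keep the process alive for the demo
    }
}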
Being able to use higher order functions gets you thinking much more generically. So you'll start to see where you could simply pass in a lambda and solve things neater and cleaner. At this point, you'll realise that things only really work if you use generics. Sure, this is a 2.0 feature, but the usage is much more prevalent when you're passing functions around.
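For instance, a small generic higher-order helper along these lines shows how lambdas and generics reinforce each other (ApplyUntil is a made-up name, purely illustrative):

using System;

static class FunctionalHelpers
{
    // Repeatedly apply 'step' to a value until 'done' says to stop.
    public static T ApplyUntil<T>(T seed, Func<T, T> step, Func<T, bool> done)
    {
        var current = seed;
        while (!done(current))
        {
            current = step(current);
        }
        return current;
    }
}

// Usage: keep doubling until the value exceeds 100.
// int result = FunctionalHelpers.ApplyUntil(1, x => x * 2, x => x > 100); // 128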
And around there, you get into the point of diminishing returns. The cost of declaring and using funcs and declaring all the generic type parameters (in C# and VB) is quite high. The compiler won't work it out for you, so you have to do it all manually. This adds a huge amount of overhead and friction, which limits how far you can go.
So, is this all "LINQ"? Depends on marketing, perhaps. The LINQ push made this style of programming much easier in C#, and all of LINQ is based on FP ideas.
Related
Given an "impure" (i.e., side-effect-producing) function such as this:
public MyResultClass DoWork(MyEntityClass myEntity)
{
    if (myEntity.DateUpdated == default)
        myEntity.DateUpdated = DateTime.Now;

    MyResultClass result = _dbUtility.InsertOrUpdate(myEntity);
    return result;
}
AFAIK, this would not be a pure function since it has side effects - i.e. it modifies the input object (using a non-deterministic value, no less!) and performs an insert/update operation on a database.
However, with Linq - which is (arguably) regarded as a purely functional subset of C# - the following is legal, and works as would be expected from a "non-pure-functional" context:
IEnumerable<MyResultClass> results = myIEnumerableEntities.Select(entity => DoWork(entity));
Meaning that (1) each entity's DateUpdated property is modified (if it is the default) and (2) the database insert/update happens successfully.
Does this indicate that Linq is not truly a functional sublanguage? Or am I misunderstanding the concepts of pure functions and functional programming? (At any rate, I still plan to use this pattern unless there are best-practice reasons for me to avoid it; and if there are such reasons not to, I will certainly consider those).
C# makes no explicit distinction between pure functions and impure actions. Thus, with higher-order functions such as Select or SelectMany, you can pass impure actions as arguments. Nothing in the language prevents that. LINQ cannot, for that reason, be considered a strictly functional subset of C#.
What is true is that C# query syntax is based on the concept of functors and monads, which were first popularised in the strictly functional language Haskell. Haskell also famously uses a particular monad called IO to model impure actions, so in that case you could argue that performing side-effects inside of a SelectMany (monadic bind) call would be, in some sense, idiomatic. The monad to use, however, ought to be the asynchronous monad rather than the List monad.
It's not considered idiomatic to perform side effects inside of IEnumerable<T>.Select. Use foreach instead.
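In other words, something along these lines (reusing the names from the question) keeps the side-effecting work in an explicit loop and leaves LINQ for querying:

// Side-effecting work goes in an ordinary loop; LINQ stays for pure queries.
var results = new List<MyResultClass>();
foreach (var entity in myIEnumerableEntities)
{
    results.Add(DoWork(entity));
}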
Add ToList() at the end of the query. But remember, standard LINQ is for querying data, not modification. Use foreach instead. – Svyatoslav Danyliv
LINQ is intended to be used in a functional manner, but there is nothing in the semantics that mandates it. C# remains an imperative language at its core and there is nothing to express that a method or delegate is free of side effects. The reasons you should want to avoid side effects are that they make the proper functioning of code much harder to reason about, potentially break things if what you're modifying is what's being queried in the first place, inhibit parallelism (which is a shame, since PLINQ happens to be very convenient) and subvert the expectations of the reader in queries. – Jeroen Mostert
@Svyatoslav Danyliv and @Jeroen Mostert have answered my question. Thanks!
The project I'm working currently on has a way to define a filter on objects from a database.
This filter is a pretty straightforward class containing criteria that will be combined to produce a SQL where clause.
The goal now is to use this class to filter .Net objects as well. So for example the filter might specify that the title property of the object that it is applied to must contain some user-defined string.
What are ways to approach this problem? What should the filter return instead of the SQL where-clause, and how can it be applied to the object? I've been thinking about this for hours and don't yet have even a slight idea how to solve it. I've been thinking about reflection, dynamic code execution, and building expressions, but still haven't found a starting point.
It depends on what all the use-cases are :)
If your DB code could use LINQ, then I might consider returning an Expression<Func<YourEntity,bool>>, as this can be applied via .Where to both DB-based LINQ and LINQ-to-Objects - in the latter case simply by calling:
var filtered = original.AsQueryable().Where(filter);
(DB code is typically already queryable, so no AsQueryable is needed).
One bad thing about this is that there is no guarantee that a particular query expression will work on every LINQ provider.
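Still, a minimal sketch of that Expression-based idea might look like this (YourEntity with a Title property and the TitleFilter class are made-up stand-ins for whatever the real filter stores):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class YourEntity
{
    public string Title { get; set; }
}

class TitleFilter
{
    public string Fragment { get; set; }

    // Build the predicate once as an expression tree; both LINQ to Objects
    // (via AsQueryable) and a database LINQ provider can consume it.
    public Expression<Func<YourEntity, bool>> ToExpression()
    {
        return e => e.Title.Contains(Fragment);
    }
}

class FilterDemo
{
    static void Main()
    {
        var filter = new TitleFilter { Fragment = "linq" };
        var items = new List<YourEntity> { new YourEntity { Title = "linq in depth" } };

        // In-memory: AsQueryable lets the same expression be applied via Where.
        var filtered = items.AsQueryable().Where(filter.ToExpression()).ToList();
        Console.WriteLine(filtered.Count); // 1
    }
}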
Alternatively, you would have to write either your own basic query language, or a basic tree-structure. And then write code to translate to both property-reflection and SQL. Lots of work.
One final thought is to just have two different query results, and keep the two things separate. So one SQL, one Predicate<T> or similar. There is a risk of them being out of sync, though.
If you want to do it a bit like the dynamically created SQL where clause, you could use Dynamic LINQ to achieve similar effects.
To use initialization syntax like this:
var contacts = new ContactList
{
    { "Dan", "dan.tao@email.com" },
    { "Eric", "ceo@google.com" }
};
...my understanding is that my ContactList type would need to define an Add method that takes two string parameters:
public void Add(string name, string email);
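For reference, a minimal sketch of a type that satisfies that requirement (Contact is a made-up element type; note that collection initializer syntax also requires the type to implement IEnumerable):

using System.Collections;
using System.Collections.Generic;

class Contact
{
    public string Name { get; set; }
    public string Email { get; set; }
}

class ContactList : IEnumerable<Contact>
{
    private readonly List<Contact> contacts = new List<Contact>();

    // The compiler turns each { "name", "email" } entry into a call to this method.
    public void Add(string name, string email)
    {
        contacts.Add(new Contact { Name = name, Email = email });
    }

    public IEnumerator<Contact> GetEnumerator()
    {
        return contacts.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}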
What's a bit confusing to me about this is that the { } initializer syntax seems most useful when creating read-only or fixed-size collections. After all it is meant to mimic the initialization syntax for an array, right? (OK, so arrays are not read-only; but they are fixed size.) And naturally it can only be used when the collection's contents are known (at least the number of elements) at compile-time.
So it would almost seem that the main requirement for using this collection initializer syntax (having an Add method and therefore a mutable collection) is at odds with the typical case in which it would be most useful.
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Am I way off base here? Is the desire to use the { } syntax to initialize fixed-size collections not as common as I think? What other factors might have influenced the formulation of the requirements for this syntax that I'm simply not thinking of?
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Your analysis is very good; the key problem is the last three words in the statement above. What are the actual typical usage scenarios?
The by-design goal motivated by typical usage scenarios for collection initializers was to make initialization of existing collection types possible in an expression syntax so that collection initializers could be embedded in query comprehensions or converted to expression trees.
Every other scenario was lower priority; the feature exists at all because it helps make LINQ work.
The C# 3 compiler team was the "long pole" for that release of Visual Studio / .NET - we had the most days of work on the schedule of any team, which meant that every day we delayed, the product would be delayed. We wanted to ship a quality product on time for all of you guys, and the perfect is the enemy of the good. Yes, this feature is slightly clunky and doesn't do absolutely everything you might want it to, but it was more important to get it solid and tested for LINQ than to make it work for a bunch of immutable collection types that largely didn't even exist.
Had this feature been designed into the language from day one, while the frameworks types were still evolving, I'm sure that things would have gone differently. As we've discussed elsewhere on this site, I would dearly love to have a write-once-read-many fixed size array of values. It would be nice to define a common pattern for proffering up a bunch of state to initialize an arbitrary immutable collection. You are right that the collection initializer syntax would be ideal for such a thing.
Features like that are on the list for potential future hypothetical language versions, but not real high on the list. In other words, let's get async/await right first before we think too hard about syntactic sugars for immutable collection initialization.
It's because the initializer statement is compiler shorthand. When it gets compiled to IL, it becomes calls to the Add method you've defined.
So you can make the case that this initialization statement is not really a "first class" feature, because it doesn't have a counterpart in IL. But that's the case for quite a lot of what we use, the "using" statement for example.
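Roughly speaking, the initializer at the top of this question compiles down to ordinary Add calls, something like:

// Approximately what the compiler generates (it actually builds the object in
// a temporary before the assignment, but the Add calls are the key point):
var contacts = new ContactList();
contacts.Add("Dan", "dan.tao@email.com");
contacts.Add("Eric", "ceo@google.com");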
The reason for this is that it was retrofitted. I agree with you that using a constructor taking a collection would make vastly more sense, but not all of the existing collection classes implemented this and the change should (1) work with all existing collections, (2) not change the existing classes in any way.
It’s a compromise.
The main reason is Syntactic Sugar.
The initializer syntax only makes writing programs in C# a bit easier. It doesn't actually add any expressive power to the language.
If the initializer didn't require an Add() method, then it would be a much different feature than it is now. Basically, it's just not how C# works. There is no literal form for creating general collections.
Not an answer, strictly speaking, but if you want to know what sort of things influenced the design of collection initialisers then you'll probably find this interesting:
What is a collection? [straight from the Horse's mouth Mads Torgersen]
What should the initialization syntax use, if not an Add method? The initializer syntax is 'run' after the constructor of the collection has run and the collection is fully created. There must be some way of adding items to the collection after it's been created.
If you want to initialize a read-only collection, do it in the constructor (taking a T[] items argument or similar)
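For example, a sketch of that constructor-based approach (ReadOnlyNames is a made-up type built on ReadOnlyCollection<T>):

using System.Collections.ObjectModel;

class ReadOnlyNames : ReadOnlyCollection<string>
{
    // params lets callers list the items inline, much like an initializer would.
    public ReadOnlyNames(params string[] items) : base(items) { }
}

// Usage: no collection initializer needed.
// var names = new ReadOnlyNames("Dan", "Eric");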
As far as I understand it, the collection initializer syntax is just syntactic sugar with no special tricks in it. It was designed in part to support initializing collections inside Linq queries:
from a in somewhere
select new {
    Something = a.Something,
    Collection = new List<object>() {
        a.Item1,
        a.Item2,
        ...
    }
}
Before, there was no way to do this inline and you'd have to do it after the fact, which was annoying.
I'd love to have the initializer syntax for immutable types (both collections and normal types). I think this could be implemented with a special constructor overload using a syntax similar to params.
For example something like this:
MyClass(initializer KeyValuePair<K,V>[] initialValues)
But unfortunately the C# team didn't implement such a thing yet :(
So we need to use a workaround like
MyClass(new KeyValuePair<K,V>[]{...})
for now
Collection initializers are expressions, so they can be used where only expressions are valid, such as a field initializer or LINQ query. This makes their existence very useful.
I also think the curly-bracketed { } kind of initialization smells more like a fixed-size collection, but it's just a syntax choice.
I've been reading Jon Skeet's C# In Depth: Second Edition and I noticed something slightly different in one of his examples from something I do myself.
He has something similar to the following:
var item = someObject.Where(user => user.Id == Id).Single();
Whereas I've been doing the following:
var item = someObject.Single(user => user.Id == Id);
Is there any real difference between the two? I know Jon Skeet is pretty much the C# god, so I tend to think his knowledge in this area is better than mine and I might be misunderstanding something here. Hope someone can help.
The queries should be equal when the tree is evaluated; however, depending on the target, the actual execution could differ (e.g. LINQ to SQL optimization).
I typically think in terms of "filter then take a single value". On the other hand, when it comes to Count I'll often use the version with the predicate. For Any I can go either way pretty easily depending on my mood :)
Ultimately it's unlikely to make any significant difference - use whichever is easier to understand at the time. I certainly wasn't trying to avoid using the versions with predicates etc - and I give some examples using them on P499.
While there are some situations with "definitely more readable" versions, many other cases are pretty much equally readable either way.
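For illustration, the same stylistic choice shows up with Count and Any (users and IsActive are made-up names):

var activeCount = users.Count(u => u.IsActive);            // predicate overload
var activeCount2 = users.Where(u => u.IsActive).Count();   // filter, then count
var anyActive = users.Any(u => u.IsActive);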
The only thing I can notice is that the IL produced using the Where method is one call longer than in the second example (tested through LINQPad).
IL_0042: call System.Linq.Enumerable.Where
IL_0047: call System.Linq.Enumerable.Single
instead of a single call to System.Linq.Enumerable.Single
IL_006B: call System.Linq.Enumerable.Single
Specifically for LINQ to Objects, I personally prefer using the Enumerable.Single(this, predicate) version directly, simply because adding the Enumerable.Where call will introduce an additional enumeration which Enumerable.Single will pull the data from.
By using Enumerable.Single directly on the primary enumerable, no additional enumerations are started. I have never benchmarked this to evaluate the real performance impact, but why execute more code when you don't need to, right? Especially if readability is not impacted; of course, this last point is rather subjective.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
A lot of questions are being answered on Stack Overflow, with members specifying how to solve these real-world/real-time problems using lambda expressions.
Are we overusing it, and are we considering the performance impact of using lambda expressions?
I found a few articles that explore the performance impact of lambdas vs. anonymous delegates vs. for/foreach loops, with different results:
Anonymous Delegates vs Lambda Expressions vs Function Calls Performance
Performance of foreach vs. List.ForEach
.NET/C# Loop Performance Test (FOR, FOREACH, LINQ, & Lambda).
DataTable.Select is faster than LINQ
What should the evaluation criteria be when choosing the appropriate solution, apart from the obvious reason that lambdas give more concise and readable code?
Even though I will focus on point one, I begin by giving my 2 cents on the whole issue of performance. Unless differences are big or usage is intensive, usually I don't bother about microseconds that, when added up, don't amount to any visible difference to the user. I emphasize that I only don't care about non-intensively called methods. Where I do have special performance considerations is in the way I design the application itself. I care about caching, about the use of threads, about clever ways to call methods (whether to make several calls or to try to make only one call), whether to pool connections or not, etc., etc. In fact I usually don't focus on raw performance, but on scalability. I don't care if it runs better by a tiny slice of a nanosecond for a single user, but I care a lot about being able to load the system with large numbers of simultaneous users without noticing the impact.
Having said that, here goes my opinion about point 1. I love anonymous methods. They give me great flexibility and code elegance. The other great feature of anonymous methods is that they allow me to directly use local variables from the containing method (from a C# perspective, not from an IL perspective, of course). They often spare me loads of code. When do I use anonymous methods? Every single time the piece of code I need isn't needed elsewhere. If it is used in two different places, I don't like copy-paste as a reuse technique, so I'll use a plain ol' delegate. So, just like shoosh answered, it isn't good to have code duplication. In theory there are no performance differences, as anonymous methods are C# compiler tricks, not IL features.
Most of what I think about anonymous methods applies to lambda expressions, as the latter can be used as a compact syntax to represent anonymous methods. Let's assume the following method:
public static void DoSomethingMethod(string[] names, Func<string, bool> myExpression)
{
    Console.WriteLine("Lambda used to represent an anonymous method");
    foreach (var item in names)
    {
        if (myExpression(item))
            Console.WriteLine("Found {0}", item);
    }
}
It receives an array of strings and for each one of them, it will call the method passed to it. If that method returns true, it will say "Found...". You can call this method the following way:
string[] names = {"Alice", "Bob", "Charles"};
DoSomethingMethod(names, delegate(string p) { return p == "Alice"; });
But, you can also call it the following way:
DoSomethingMethod(names, p => p == "Alice");
There is no difference in IL between the two; the one using the lambda expression is just much more readable. Once again, there is no performance impact, as these are all C# compiler tricks (not JIT compiler tricks). Just as I don't feel we are overusing anonymous methods, I don't feel we are overusing lambda expressions to represent anonymous methods. Of course, the same logic applies to repeated code: don't do lambdas, use regular delegates. There are other restrictions leading you back to anonymous methods or plain delegates, like out or ref argument passing.
The other nice thing about lambda expressions is that the exact same syntax doesn't have to represent an anonymous method. Lambda expressions can also represent... you guessed it, expressions. Take the following example:
public static void DoSomethingExpression(string[] names, System.Linq.Expressions.Expression<Func<string, bool>> myExpression)
{
    Console.WriteLine("Lambda used to represent an expression");
    BinaryExpression bExpr = myExpression.Body as BinaryExpression;
    if (bExpr == null)
        return;

    Console.WriteLine("It is a binary expression");
    Console.WriteLine("The node type is {0}", bExpr.NodeType.ToString());
    Console.WriteLine("The left side is {0}", bExpr.Left.NodeType.ToString());
    Console.WriteLine("The right side is {0}", bExpr.Right.NodeType.ToString());
    if (bExpr.Right.NodeType == ExpressionType.Constant)
    {
        ConstantExpression right = (ConstantExpression)bExpr.Right;
        Console.WriteLine("The value of the right side is {0}", right.Value.ToString());
    }
}
Notice the slightly different signature. The second parameter receives an expression and not a delegate. The way to call this method would be:
DoSomethingExpression(names, p => p == "Alice");
Which is exactly the same as the call we made when creating an anonymous method with a lambda. The difference here is that we are not creating an anonymous method, but creating an expression tree. It is due to these expression trees that we can then translate lambda expressions to SQL, which is what Linq 2 SQL does, for instance, instead of executing stuff in the engine for each clause like the Where, the Select, etc. The nice thing is that the calling syntax is the same whether you're creating an anonymous method or sending an expression.
My answer will not be popular.
I believe lambdas are the better choice 99% of the time, for three reasons.
First, there is ABSOLUTELY nothing wrong with assuming your developers are smart. Other answers have an underlying premise that every developer but you is stupid. Not so.
Second, lambdas (et al.) are a modern syntax - and tomorrow they will be more commonplace than they already are today. Your project's code should flow from current and emerging conventions.
Third, writing code "the old fashioned way" might seem easier to you, but it's not easier to the compiler. This is important, legacy approaches have little opportunity to be improved as the compiler is rev'ed. Lambdas (et al) which rely on the compiler to expand them can benefit as the compiler deals with them better over time.
To sum up:
Developers can handle it
Everyone is doing it
There's future potential
Again, I know this will not be a popular answer. And believe me "Simple is Best" is my mantra, too. Maintenance is an important aspect to any source. I get it. But I think we are overshadowing reality with some cliché rules of thumb.
// Jerry
Code duplication.
If you find yourself writing the same anonymous function more than once, it shouldn't be one.
Well, when we are talking about delegate usage, there shouldn't be any difference between lambdas and anonymous methods -- they are the same, just with different syntax. And named methods (used as delegates) are also identical from the runtime's viewpoint. The difference, then, is between using delegates vs. inline code - i.e.
list.ForEach(s => s.Foo());
// vs.
foreach (var s in list) { s.Foo(); }
(where I would expect the latter to be quicker)
And equally, if you are talking about anything other than in-memory objects, lambdas are one of your most powerful tools in terms of maintaining type checking (rather than parsing strings all the time).
Certainly, there are cases when a simple foreach with code will be faster than the LINQ version, as there will be fewer invokes to do, and invokes cost a small but measurable time. However, in many cases, the code is simply not the bottleneck, and the simpler code (especially for grouping, etc) is worth a lot more than a few nanoseconds.
Note also that in .NET 4.0 there are additional Expression nodes for things like loops, commas, etc. The language doesn't support them, but the runtime does. I mention this only for completeness: I'm certainly not saying you should use manual Expression construction where foreach would do!
I'd say that the performance differences are usually so small (and in the case of loops, obviously, if you look at the results of the 2nd article (btw, Jon Skeet has a similar article here)) that you should almost never choose a solution for performance reasons alone, unless you are writing a piece of software where performance is absolutely the number one non-functional requirement and you really have to do micro-optimizations.
When to choose what? I guess it depends on the situation but also the person. Just as an example, some people prefer List.ForEach over a normal foreach loop. I personally prefer the latter, as it is usually more readable, but who am I to argue against this?
Rules of thumb:
Write your code to be natural and readable.
Avoid code duplication (lambda expressions might require a little extra diligence).
Optimize only when there's a problem, and only with data to back up what that problem actually is.
Any time the lambda simply passes its arguments directly to another function. Don't create a lambda for function application.
Example:
var coll = new ObservableCollection<int>();
myInts.ForEach(x => coll.Add(x));
Is nicer as:
var coll = new ObservableCollection<int>();
myInts.ForEach(coll.Add);
The main exception is where C#'s type inference fails for whatever reason (and there are plenty of times that's true).
If you need recursion, don't use lambdas, or you'll end up getting very distracted!
Lambda expressions are cool. Over the older delegate syntax they have a few advantages: they can be converted to either anonymous functions or expression trees, parameter types are inferred from the declaration, they are cleaner and more concise, etc. I see no real value in not using lambda expressions when you need an anonymous function. One not-so-big advantage the earlier style has is that you can omit the parameter declaration entirely if the parameters are not used. Like:
Action<int> a = delegate { }; //takes one argument, but no argument specified
This is useful when you have to declare an empty delegate that does nothing, but it is not a strong enough reason to not use lambdas.
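For contrast, the lambda form has to declare the parameter even when it goes unused:

Action<int> b = x => { }; // the parameter must be named, even though it's ignored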
Lambdas let you write quick anonymous methods. That makes lambdas pointless wherever anonymous methods are pointless, i.e. where named methods make more sense. Compared to named methods, anonymous methods can be disadvantageous (not a lambda-expression-specific thing, but since these days lambdas widely represent anonymous methods, it is relevant):
because they tend to lead to logic duplication (they often do; reuse is difficult)
when it is unnecessary to write one, like:
//this is unnecessary
Func<string, int> f = x => int.Parse(x);
//this is enough
Func<string, int> f = int.Parse;
since writing an anonymous iterator block is impossible:
Func<IEnumerable<int>> f = () => { yield return 0; }; //impossible
since recursive lambdas require one more line of quirkiness, like
Func<int, int> f = null;
f = x => (x <= 1) ? 1 : x * f(x - 1);
well, since reflection is kinda messier, but that is moot, isn't it?
Apart from point 3, the rest are not strong reasons not to use lambdas.
Also see this thread about what is disadvantageous about Func/Action delegates, since often they are used along with lambda expressions.