Will Microsoft ever make all collections useable by LINQ? - c#

I've been using LINQ for awhile (and enjoy it), but it feels like I hit a speedbump when I run across .NET specialized collections(DataRowCollection, ControlCollection). Is there a way to use LINQ with these specialized controls, and if not do you think Microsoft will address this in the next release of the framework? Or are we left to iterate over these the non-LINQ way, or pull the items out of the collection into LINQ-able collections ourselves?

The reason why collections like ControlCollection do not work with LINQ is that they are not strongly typed. Without an element type LINQ cannot create strongly typed methods. As long as you know the type you can use the Cast method to create a strongly typed enumeration and hence be used with LINQ. For example
ControlCollection col = ...
var query = col.Cast<Control>().Where(x => ...);
As to will Microsoft ever make these implement IEnumerable<T> by default. My guess is no here. The reason why is that doing so is a breaking change and can cause expected behavior in code. Even simply implementing IEnumerable<Control> for ControlCollection would cause changes to overload resolution that can, and almost certainly will, break user applications.

You should be able to do something like this:
myDataRowCollection.Cast<DataRow>().Where.....
and use Linq that way. If you know what the objects in the collection are, then you should be able to use that.

The reason for this is: Collections which do not implement IEnumerable<T> or IQueryable, can not be iterated in LINQ

Related

Why do so many named collections in .NET not implement IEnumerable<T>?

Random example:
ConfigurationElementCollection
.Net has tons of these little WhateverCollection classes that don't implement IEnumerable<T>, which means I can't use Linq to objects with them out of the box.
Even before Linq, you'd think they would have wanted to make use of generics (which were introduced all the way back in C# 2 I believe)
It seems I run across these annoying little collection types all the time.
Is there some technical reason?
The answer is in the question title: "named collections". Which is the way you had to make collections type-safe before generics became available. There are a lot of them in code that dates back to .NET 1.x, especially Winforms. There was no reasonable way to rewrite them using generics, that would have broken too much existing code.
So the named collection type is type safe but the rub is System.Collections.IEnumerator.Current, a property of type Object. You can Linqify these collections by using OfType() or Cast().
As Adam Houldsworth said in a comment already, you simply need to use the Cast<> method.
Example:
var a = new DogCollection();
var allFidos = a.Cast<Dog>().Where(d => d.Name == "Fido");

Why is an Add method required for { } initialization?

To use initialization syntax like this:
var contacts = new ContactList
{
{ "Dan", "dan.tao#email.com" },
{ "Eric", "ceo#google.com" }
};
...my understanding is that my ContactList type would need to define an Add method that takes two string parameters:
public void Add(string name, string email);
What's a bit confusing to me about this is that the { } initializer syntax seems most useful when creating read-only or fixed-size collections. After all it is meant to mimic the initialization syntax for an array, right? (OK, so arrays are not read-only; but they are fixed size.) And naturally it can only be used when the collection's contents are known (at least the number of elements) at compile-time.
So it would almost seem that the main requirement for using this collection initializer syntax (having an Add method and therefore a mutable collection) is at odds with the typical case in which it would be most useful.
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Am I way off base here? Is the desire to use the { } syntax to initialize fixed-size collections not as common as I think? What other factors might have influenced the formulation of the requirements for this syntax that I'm simply not thinking of?
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Your analysis is very good; the key problem is the last three words in the statement above. What are the actual typical usage scenarios?
The by-design goal motivated by typical usage scenarios for collection initializers was to make initialization of existing collection types possible in an expression syntax so that collection initializers could be embedded in query comprehensions or converted to expression trees.
Every other scenario was lower priority; the feature exists at all because it helps make LINQ work.
The C# 3 compiler team was the "long pole" for that release of Visual Studio / .NET - we had the most days of work on the schedule of any team, which meant that every day we delayed, the product would be delayed. We wanted to ship a quality product on time for all of you guys, and the perfect is the enemy of the good. Yes, this feature is slightly clunky and doesn't do absolutely everything you might want it to, but it was more important to get it solid and tested for LINQ than to make it work for a bunch of immutable collection types that largely didn't even exist.
Had this feature been designed into the language from day one, while the frameworks types were still evolving, I'm sure that things would have gone differently. As we've discussed elsewhere on this site, I would dearly love to have a write-once-read-many fixed size array of values. It would be nice to define a common pattern for proffering up a bunch of state to initialize an arbitrary immutable collection. You are right that the collection initializer syntax would be ideal for such a thing.
Features like that are on the list for potential future hyptothetical language versions, but not real high on the list. In other words, let's get async/await right first before we think too hard about syntactic sugars for immutable collection initialization.
It's because the initialization statement is shorthand for the CLR. When it gets compiled into bytecode, it will call the Add method you've defined.
So you can make the case that this initialization statement is not really a "first class" feature, because it doesn't have a counterpart in IL. But that's the case for quite a lot of what we use, the "using" statement for example.
The reason for this is that it was retrofitted. I agree with you that using a constructor taking a collection would make vastly more sense, but not all of the existing collection classes implemented this and the change should (1) work with all existing collections, (2) not change the existing classes in any way.
It’s a compromise.
The main reason is Syntactic Sugar.
The initializer syntax only makes writing programing in C# a bit easier. It doesn't actually add any expressive power to the language.
If the initializer didn't require an Add() method, then it would be a much different feature than it is now. Basically, it's just not how C# works. There is no literal form for creating general collections.
Not an answer, strictly speaking, but if you want to know what sort of things influenced the design of collection initialisers then you'll probably find this interesting:
What is a collection? [straight from the Horse's mouth Mads Torgersen]
What should the initialization syntax use, if not an Add method? The initialization syntax is 'run' after the constructor of the collection is run, and the collection fully created. There must be some way of adding items to the collection after it's been created.
If you want to initialize a read-only collection, do it in the constructor (taking a T[] items argument or similar)
As far as I understand it, the collection initializer syntax is just syntactic sugar with no special tricks in it. It was designed in part to support initializing collections inside Linq queries:
from a in somewhere
select new {
Something = a.Something
Collection = new List<object>() {
a.Item1,
a.Item2,
...
}
}
Before there was no way to do this inline and you'd have to do it after the case, which was annoying.
I'd love to have the initializer syntax for immutable types(both collections and normal types). I think this could be implemented with a special constructor overload using a syntax similar to params.
For example something like this:
MyClass(initializer KeyValuePair<K,V>[] initialValues)
But unfortunately the C# team didn't implement such a thing yet :(
So we need to use a workaround like
MyClass(new KeyValuePair<K,V>[]{...})
for now
Collection initializers are expressions, so they can be used where only expression are valid, such as a field initializer or LINQ query. This makes their existence very useful.
I also think the curly-bracketed { } kind of initialization, smells more like a fixed size collection, but it's just a syntax choice.

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface, it exposes certain things to the outside. While you are completely right, you could just use a List<T>, but List<T> is very deep in the inheritance tree. What exactly does a List<T>? It stores items, it offers certain methods to Add and Remove. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is - an abstract way of saying "I want to get a list of items I can iterate over". A list is "I want to get a collection which I can modify, can access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it takes up more memory. So if a method is taking an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the possibilites of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course IEnumerable - As a general rule, you want to be specific on what you output but broad on what you accept as input eg:
You have a sub which loops through a list of objects and writes something to the console...
You could declare the parameter is as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate - so use IEnumerable - then your method will also accept other types which implement IEnumerable including IQueryable, Linked Lists, etc...
You're making your methods more generic for no cost.
Today, you generally wouldn't use IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Amongst other benefits, IEnumerable fully implements all of the LINQ operations/extensions so that you can easily query any List type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare that nowdays you need to create your own container classes, as you are right there alreay exists many good implementations.
However if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T> because they are a standard "contract" for itteration and by providing an implementation you can take advantage of methods/apis that want an IEnumerable or IEnumerable<T> Linq for example will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular implementation of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation that I think I have to implement IEnumerable but not using List<>
I want to get all items from a remote server. Let say I have one million items going to return. If you use List<> approach, you need to cache all one million items in the memory first. In some cases, you don't really want to do that because you don't want to use up too much memory. Using IEnumerable allows you to display the data on the screen and then dispose it right away. Therefore, using IEnumerable approach, the memory footprint of the program is much smaller.
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
When a built in container fits your needs you should definitely use that, and than IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, than you should make sure to implement both IEnumerable and IEnumerable<T> for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable provides means for your API users (including yourself) to use your collection by the means of a foreach. For example, i implemented IENumerable in my Binary Tree class so i could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree InOrder.
Hope it helps :)
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement iEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.

Where and When to use LINQ to Objects?

In which situations I should use LINQ to Objects?
Obviously I can do everything without LINQ. So in which operations LINQ actually helps me to code shorter and/or more readable?
This question triggered by this
I find LINQ to Objects useful all over the place. The problem it solves is pretty general:
You have a collection of some data items
You want another collection, formed from the original collection, but after some sort of transformation or filtering. This might be sorting, projection, applying a predicate, grouping, etc.
That's a situation I come across pretty often. There are an awful lot of areas of programming which basically involve transforming one collection (or stream of data) into another. In those cases the code using LINQ is almost always shorter and more readable. I'd like to point out that LINQ shouldn't be regarded as being synonymous with query expressions - if only a single operator is required, the normal "dot notation" (using extension methods) can often be shorter and more readable.
One of the reasons I particularly like LINQ to Objects is that it is so general - whereas LINQ to SQL is likely to only get involved in your data layer (or pretty much become the data layer), LINQ to Objects is applicable in every layer, and in all kinds of applications.
Just as an example, here's a line in my MiniBench benchmarking framework, converting a TestSuite (which is basically a named collection of tests) into a ResultSuite (a named collection of results):
return new ResultSuite(name,
tests.Select(test => test.Run(input, expectedOutput)));
Then again if a ResultSuite needs to be scaled against some particular "standard" result:
return new ResultSuite(name,
results.Select(x => x.ScaleToStandard(standard, mode)));
It wouldn't be hard to write this code without LINQ, but LINQ just makes it clearer and lets you concentrate on the real "logic" instead of the details of iterating through loops and adding results to lists etc.
Even when LINQ itself isn't applicable, some of the features which were largely included for the sake of LINQ (e.g. implicitly typed local variables, lambda expressions, extension methods) can be very useful.
The answer practically everywhere comes to mind. A better question would be when not to use it.
LINQ is great for the "slippery slope". Think of what's involved in many common operations:
Where. Just write a foreach loop and an "if"
Select. Create an empty list of the target type, loop through the originals, convert each one and add it to the results.
OrderBy. Just add it to a list and call .Sort(). Or implement a bubble sort ;)
ThenBy (from order by PropertyA, then by PropertyB). Quite a bit harder. A custom comparer and Sort should do the trick.
GroupBy - create a Dictionary<key, List<value>> and loop through all items. If no key exists create it, then add items to the appropriate list.
In each of those cases, the procedural way takes more code than the LINQ way. In the case of "if" it's a couple of lines more; in the case of GroupBy or OrderBy/ThenBy it's a lot more.
Now take an all too common scenario of combining them together. You're suddenly looking at a 10-20 line method which could be solved with 3-4 lines in LINQ. And the LINQ version is guaranteed to be easier to read (once you are familiar with LINQ).
So when do you use LINQ? My answer: whenever you see "foreach" :)
LINQ is pretty useful in a few scenarios:
You want to use typed "business entities", instead of data tables, to more naturally access your data (and aren't already using something like NHibernate or LLBLGenPro)
You want to query non-relational data using a SQL like syntax (this is real handy when querying lists and such)
You don't like lots of inline SQL or stored procedures
LINQ comes in to play when you start doing complex filtering on complex data types. For example, if you're given a list of People objects and you need to gather a list of all the doctors within that list. With LINQ, you can compress the following code into a single LINQ statement:
(pseudo-code)
doctors = []
for person in people:
if person is doctor:
doctors.append(person)
(sorry, my C# is rusty, type checking syntax is probably incorrect, but you get the idea)
doctors = from person in people where person.type() == doctor select person;
Edit: After I answered I see a change to say "LINQ to Objects". Oh well.
If by LINQ we refer to all the new types in System.Linq, as well as new compiler features, then it'll have quite a bit of benefit -- it is effectively adding functional programming to these languages. ( Here's the progression I've seen a few times (although this is mainly C# -- VB is limited in the current version).
The obvious start is that anything related to list processing gets vastly easier. A lot of loops can just go away. What benefit do you get? You'll start programming more declaratively, which will lead to fewer bugs. Things start to "just work" when switching to this style. (The LINQ query syntax I don't find too useful, unless the queries are very complicated with lots of intermediate values. In these cases, the syntax will sort out all the issues you'd otherwise have to pass tuples around for.)
Next, language support (in C#, and in the next version of VB) for anonymous methods allows you to write a lot more constructs in a much shorter way. For instance, handling an async callback can be defined inside the method that initiates it. Using a closure here will result in you not having to bundle up state into an opaque object parameter and casting it out later on.
Being able to use higher order functions gets you thinking much more generically. So you'll start to see where you could simply pass in a lambda and solve things neater and cleaner. At this point, you'll realise that things only really work if you use generics. Sure, this is a 2.0 feature, but the usage is much more prevalent when you're passing functions around.
And around there, you get into the point of diminishing returns. The cost of declaring and using funcs and declaring all the generic type parameters (in C# and VB) is quite high. The compiler won't work it out for you, so you have to do it all manually. This adds a huge amount of overhead and friction, which limits how far you can go.
So, is this all "LINQ"? Depends on marketing, perhaps. The LINQ push made this style of programming much easier in C#, and all of LINQ is based on FP ideas.

Having some confusion with LINQ

Some background info;
LanguageResource is the base class
LanguageTranslatorResource and LanguageEditorResource inherit from LanguageResource
LanguageEditorResource defines an IsDirty property
LanguageResourceCollection is a collection of LanguageResource
LanguageResourceCollection internally holds LanguageResources in Dictionary<string, LanguageResource> _dict
LanguageResourceCollection.GetEnumerator() returns _dict.Values.GetEnumerator()
I have a LanguageResourceCollection _resources that contains only LanguageEditorResource objects and want to use LINQ to enumerate those that are dirty so I have tried the following. My specific questions are in bold.
_resources.Where(r => (r as LanguageEditorResource).IsDirty)
neither Where not other LINQ methods are displayed by Intellisense but I code it anyway and am told "LanguageResourceCollection does not contain a definition for 'Where' and no extension method...".
Why does the way that LanguageResourceCollection implements IEnumerable preclude it from supporting LINQ?
If I change the query to
(_resources as IEnumerable<LanguageEditorResource>).Where(r => r.IsDirty)
Intellisense displays the LINQ methods and the solution compiles. But at runtime I get an ArgumentNullException "Value cannot be null. Parameter name: source".
Is this a problem in my LINQ code?
Is it a problem with the general design of the classes?
How can I dig into what LINQ generates to try and see what the problem is?
My aim with this question is not to get a solution for the specific problem, as I will have to solve it now using other (non LINQ) means, but rather to try and improve my understanding of LINQ and learn how I can improve the design of my classes to work better with LINQ.
It sounds like your collection implements IEnumerable, not IEnumerable<T>, hence you need:
_resources.Cast<LanguageEditorResource>().Where(r => r.IsDirty)
Note that Enumerable.Where is defined on IEnumerable<T>, not IEnumerable - if you have the non-generic type, you need to use Cast<T> (or OfType<T>) to get the right type. The difference being that Cast<T> will throw an exception if it finds something that isn't a T, where-as OfType<T> simply ignores anything that isn't a T. Since you've stated that your collection only contains LanguageEditorResource, it is reasonable to check that assumption using Cast<T>, rather than silently drop data.
Check also that you have "using System.Linq" (and are referencing System.Core (.NET 3.5; else LINQBridge with .NET 2.0) to get the Where extension method(s).
Actually, it would be worth having your collection implement IEnumerable<LanguageResource> - which you could do quite simply using either the Cast<T> method, or an iterator block (yield return).
[edit]
To build on Richard Poole's note - you could write your own generic container here, presumably with T : LanguageResource (and using that T in the Dictionary<string,T>, and implementing IEnumerable<T> or ICollection<T>). Just a thought.
In addition to Marc G's answer, and if you're able to do so, you might want to consider dropping your custom LanguageResourceCollection class in favour of a generic List<LanguageResource>. This will solve your current problem and get rid of that nasty .NET 1.1ish custom collection.
How can I dig into what LINQ generates to try and see what the problem is?
Linq isn't generating anything here. You can step through with the debugger.
to try and improve my understanding of LINQ and learn how I can improve the design of my classes to work better with LINQ.
System.Linq.Enumerable methods rely heavily on the IEnumerable< T > contract. You need to understand how your class can produce targets that support this contract. The type that T represents is important!
You could add this method to LanguageResourceCollection:
public IEnumerable<T> ParticularResources<T>()
{
return _dict.Values.OfType<T>();
}
and call it by:
_resources
.ParticularResources<LanguageEditorResource>()
.Where(r => r.IsDirty)
This example would make more sense if the collection class didn't implement IEnumerable< T > against that same _dict.Values . The point is to understand IEnumerable < T > and generic typing.

Categories