To use initialization syntax like this:
var contacts = new ContactList
{
    { "Dan", "dan.tao#email.com" },
    { "Eric", "ceo#google.com" }
};
...my understanding is that my ContactList type would need to define an Add method that takes two string parameters:
public void Add(string name, string email);
What's a bit confusing to me about this is that the { } initializer syntax seems most useful when creating read-only or fixed-size collections. After all, it is meant to mimic the initialization syntax for an array, right? (OK, so arrays are not read-only; but they are fixed size.) And naturally it can only be used when the collection's contents (or at least the number of elements) are known at compile-time.
So it would almost seem that the main requirement for using this collection initializer syntax (having an Add method and therefore a mutable collection) is at odds with the typical case in which it would be most useful.
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Am I way off base here? Is the desire to use the { } syntax to initialize fixed-size collections not as common as I think? What other factors might have influenced the formulation of the requirements for this syntax that I'm simply not thinking of?
I'm sure I haven't put as much thought into this matter as the C# design team; it just seems that there could have been different rules for this syntax that would have meshed better with its typical usage scenarios.
Your analysis is very good; the key problem is the last three words in the statement above. What are the actual typical usage scenarios?
The by-design goal motivated by typical usage scenarios for collection initializers was to make initialization of existing collection types possible in an expression syntax so that collection initializers could be embedded in query comprehensions or converted to expression trees.
Every other scenario was lower priority; the feature exists at all because it helps make LINQ work.
The C# 3 compiler team was the "long pole" for that release of Visual Studio / .NET - we had the most days of work on the schedule of any team, which meant that every day we delayed, the product would be delayed. We wanted to ship a quality product on time for all of you guys, and the perfect is the enemy of the good. Yes, this feature is slightly clunky and doesn't do absolutely everything you might want it to, but it was more important to get it solid and tested for LINQ than to make it work for a bunch of immutable collection types that largely didn't even exist.
Had this feature been designed into the language from day one, while the framework types were still evolving, I'm sure that things would have gone differently. As we've discussed elsewhere on this site, I would dearly love to have a write-once-read-many fixed size array of values. It would be nice to define a common pattern for proffering up a bunch of state to initialize an arbitrary immutable collection. You are right that the collection initializer syntax would be ideal for such a thing.
Features like that are on the list for potential future hypothetical language versions, but not real high on the list. In other words, let's get async/await right first before we think too hard about syntactic sugars for immutable collection initialization.
It's because the initializer is shorthand that the compiler expands. When it gets compiled into IL, it becomes a series of calls to the Add method you've defined.
So you can make the case that this initialization statement is not really a "first class" feature, because it doesn't have a counterpart in IL. But that's the case for quite a lot of what we use, the "using" statement for example.
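A minimal sketch of that expansion, for illustration only. ContactList here is a hypothetical stand-in for the asker's type (the question gives only the Add signature), and the e-mail addresses are placeholders. The second method writes out by hand roughly what the compiler generates for the first.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's ContactList type.
public class ContactList : IEnumerable<KeyValuePair<string, string>>
{
    private readonly List<KeyValuePair<string, string>> _items =
        new List<KeyValuePair<string, string>>();

    public int Count { get { return _items.Count; } }

    // The collection initializer binds to this method by name.
    public void Add(string name, string email)
    {
        _items.Add(new KeyValuePair<string, string>(name, email));
    }

    public IEnumerator<KeyValuePair<string, string>> GetEnumerator()
    {
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class InitializerDemo
{
    public static ContactList WithInitializer()
    {
        return new ContactList
        {
            { "Dan", "dan@example.com" },   // placeholder addresses
            { "Eric", "eric@example.com" }
        };
    }

    // Roughly what the compiler emits for the method above:
    // construct first, then one Add call per element.
    public static ContactList Expanded()
    {
        var temp = new ContactList();
        temp.Add("Dan", "dan@example.com");
        temp.Add("Eric", "eric@example.com");
        return temp;
    }
}
```

Both methods return lists with the same two entries. The type only qualifies for the syntax because it implements IEnumerable and exposes an accessible Add method.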
The reason for this is that it was retrofitted. I agree with you that using a constructor taking a collection would make vastly more sense, but not all of the existing collection classes implemented this and the change should (1) work with all existing collections, (2) not change the existing classes in any way.
It’s a compromise.
The main reason is Syntactic Sugar.
The initializer syntax only makes writing programs in C# a bit easier. It doesn't actually add any expressive power to the language.
If the initializer didn't require an Add() method, then it would be a much different feature than it is now. Basically, it's just not how C# works. There is no literal form for creating general collections.
Not an answer, strictly speaking, but if you want to know what sort of things influenced the design of collection initialisers then you'll probably find this interesting:
What is a collection? [straight from the Horse's mouth Mads Torgersen]
What should the initialization syntax use, if not an Add method? The initialization syntax is 'run' after the constructor of the collection is run, and the collection fully created. There must be some way of adding items to the collection after it's been created.
If you want to initialize a read-only collection, do it in the constructor (taking a T[] items argument or similar)
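As a sketch of that suggestion: FixedList<T> below is a hypothetical type, not part of the framework, and it assumes a params array is an acceptable way to hand over the items. The contents are fixed at construction and there is no Add method at all.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical fixed-size, read-only list; not a framework type.
public sealed class FixedList<T> : IEnumerable<T>
{
    private readonly T[] _items;

    public FixedList(params T[] items)
    {
        // Copy so later changes to the caller's array cannot leak in.
        _items = (T[])items.Clone();
    }

    public int Count { get { return _items.Length; } }

    public T this[int index] { get { return _items[index]; } }

    public IEnumerator<T> GetEnumerator()
    {
        return ((IEnumerable<T>)_items).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return _items.GetEnumerator();
    }
}

public static class FixedListDemo
{
    public static FixedList<string> Build()
    {
        // Reads almost like an initializer, but goes through the constructor.
        return new FixedList<string>("Dan", "Eric");
    }
}
```

The defensive copy is a design choice: without it, a caller who passes an existing array could still mutate the "read-only" collection through that array.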
As far as I understand it, the collection initializer syntax is just syntactic sugar with no special tricks in it. It was designed in part to support initializing collections inside Linq queries:
from a in somewhere
select new {
    Something = a.Something,
    Collection = new List<object>() {
        a.Item1,
        a.Item2,
        ...
    }
}
Before, there was no way to do this inline and you'd have to do it after the fact, which was annoying.
I'd love to have the initializer syntax for immutable types (both collections and normal types). I think this could be implemented with a special constructor overload using a syntax similar to params.
For example something like this:
MyClass(initializer KeyValuePair<K,V>[] initialValues)
But unfortunately the C# team didn't implement such a thing yet :(
So we need to use a workaround like
MyClass(new KeyValuePair<K,V>[]{...})
for now
Collection initializers are expressions, so they can be used where only expressions are valid, such as a field initializer or LINQ query. This makes their existence very useful.
I also think the curly-bracketed { } kind of initialization, smells more like a fixed size collection, but it's just a syntax choice.
Related
I'm implementing a special case of an immutable dictionary, which for convenience implements IEnumerable<KeyValuePair<Foo, Bar>>. Operations that would ordinarily modify the dictionary should instead return a new instance.
So far so good. But when I try to write a fluent-style unit test for the class, I find that neither of the two fluent assertion libraries I've tried (Should and Fluent Assertions) supports the NotBeSameAs() operation on objects that implement IEnumerable -- not unless you first cast them to Object.
When I first ran into this, with Should, I assumed that it was just a hole in the framework, but when I saw that Fluent Assertions had the same hole, it made me think that (since I'm a relative newcomer to C#) I might be missing something conceptual about C# collections -- the author of Should implied as much when I filed an issue.
Obviously there are other ways to test this -- cast to Object and use NotBeSameAs(), just use Object.ReferenceEquals, whatever -- but if there's a good reason not to, I'd like to know what that is.
An IEnumerable<T> is not necessarily a real object. IEnumerable<T> guarantees that you can enumerate through its states. In simple cases you have a container class like a List<T> that is already materialized. Then you could compare both lists' addresses. However, your IEnumerable<T> might also point to a sequence of commands that will be executed once you enumerate. Basically a state machine:
public IEnumerable<int> GetInts()
{
    yield return 10;
    yield return 20;
    yield return 30;
}
If you save this in a variable, you don't have a comparable object (everything is an object, so you do... but it's not meaningful):
var x = GetInts();
Your comparison only works for materialized (.ToList() or .ToArray()) IEnumerables, because those state machines have been evaluated and their results saved to a collection. So yes, the library's behavior actually makes sense: if you know you have materialized IEnumerables, you need to make this knowledge explicit by casting them to Object and calling the desired function on that object "manually".
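To make the distinction concrete, here is a small sketch reusing the GetInts iterator from above. The class and helper method names are made up for illustration. Every call to an iterator method hands back a fresh state-machine object, so reference identity tells you nothing about the contents.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceIdentityDemo
{
    public static IEnumerable<int> GetInts()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // Two calls produce two distinct state-machine objects...
    public static bool TwoCallsAreSameReference()
    {
        return object.ReferenceEquals(GetInts(), GetInts());
    }

    // ...even though they yield exactly the same values.
    public static bool TwoCallsHaveSameContents()
    {
        return GetInts().SequenceEqual(GetInts());
    }

    // A materialized list is an ordinary object, so identity is meaningful.
    public static bool MaterializedListIsSameAsItself()
    {
        List<int> list = GetInts().ToList();
        IEnumerable<int> alias = list;
        return object.ReferenceEquals(list, alias);
    }
}
```

The first method returns false and the other two return true, which is why a "same reference" assertion on an arbitrary IEnumerable<T> is easy to misread.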
In addition to what Jon Skeet suggested, take a look at this February 2013 MSDN article from Ted Neward:
.NET Collections, Part 2: Working with C5
Immutable (Guarded) Collections
With the rise of functional concepts
and programming styles, a lot of emphasis has swung to immutable data
and immutable objects, largely because immutable objects offer a lot
of benefits vis-à-vis concurrency and parallel programming, but also
because many developers find immutable objects easier to understand
and reason about. Corollary to that concept, then, follows the concept
of immutable collections—the idea that regardless of whether the
objects inside the collection are immutable, the collection itself is
fixed and unable to change (add or remove) the elements in the
collection. (Note: You can see a preview of immutable collections
released on NuGet in the MSDN Base Class Library (BCL) blog at
bit.ly/12AXD78.)
It describes the use of an open source library of collection goodness called C5.
Look at http://itu.dk/research/c5/
Random example:
ConfigurationElementCollection
.Net has tons of these little WhateverCollection classes that don't implement IEnumerable<T>, which means I can't use Linq to objects with them out of the box.
Even before Linq, you'd think they would have wanted to make use of generics (which were introduced all the way back in C# 2 I believe)
It seems I run across these annoying little collection types all the time.
Is there some technical reason?
The answer is in the question title: "named collections". Which is the way you had to make collections type-safe before generics became available. There are a lot of them in code that dates back to .NET 1.x, especially Winforms. There was no reasonable way to rewrite them using generics, that would have broken too much existing code.
So the named collection type is type safe but the rub is System.Collections.IEnumerator.Current, a property of type Object. You can Linqify these collections by using OfType() or Cast().
As Adam Houldsworth said in a comment already, you simply need to use the Cast<> method.
Example:
var a = new DogCollection();
var allFidos = a.Cast<Dog>().Where(d => d.Name == "Fido");
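The difference between the two operators mentioned above can be sketched as follows. ArrayList stands in for a pre-generics "named collection" here, and the class and method names are assumptions for illustration.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class NonGenericLinqDemo
{
    // ArrayList stands in for a pre-generics collection with mixed contents.
    public static ArrayList Mixed()
    {
        return new ArrayList { "Fido", 42, "Rex" };
    }

    // OfType<T> silently skips elements that are not of type T.
    public static List<string> StringsOnly()
    {
        return Mixed().OfType<string>().ToList();
    }

    // Cast<T> throws InvalidCastException on the first mismatch,
    // but only once the sequence is actually enumerated.
    public static bool CastThrowsOnMismatch()
    {
        try
        {
            Mixed().Cast<string>().ToList();
            return false;
        }
        catch (System.InvalidCastException)
        {
            return true;
        }
    }
}
```

For a genuinely type-safe named collection, Cast<T> is the natural choice, since every element is already known to be a T. OfType<T> is the safer pick when the contents might be mixed.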
For the hardcore C# coders here, this might seem like a completely stupid question - however, I just came across a snippet of sample code in the AWS SDK forum and was completely sideswiped by it:
RunInstancesRequest runInstance = new RunInstancesRequest()
    .WithMinCount(1)
    .WithMaxCount(1)
    .WithImageId(GetXMLElement("ami"))
    .WithInstanceType("t1.micro");
This is very reminiscent of the old VB6 With ... End With syntax, which I have long lamented the absence of in C# - I've compiled it in my VS2008 project and it works a treat, saving numerous separate lines referencing these attributes individually.
I'm sure I've read articles in the past explaining why the VB6-style With-block wasn't in C#, so my question is: has this syntax always existed in the language, or is it a recent .NET change that has enabled it? Can we coat all object instantiations followed by attribute changes in the same sugar?
Isn't this better anyway?
RunInstancesRequest runInstance = new RunInstancesRequest
{
    MinCount = 1,
    MaxCount = 1,
    ImageId = GetXMLElement("ami"),
    InstanceType = "t1.micro"
};
They implemented all those methods, each of which returns the RunInstancesRequest object itself (aka this). It's called a Fluent Interface.
It is not syntactic sugar. Those methods just set a property and return the this object.
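A minimal sketch of how such methods might be written. InstanceRequest is a hypothetical class made up for this example, not the AWS SDK's RunInstancesRequest, whose actual implementation is not shown in this thread.

```csharp
// Hypothetical request type; not the AWS SDK's RunInstancesRequest.
public class InstanceRequest
{
    public int MinCount { get; set; }
    public int MaxCount { get; set; }
    public string InstanceType { get; set; }

    // Each With* method sets one property and returns the same instance,
    // which is what makes the calls chainable.
    public InstanceRequest WithMinCount(int value) { MinCount = value; return this; }
    public InstanceRequest WithMaxCount(int value) { MaxCount = value; return this; }
    public InstanceRequest WithInstanceType(string value) { InstanceType = value; return this; }
}

public static class FluentDemo
{
    public static InstanceRequest Chained()
    {
        return new InstanceRequest()
            .WithMinCount(1)
            .WithMaxCount(1)
            .WithInstanceType("t1.micro");
    }

    // Equivalent result using a C# 3 object initializer.
    public static InstanceRequest Initialized()
    {
        return new InstanceRequest
        {
            MinCount = 1,
            MaxCount = 1,
            InstanceType = "t1.micro"
        };
    }
}
```

Nothing in the language is involved beyond ordinary method calls; the chaining works only because the class author chose to return this from each method.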
RunInstancesRequest runInstance = new RunInstancesRequest()
.WithMinCount(1)
.WithMaxCount(1)
.WithImageId(GetXMLElement("ami"))
.WithInstanceType("t1.micro");
==
RunInstancesRequest runInstance = new RunInstancesRequest().WithMinCount(1).WithMaxCount(1).WithImageId(GetXMLElement("ami")).WithInstanceType("t1.micro");
I don't know if that's considered syntactic sugar, or just pure formatting.
I think this technique is different than the With... syntax in VB. I think this is an example of chaining. Each method returns an instance of itself so you can chain the method calls.
See Method-Chaining in C#
The reason this syntax works for RunInstancesRequest is that each of the method calls that you are making return the original instance. The same concept can be applied to StringBuilder for the same reason, but not all classes have methods implemented in this way.
I would prefer having a constructor that takes all of those property values as arguments and sets them within the class.
It's always existed in C# and indeed in any C-style oo language (eh, most popular C-style language except C itself!)
It's unfair to compare it to the VB6 With...End With syntax, as it's much clearer what is going on in this case (about the only good thing I have to say about VB6's With...End With is at least it isn't as bad as JavaScript's since it requires prior dots).
It is as people have said, a combination of the "fluent interface" and the fact that the . operator allows for whitespace before and after it, so we can put each item on newlines.
StringBuilder is the most commonly seen case in C#, as in:
new StringBuilder("This")
    .Append(' ')
    .Append("will")
    .Append(' ')
    .Append("work")
    .Append('.');
A related, but not entirely the same, pattern is where you chain the methods of an immutable object that returns a different object of the same type as in:
DateTime yearAndADay = DateTime.UtcNow.AddYears(1).AddDays(1);
Yet another is returning modified IEnumerable<T> and IQueryable<T> objects from the LINQ related methods.
These though differ in returning different objects, rather than modifying a mutable object and returning that same object.
One of the main reasons that it is more common in C++ and Java than in C# is that C# has properties. This makes the most idiomatic means of assigning different properties a call to the related setter that is syntactically the same as setting a field. It does however block much of the most common use of the fluent interface idiom.
Personally, since the fluent interface idiom is not guaranteed (there's nothing to say MyClass.setProp(32) should return this or indeed, that it shouldn't return 32 which would also be useful in some cases), and since it is not as idiomatic in C#, I prefer to avoid it apart from with StringBuilder, which is such a well-known example that it almost exists as a separate StringBuilder idiom within C#.
This syntax has always existed
Please refer to Extension Methods (C# Programming Guide)
I recently started using functions to make casting easier on my fingers. In one instance I had something like this
((Dictionary<string,string>)value).Add(foo);
and converted it to a tiny little helper function so I can do this
ToDictionary(value).Add(foo);
Is this against the coding standards?
Also, what about simpler examples? For example in my scripting engine I've considered making things like this
((StringVariable)arg).Value="foo";
be
ToStringVar(arg).Value="foo";
I really just dislike how, in order to cast a value and instantly get a property from it, you must enclose it in double parentheses. I have a feeling the last one is much worse than the first one though.
Ignoring for a moment that you may actually need to do this casting - which I personally doubt - if you really just want to "save your fingers", you can use a using alias directive to shorten the name of your generic types.
At the top of your file, with all the other usings:
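// The right-hand side of an alias at file level is not affected by the other using directives, so the framework types are fully qualified here.
using ShorterType = System.Collections.Generic.Dictionary<string, System.Collections.Generic.Dictionary<int, System.Collections.Generic.List<System.Collections.Generic.Dictionary<OtherType, ThisIsRidiculous>>>>;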
I don't think so. You've also done something nice in that it's a bit easier to read and see what's going on. Glib (in C) provides casting macros for their classes, so this isn't a new concept. Just don't go overkill trying to save your fingers.
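For reference, the kind of helper the question describes might look like the sketch below. The class names are made up, and the two-argument Add call is an assumption, since the question's snippet elides what foo actually is.

```csharp
using System.Collections.Generic;

public static class CastHelpers
{
    // Performs the same cast as ((Dictionary<string, string>)value);
    // it still throws InvalidCastException if value is something else.
    public static Dictionary<string, string> ToDictionary(object value)
    {
        return (Dictionary<string, string>)value;
    }
}

public static class CastHelperDemo
{
    public static int AddThroughHelper()
    {
        object value = new Dictionary<string, string>();

        // Reads left to right, without the doubled parentheses.
        CastHelpers.ToDictionary(value).Add("key", "foo");

        return CastHelpers.ToDictionary(value).Count;
    }
}
```

The helper changes only how the call site reads. It adds no type safety, which is the concern the other answers raise.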
In general, I would consider this to be code smell. In most situations where the type of casting you describe is necessary, you could get the same behavior by proper use of interfaces (Java) or virtual inheritance (C++) in addition to generics/templates. It is much safer to leave that responsibility of managing types to the compiler than attempting to manage it yourself.
Without additional context, it is hard to say about the example you have included. There are certainly situations in which the type of casting you describe is unavoidable; but they're the exception rather than the rule. For example, the type of casting (and the associated helper functions/macros) you're describing is extremely commonplace in generic C libraries.
In which situations should I use LINQ to Objects?
Obviously I can do everything without LINQ. So in which operations does LINQ actually help me write shorter and/or more readable code?
This question was triggered by this
I find LINQ to Objects useful all over the place. The problem it solves is pretty general:
You have a collection of some data items
You want another collection, formed from the original collection, but after some sort of transformation or filtering. This might be sorting, projection, applying a predicate, grouping, etc.
That's a situation I come across pretty often. There are an awful lot of areas of programming which basically involve transforming one collection (or stream of data) into another. In those cases the code using LINQ is almost always shorter and more readable. I'd like to point out that LINQ shouldn't be regarded as being synonymous with query expressions - if only a single operator is required, the normal "dot notation" (using extension methods) can often be shorter and more readable.
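As a small illustration of that last point, the two methods below do the same filtering, once as a query expression and once in dot notation. The class and method names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NotationDemo
{
    // Query expression: reads well for multi-clause queries,
    // but is fairly wordy for a single operator.
    public static List<int> EvensWithQueryExpression(IEnumerable<int> source)
    {
        return (from n in source
                where n % 2 == 0
                select n).ToList();
    }

    // Dot notation: the same Where call, written directly
    // as an extension method with a lambda.
    public static List<int> EvensWithDotNotation(IEnumerable<int> source)
    {
        return source.Where(n => n % 2 == 0).ToList();
    }
}
```

The compiler translates the query expression into the same Where call, so the choice between the two forms is purely about readability.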
One of the reasons I particularly like LINQ to Objects is that it is so general - whereas LINQ to SQL is likely to only get involved in your data layer (or pretty much become the data layer), LINQ to Objects is applicable in every layer, and in all kinds of applications.
Just as an example, here's a line in my MiniBench benchmarking framework, converting a TestSuite (which is basically a named collection of tests) into a ResultSuite (a named collection of results):
return new ResultSuite(name,
    tests.Select(test => test.Run(input, expectedOutput)));
Then again if a ResultSuite needs to be scaled against some particular "standard" result:
return new ResultSuite(name,
    results.Select(x => x.ScaleToStandard(standard, mode)));
It wouldn't be hard to write this code without LINQ, but LINQ just makes it clearer and lets you concentrate on the real "logic" instead of the details of iterating through loops and adding results to lists etc.
Even when LINQ itself isn't applicable, some of the features which were largely included for the sake of LINQ (e.g. implicitly typed local variables, lambda expressions, extension methods) can be very useful.
The answer "practically everywhere" comes to mind. A better question would be when not to use it.
LINQ is great for the "slippery slope". Think of what's involved in many common operations:
Where. Just write a foreach loop and an "if"
Select. Create an empty list of the target type, loop through the originals, convert each one and add it to the results.
OrderBy. Just add it to a list and call .Sort(). Or implement a bubble sort ;)
ThenBy (from order by PropertyA, then by PropertyB). Quite a bit harder. A custom comparer and Sort should do the trick.
GroupBy - create a Dictionary<key, List<value>> and loop through all items. If no key exists create it, then add items to the appropriate list.
In each of those cases, the procedural way takes more code than the LINQ way. In the case of "if" it's a couple of lines more; in the case of GroupBy or OrderBy/ThenBy it's a lot more.
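The GroupBy case from the list above can be sketched side by side. Grouping words by first letter is just an example chosen here, and the method names are hypothetical; both methods assume non-empty strings.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Procedural version: create each bucket on first sight of its key.
    public static Dictionary<char, List<string>> GroupByLoop(IEnumerable<string> words)
    {
        var groups = new Dictionary<char, List<string>>();
        foreach (var word in words)
        {
            List<string> bucket;
            if (!groups.TryGetValue(word[0], out bucket))
            {
                bucket = new List<string>();
                groups[word[0]] = bucket;
            }
            bucket.Add(word);
        }
        return groups;
    }

    // LINQ version: same grouping, stated as what is wanted
    // rather than how to build it.
    public static Dictionary<char, List<string>> GroupByLinq(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w[0])
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Both versions preserve the original order of words within each group.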
Now take an all too common scenario of combining them together. You're suddenly looking at a 10-20 line method which could be solved with 3-4 lines in LINQ. And the LINQ version is guaranteed to be easier to read (once you are familiar with LINQ).
So when do you use LINQ? My answer: whenever you see "foreach" :)
LINQ is pretty useful in a few scenarios:
You want to use typed "business entities", instead of data tables, to more naturally access your data (and aren't already using something like NHibernate or LLBLGenPro)
You want to query non-relational data using a SQL like syntax (this is real handy when querying lists and such)
You don't like lots of inline SQL or stored procedures
LINQ comes into play when you start doing complex filtering on complex data types. For example, if you're given a list of People objects and you need to gather a list of all the doctors within that list. With LINQ, you can compress the following code into a single LINQ statement:
(pseudo-code)
doctors = []
for person in people:
    if person is doctor:
        doctors.append(person)
(sorry, my C# is rusty; this assumes Doctor is a subtype of the list's element type, but you get the idea)
var doctors = from person in people where person is Doctor select person;
Edit: After I answered I see a change to say "LINQ to Objects". Oh well.
If by LINQ we refer to all the new types in System.Linq, as well as new compiler features, then it'll have quite a bit of benefit -- it is effectively adding functional programming to these languages. Here's the progression I've seen a few times (although this is mainly C# -- VB is limited in the current version).
The obvious start is that anything related to list processing gets vastly easier. A lot of loops can just go away. What benefit do you get? You'll start programming more declaratively, which will lead to fewer bugs. Things start to "just work" when switching to this style. (The LINQ query syntax I don't find too useful, unless the queries are very complicated with lots of intermediate values. In these cases, the syntax will sort out all the issues you'd otherwise have to pass tuples around for.)
Next, language support (in C#, and in the next version of VB) for anonymous methods allows you to write a lot more constructs in a much shorter way. For instance, handling an async callback can be defined inside the method that initiates it. Using a closure here will result in you not having to bundle up state into an opaque object parameter and casting it out later on.
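A rough sketch of the difference, with the callback invoked directly rather than through a real asynchronous API. All names here are made up for illustration. The first method mimics the older pattern of passing state as an opaque object; the second lets a lambda capture the local variable instead.

```csharp
using System;

public static class CallbackDemo
{
    // Old style: the callback is a separate method, and its state
    // arrives as an opaque object that must be cast back.
    private static string GreetFromState(object state)
    {
        var name = (string)state;
        return "Hello, " + name;
    }

    public static string WithStateObject()
    {
        Func<object, string> callback = GreetFromState;
        object state = "world";
        return callback(state);
    }

    // Closure style: the lambda is defined inside the method that
    // initiates the call and captures the local variable directly.
    public static string WithClosure()
    {
        string name = "world";
        Func<string> callback = () => "Hello, " + name;
        return callback();
    }
}
```

Both methods return the same greeting. The closure version keeps the state strongly typed and avoids the cast.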
Being able to use higher order functions gets you thinking much more generically. So you'll start to see where you could simply pass in a lambda and solve things neater and cleaner. At this point, you'll realise that things only really work if you use generics. Sure, this is a 2.0 feature, but the usage is much more prevalent when you're passing functions around.
And around there, you get into the point of diminishing returns. The cost of declaring and using funcs and declaring all the generic type parameters (in C# and VB) is quite high. The compiler won't work it out for you, so you have to do it all manually. This adds a huge amount of overhead and friction, which limits how far you can go.
So, is this all "LINQ"? Depends on marketing, perhaps. The LINQ push made this style of programming much easier in C#, and all of LINQ is based on FP ideas.