I suppose this doesn't really matter, I'm just curious.
If the difference between dictionary and lookup is that one is one-to-one and the other one-to-many, wouldn't dictionary be a more specific/derived version of the other?
A lookup is a collection of key/value pairs where the key can be repeated.
A dictionary is a collection of key/value pairs where the key cannot be repeated.
Why couldn't IDictionary implement ILookup?
I suspect this is mainly because the intention is different.
ILookup<T,U> is designed specifically to work with a collection of values. IDictionary<T,U> is intended to work with a single value (that could, of course, be a collection).
While you could, of course, have IDictionary<T,U> implementations implement this by returning an IEnumerable<U> with a single value, this would be confusing, especially if your "U" is itself a collection (e.g., List<int>). In that case, would ILookup<T,U>.Item return an IEnumerable<List<int>>, or should it check whether the value type is an IEnumerable and then "flatten" it? Either way, it would look confusing and add questionable value.
Interfaces IDictionary<T,U> and ILookup<T,U> both inherit IEnumerable. If an IDictionary<T,U> is cast to IEnumerable and GetEnumerator() is called on it, the resulting enumerator should return instances of KeyValuePair<T,U>. If an ILookup<T,U> is cast to IEnumerable and GetEnumerator() is called upon it, the resulting enumerator should return instances of IGrouping<T,U>. If the KeyValuePair<T,U> struct were modified to implement IGrouping<T,U> that might be workable, but hardly clean.
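To make the enumeration difference concrete, here is a minimal sketch using the built-in Dictionary<TKey,TValue> and Enumerable.ToLookup; the keys and values are arbitrary sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Dictionary: each key maps to exactly one value.
var dict = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };

// Lookup: built from pairs in which the key "a" occurs twice.
var pairs = new[] { ("a", 1), ("a", 3), ("b", 2) };
ILookup<string, int> lookup = pairs.ToLookup(p => p.Item1, p => p.Item2);

// Enumerating a dictionary yields KeyValuePair<TKey, TValue> items.
foreach (KeyValuePair<string, int> kv in dict)
    Console.WriteLine($"{kv.Key} -> {kv.Value}");

// Enumerating a lookup yields IGrouping<TKey, TElement> items,
// each of which is itself a sequence of values.
foreach (IGrouping<string, int> group in lookup)
    Console.WriteLine($"{group.Key} -> [{string.Join(", ", group)}]");

// For a missing key, the lookup returns an empty sequence,
// while the dictionary indexer throws KeyNotFoundException.
int missingCount = lookup["zzz"].Count();
bool dictThrows = false;
try { _ = dict["zzz"]; }
catch (KeyNotFoundException) { dictThrows = true; }
```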
I suspect it's because the IDictionary`2 interface came out long before ILookup`2 did. Going back and modifying it is unnecessary. Concrete implementations can use ILookup`2. I don't see what would be gained by modifying an interface people have been using for years.
I am confused about which collection type I should return from my public API methods and properties.
The collections that I have in mind are IList, ICollection and Collection.
Is returning one of these types always preferred over the others, or does it depend on the specific situation?
ICollection<T> is an interface that exposes collection semantics such as Add(), Remove(), and Count.
Collection<T> is a concrete implementation of the ICollection<T> interface.
IList<T> is essentially an ICollection<T> with random, index-based access.
In this case you should decide whether or not your results require list semantics such as order based indexing (then use IList<T>) or whether you just need to return an unordered "bag" of results (then use ICollection<T>).
Generally you should return a type that is as general as possible, i.e. one that knows just enough of the returned data that the consumer needs to use. That way you have greater freedom to change the implementation of the API, without breaking the code that is using it.
Consider also the IEnumerable<T> interface as return type. If the result is only going to be iterated, the consumer doesn't need more than that.
The main difference between the IList<T> and ICollection<T> interfaces is that IList<T> allows you to access elements via an index. IList<T> describes array-like types. Elements in an ICollection<T> can only be accessed through enumeration. Both allow the insertion and deletion of elements.
If you only need to enumerate a collection, then IEnumerable<T> is to be preferred. It has two advantages over the others:
It disallows changes to the collection (but not to the elements, if they are of reference type).
It allows the largest possible variety of sources, including enumerations that are generated algorithmically and are not collections at all.
It allows lazy evaluation and can be queried with LINQ.
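A minimal sketch of these points, assuming a hypothetical helper named Total: a single IEnumerable<int> parameter accepts an array, a List<T>, a HashSet<T> and a lazily generated iterator, and LINQ composes over the generator without materializing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Accepts any sequence: arrays, lists, sets, or generated sequences.
static int Total(IEnumerable<int> items)
{
    int total = 0;
    foreach (int item in items) total += item;
    return total;
}

// An algorithmically generated sequence; nothing is stored up front.
// Values are produced one at a time, only as the caller enumerates.
static IEnumerable<int> Squares(int count)
{
    for (int i = 1; i <= count; i++)
        yield return i * i;
}

int fromArray = Total(new[] { 1, 2, 3 });
int fromList = Total(new List<int> { 1, 2, 3 });
int fromSet = Total(new HashSet<int> { 1, 2, 3 });
int fromGenerator = Total(Squares(3));                       // 1 + 4 + 9
int fromQuery = Total(Squares(10).Where(n => n % 2 == 0));   // LINQ filters lazily
Console.WriteLine($"{fromArray} {fromList} {fromSet} {fromGenerator} {fromQuery}");
```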
Collection<T> is a base class that is mainly useful to implementers of collections. If you expose it in interfaces (APIs), many useful collections not deriving from it will be excluded.
One disadvantage of IList<T> is that arrays implement it but do not allow you to add or remove items (i.e. you cannot change the array length). An exception will be thrown if you call IList<T>.Add(item) on an array. The situation is somewhat defused as IList<T> has a Boolean property IsReadOnly that you can check before attempting to do so. But in my eyes, this is still a design flaw in the library. Therefore, I use List<T> directly, when the possibility to add or remove items is required.
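The array pitfall can be reproduced in a few lines; this sketch uses only BCL types, and the values are arbitrary.

```csharp
using System;
using System.Collections.Generic;

// An array satisfies IList<T>, but its length is fixed.
IList<int> list = new[] { 1, 2, 3 };

// For arrays, the generic IsReadOnly reports true...
bool readOnly = list.IsReadOnly;

// ...and Add throws NotSupportedException at run time.
bool addThrew = false;
try { list.Add(4); }
catch (NotSupportedException) { addThrew = true; }

// Replacing an element through the indexer still works.
list[0] = 10;

// A List<T> does not have this restriction.
IList<int> growable = new List<int> { 1, 2, 3 };
if (!growable.IsReadOnly) growable.Add(4);
Console.WriteLine($"{readOnly} {addThrew} {list[0]} {growable.Count}");
```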
Which one should I choose? Let's consider just List<T> and IEnumerable<T> as examples for specialized / generalized types:
Method input parameter
IEnumerable<T>: Greatest flexibility for the caller. Restrictive for the implementer, read-only.
List<T>: Restrictive for the caller. Gives flexibility to the implementer, who can manipulate the collection.
Method output parameter or return value
IEnumerable<T>: Restrictive for the caller, read-only. Greatest flexibility for the implementer: allows returning almost any collection or implementing an iterator (yield return).
List<T>: Greatest flexibility for the caller, who can manipulate the returned collection. Restrictive for the implementer.
Well, at this point you may be disappointed because I don't give you a simple answer. A statement like "always use this for input and that for output" would not be constructive. The reality is that it depends on the use case. A method like void AddMissingEntries(TColl collection) will have to take a collection type that has an Add method, or may even require a HashSet<T> for efficiency. A method void PrintItems(TColl collection) can happily live with an IEnumerable<T>.
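A sketch of the two methods named above, with ISet<string> and IEnumerable<string> chosen as one plausible substitute for the TColl placeholder (an assumption; an ICollection<T> plus a Contains check would also work, although Contains is a linear search on a List<T>). The method bodies are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Needs to add items, so it asks for ISet<T>; a set also makes duplicate checks cheap.
static void AddMissingEntries(ISet<string> collection, IEnumerable<string> required)
{
    foreach (string entry in required)
        collection.Add(entry);   // ISet<T>.Add returns false and leaves the set unchanged for duplicates
}

// Only reads the items, so IEnumerable<T> is enough.
static void PrintItems(IEnumerable<string> collection)
{
    foreach (string item in collection)
        Console.WriteLine(item);
}

var entries = new HashSet<string> { "alpha", "beta" };
AddMissingEntries(entries, new[] { "beta", "gamma" });

PrintItems(entries);                    // a HashSet<T> is an IEnumerable<T>
PrintItems(new[] { "x", "y" });         // so is an array
PrintItems(new List<string> { "z" });   // and so is a List<T>
```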
IList<T> is the base interface for all generic lists. Since it is an ordered collection, the implementation can decide on the ordering, ranging from sorted order to insertion order. Moreover, IList<T> has an Item property (the indexer) that allows methods to read and edit entries in the list by index.
It also makes it possible to insert a value into, or remove one from, the list at a given index.
Also since IList<T> : ICollection<T>, all the methods from ICollection<T> are also available here for implementation.
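ICollection<T> is the base interface for all generic collections. It defines the size (Count) along with Add, Remove, Contains, Clear, CopyTo and IsReadOnly, and it inherits enumeration from IEnumerable<T>; the synchronization members (SyncRoot, IsSynchronized) belong only to the non-generic ICollection. You can add an item to, or remove one from, a collection, but you cannot choose the position at which this happens, because there is no indexer.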
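A short sketch contrasting the positional members of IList<T> with ICollection<T>; SortedSet<T> is used here only as an example of an ICollection<T> that decides element placement itself.

```csharp
using System;
using System.Collections.Generic;

// IList<T> adds positional operations on top of ICollection<T>.
IList<string> list = new List<string> { "b", "c" };
list.Insert(0, "a");        // insert at a chosen index -> a, b, c
list.RemoveAt(2);           // remove by index ("c")   -> a, b
string second = list[1];    // read by index ("b")

// ICollection<T> only offers Add/Remove/Contains/Clear/CopyTo/Count;
// where an added item ends up is decided by the implementation.
ICollection<string> collection = new SortedSet<string> { "c", "a" };
collection.Add("b");        // a SortedSet<T> keeps it in sorted order
collection.Remove("c");
Console.WriteLine($"{string.Join(",", list)} | {string.Join(",", collection)}");
```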
Collection<T> provides an implementation for IList<T>, IList and IReadOnlyList<T>.
If you use a narrower interface type such as ICollection<T> instead of IList<T>, you protect your code against breaking changes. If you use a wider interface type such as IList<T>, you are more exposed to breaking changes.
Quoting from a source,
ICollection, ICollection<T>: You want to modify the collection or you care about its size.
IList, IList<T>: You want to modify the collection and you care about the ordering and/or positioning of the elements in the collection.
Returning an interface type is more general, so (lacking further information on your specific use case) I'd lean towards that. If you want to expose indexing support, choose IList<T>; otherwise ICollection<T> will suffice. Finally, if you want to indicate that the returned collection is read-only, choose IEnumerable<T>.
And, in case you haven't read it before, Brad Abrams and Krzysztof Cwalina wrote a great book titled "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" (you can download a digest from here).
There are some subjects that come from this question:
interfaces versus classes
which specific class, from several alike classes, collection, list, array ?
Common classes versus subitem ("generics") collections
You may want to highlight that it's an Object-Oriented A.P.I.
interfaces versus classes
If you don't have much experience with interfaces, I recommend sticking to classes.
I often see developers jump to interfaces even when it isn't necessary,
and end up with a poor interface design instead of a good class design,
which, by the way, can eventually be migrated to a good interface design.
You'll see a lot of interfaces in A.P.I.s, but don't rush to them
if you don't need them.
You will eventually learn how to apply interfaces, to your code.
which specific class, from several alike classes, collection, list, array ?
There are several classes in C# (.NET) that can be interchanged. As already mentioned, if you need something from a more specific class, such as "CanBeSortedClass", then make it explicit in your A.P.I.
Does your A.P.I. user really need to know that your class can be sorted, or that it can apply some format to the elements? Then use "CanBeSortedClass" or "ElementsCanBePaintedClass".
Otherwise, use a more general class, such as "GenericBrandClass".
Common collection classes versus subitem ("generics") collections
You'll find that there are classes that contain other elements,
and you can specify that all elements should be of a specific type.
Generic collections are classes that let you use the same collection
for several code applications, without having to create a new collection class
for each new item type, like this: Collection<T>.
Is your A.P.I. user going to need a very specific type, the same for all elements?
Use something like List<WashingtonApple> .
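Is your A.P.I. user going to need several related types?
Expose List<Fruit> in your A.P.I., where Orange, Banana and Strawberry are descendants of Fruit. You may keep List<Orange>, List<Banana> and List<Strawberry> internally, but you have to copy their items into the List<Fruit> (for example with AddRange), because a List<Orange> is not a List<Fruit>: List<T> is invariant. IEnumerable<T>, by contrast, is covariant, so a List<Orange> can be exposed directly as an IEnumerable<Fruit>.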
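A small sketch of the variance point. To stay self-contained it uses built-in exception types as stand-ins for the hypothetical Fruit hierarchy: ArgumentException plays the role of Orange, InvalidOperationException that of Banana, and Exception that of Fruit.

```csharp
using System;
using System.Collections.Generic;

var argumentErrors = new List<ArgumentException> { new ArgumentException("bad arg") };

// IEnumerable<out T> is covariant, so this conversion compiles.
IEnumerable<Exception> asBase = argumentErrors;

// List<T> is invariant: "List<Exception> x = argumentErrors;" would not compile.
// Instead, build a List<Exception> and copy the derived instances into it.
var mixed = new List<Exception>();
mixed.AddRange(argumentErrors);
mixed.Add(new InvalidOperationException("bad state"));

foreach (Exception e in mixed)
    Console.WriteLine($"{e.GetType().Name}: {e.Message}");
```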
Is your A.P.I. user going to need a collection of arbitrary types?
Use List<object>, where all items are objects.
Cheers.
Following the rule that public APIs should never return a list, I'm blindly converting all code that returned lists to return ICollection<T> instead:
public IList<T> CommaSeparate(String value) {...}
becomes
public ICollection<T> CommaSeparate(String value) {...}
And although an ICollection has a Count, there is no way to get items by index.
And although an ICollection exposes an enumerator (allowing foreach), I see no guarantee that enumeration starts at the "top" of the list, as opposed to the "bottom".
I could mitigate this by avoiding ICollection and using Collection instead:
public Collection<T> CommaSeparate(String value) {...}
This allows the use of indexer syntax (collection[index]).
Unfortunately, my internal implementation constructs an array, which I can return as an IList or ICollection, but not as a Collection.
Is there a way to access the items of a collection in order?
This raises the wider question: Does an ICollection even have an order?
Conceptually, imagine I want to parse a command-line string. It is critical that the order of items be maintained.
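Although an array cannot be cast to Collection<T>, Collection<T> has a constructor that wraps any IList<T>, arrays included. A minimal sketch, assuming a string-returning version of the question's CommaSeparate (the original is generic and its body isn't shown); note that the wrapper inherits the array's fixed size.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical, non-generic stand-in for the question's CommaSeparate.
static Collection<string> CommaSeparate(string value)
{
    string[] parts = value.Split(',');      // the internal array
    return new Collection<string>(parts);    // wraps (does not copy) the array
}

Collection<string> items = CommaSeparate("a,b,c");
string first = items[0];                     // indexer access, in array order

// The wrapped array is fixed-size, so growing the collection fails.
bool addThrew = false;
try { items.Add("d"); }
catch (NotSupportedException) { addThrew = true; }
```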
Conceptually, i require a contract that indicates an "ordered" set of string tuples. In the case of an API contract, to indicate order, which of the following is correct:
IEnumerable<String> Grob(string s)
ICollection<String> Grob(string s)
IList<String> Grob(string s)
Collection<String> Grob(string s)
List<String> Grob(string s)
The ICollection<T> interface doesn't specify anything about an order. The objects will be in the order specified by the object returned. For example, if you return the Values collection of a SortedDictionary, the objects will be in the order defined by the dictionary's comparer.
If you need the method to return, by contract, a type whose contract requires a certain order, then you should express that in the method's signature by returning the more specific type.
Regardless of the runtime type of the object returned, consider the behavior when the static reference is IList<T> or ICollection<T>: When you call GetEnumerator() (perhaps implicitly in a foreach loop), you're going to call the same method and get the same object regardless of the static type of the reference. It will therefore behave the same way regardless of the CommaSeparate() method's return type.
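A quick sketch of that point: the same List<T> and SortedSet<T> enumerate identically whether they are referenced through their concrete types or through ICollection<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 3, 1, 2 };
var sorted = new SortedSet<int> { 3, 1, 2 };

// Viewing the same objects through ICollection<T> does not change how they enumerate.
ICollection<int> listView = list;
ICollection<int> sortedView = sorted;

int[] listOrder = listView.ToArray();      // insertion order: 3, 1, 2
int[] sortedOrder = sortedView.ToArray();  // comparer order: 1, 2, 3

// Same sequence whether enumerated via the concrete type or the interface.
bool same = list.SequenceEqual(listView) && sorted.SequenceEqual(sortedView);
Console.WriteLine($"{string.Join(",", listOrder)} | {string.Join(",", sortedOrder)} | {same}");
```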
Additional thought:
As someone else pointed out, the FxCop rule warns against using List<T>, not IList<T>; the question you linked to is asking why FxCop doesn't recommend using IList<T> in place of List<T>, which is another matter. If I imagine that you are parsing a command-line string where order is important, I would stick with IList<T> if I were you.
ICollection does not have a guaranteed order, but the class that actually implements it may (or may not).
If you want to return an ordered collection, then return an IList<T> and don't get too hung up on FxCop's generally sound, but very generic, advice.
No, ICollection does not imply an order.
The ICollection instance has the "order" of whatever class that implements it. That is, referencing a List<T> as an ICollection will not alter its order at all.
Likewise, if you access an unordered collection as an ICollection, it will not impose an order on the unordered collection either.
So, to your question:
Does an ICollection even have an order?
The answer is: it depends solely on the class that implements it.
ICollection<T> may have an order, but the actual ordering depends on the class implementing it.
It does not have an accessor for the item at a given index. IList<T> specializes this interface to provide access by index.
An ICollection<T> is just an interface; whether it's ordered or not is entirely dependent upon the implementation underlying it (which is supposed to be opaque).
If you want to be able to access it by index, you'd want to return things as an IList<T>, which is both IEnumerable<T> and ICollection<T>. One should bear in mind, though, that depending on the underlying implementation, getting at an arbitrary item in the collection could require O(N) time (about N/2 steps on average).
My inclination would be to avoid the 'collection' interfaces altogether and instead use a custom type representing the collection in terms of the problem domain and exposing the appropriate logical operations suitable for that type.
ICollection is just an interface; there is no implementation or explicit specification about ordering. That means if you return something that enumerates in an ordered manner, whatever is consuming your ICollection will do so in an ordered manner.
Order is only implied by the underlying, implementing, object. There is no specification in ICollection that says it should be ordered or not. Enumerating over a result multiple times will invoke the underlying object's enumerator, which is the only place that those rules would be set. An object doesn't change the way it is enumerated just because it inherits this interface. Had the interface specified that it is an ordered result, then you could safely rely on the ordered result of the implementing object.
It depends on the implementation of the instance. An ICollection that happens to be a List has an order; an ICollection that happens to be a HashSet<T> has no defined order.
All ICollections implement IEnumerable, which returns the items one at a time, ordered or otherwise.
EDIT: In reply to your additional example about command line parsing in the question, I would argue that the appropriate return type depends on what you are doing with those arguments afterward, but IEnumerable is probably the right one.
My reasoning is that IList, ICollection, and their concrete implementations permit modification of the list returned from Grob, which you probably don't want. Unless you can use IReadOnlyList<T> (added in .NET Framework 4.5), which is an indexed, read-only sequence, IEnumerable is the best bet to prevent your callers from doing something weird like trying to modify the parameter list that they get back.
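For comparison, a sketch of a Grob that returns IReadOnlyList<string>, which promises order and indexed access but exposes no Add or Remove. The splitting logic is a hypothetical simplification; real command-line parsing also has to handle quoting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser: splits on spaces and keeps the original order.
// IReadOnlyList<T> (.NET Framework 4.5+) gives Count and an indexer, but no mutators.
static IReadOnlyList<string> Grob(string s)
{
    var parts = new List<string>(s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    return parts.AsReadOnly();   // ReadOnlyCollection<string> implements IReadOnlyList<string>
}

IReadOnlyList<string> parsed = Grob("copy  /y source.txt dest.txt");
for (int i = 0; i < parsed.Count; i++)
    Console.WriteLine($"{i}: {parsed[i]}");
```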
If you expect that all present and future versions of your method will have no difficulty returning an object that will be able to quickly and easily retrieve the Nth item, use type IList<T> to return a reference to something that implements both IList<T> and non-generic ICollection. If you expect that some present or future versions might not be able to quickly and easily return the Nth item, but would be able to instantly report the number of items, use type ICollection<T> to return a reference to something that implements ICollection<T> and non-generic ICollection. If you expect that present or future versions may have trouble even knowing how many items there are, return IEnumerable<T>. The question of sequencing is irrelevant; the ability to access the Nth thing implies that a defined sequence exists, but ICollection<T> says neither more nor less about sequencing than IEnumerable<T>.
I don't understand why I'd create an IEnumerable, or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
So what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface; it exposes certain things to the outside. You are completely right that you could just use a List<T>, but List<T> sits far down the interface hierarchy (it implements IList<T>, which extends ICollection<T>, which extends IEnumerable<T>). What exactly does a List<T> do? It stores items, and it offers methods to Add and Remove them. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is: an abstract way of saying "I want to get a list of items I can iterate over". A list is "I want to get a collection which I can modify, access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it must also hold all of its items in memory, whereas an IEnumerable<T> can produce them on demand. So if a method takes an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the capabilities of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course, IEnumerable. As a general rule, you want to be specific about what you output but broad about what you accept as input, e.g.:
You have a sub which loops through a list of objects and writes something to the console...
You could declare the parameter as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate, so use IEnumerable; then your method will also accept other types that implement IEnumerable, including IQueryable, linked lists, etc.
You're making your methods more generic for no cost.
Today, you generally wouldn't use the non-generic IEnumerable anymore unless you were supporting software on an older version of the framework; you'd normally use IEnumerable<T>. Among other benefits, the LINQ operators are defined as extension methods on IEnumerable<T>, so you can easily query any collection type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare nowadays that you need to create your own container classes, as, like you say, there already exist many good implementations.
However, if you do create your own container class for some specific reason, you may want to implement IEnumerable or IEnumerable<T>, because they are a standard "contract" for iteration; by providing an implementation, you can take advantage of methods and APIs that accept an IEnumerable or IEnumerable<T>. LINQ, for example, will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular implementation of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation where I think I have to implement IEnumerable rather than use List<>:
I want to get all items from a remote server. Let's say I have one million items to return. If you use the List<> approach, you need to cache all one million items in memory first. In some cases, you don't want to do that because you don't want to use up too much memory. Using IEnumerable allows you to display the data on the screen and then dispose of it right away. Therefore, with the IEnumerable approach, the memory footprint of the program is much smaller.
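A minimal sketch of that pattern. FetchPage is a hypothetical stand-in for the remote call (it just generates sequential integers), and the page size is an arbitrary assumption; the point is that the iterator holds only one page at a time and stops requesting pages once the caller stops enumerating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a remote call that returns one page of items.
static List<int> FetchPage(int pageIndex, int pageSize, int totalItems)
{
    int start = pageIndex * pageSize;
    int count = Math.Max(0, Math.Min(pageSize, totalItems - start));
    return Enumerable.Range(start, count).ToList();
}

int pagesFetched = 0;

// Streams items page by page; the iterator keeps only the current page in memory.
IEnumerable<int> AllItems(int pageSize, int totalItems)
{
    for (int page = 0; ; page++)
    {
        List<int> batch = FetchPage(page, pageSize, totalItems);
        pagesFetched++;
        if (batch.Count == 0) yield break;
        foreach (int item in batch) yield return item;
    }
}

// The caller can stop early; later pages are never requested.
List<int> firstFew = AllItems(pageSize: 100, totalItems: 1_000_000).Take(150).ToList();
Console.WriteLine($"{firstFew.Count} items, {pagesFetched} pages fetched");
```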
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
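A sketch of such a sequence as a C# iterator. It uses long for simplicity, so values overflow after roughly the 92nd term; System.Numerics.BigInteger would avoid that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An unbounded sequence: each value is computed only when requested.
static IEnumerable<long> Fibonacci()
{
    long a = 0, b = 1;
    while (true)
    {
        yield return a;
        (a, b) = (b, a + b);
    }
}

// Take()/First() keep the infinite sequence safe; never call ToList() on Fibonacci() alone.
List<long> firstTen = Fibonacci().Take(10).ToList();
long firstOver1000 = Fibonacci().First(n => n > 1000);
Console.WriteLine(string.Join(", ", firstTen));
```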
When a built-in container fits your needs, you should definitely use that, and then IEnumerable comes for free. When, for whatever reason, you have to implement your own container (for example, if it must be backed by a DB), then you should make sure to implement both IEnumerable and IEnumerable<T>, for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example, you will be able to filter your container down to objects that match a condition with an elegant one-liner.
IEnumerable provides a means for your API users (including yourself) to use your collection via foreach. For example, I implemented IEnumerable in my Binary Tree class so I could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree in order.
Hope it helps :)
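The Binary Tree class mentioned above isn't shown, so here is a hypothetical, array-backed stand-in: the in-order traversal is written once as an iterator, and both foreach and LINQ reuse it. A real tree class would return the same iterator from its GetEnumerator() method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical array-backed binary search tree in level order:
// the children of node i sit at 2i+1 and 2i+2; null marks an empty slot.
int?[] tree = { 4, 2, 6, 1, 3, null, 7 };

// In-order traversal written once as an iterator.
static IEnumerable<int> InOrder(int?[] nodes, int index = 0)
{
    if (index >= nodes.Length || nodes[index] is null) yield break;
    foreach (int left in InOrder(nodes, 2 * index + 1)) yield return left;
    yield return nodes[index].Value;
    foreach (int right in InOrder(nodes, 2 * index + 2)) yield return right;
}

foreach (int value in InOrder(tree))
    Console.Write($"{value} ");
Console.WriteLine();

// LINQ works on the same traversal for free.
List<int> evens = InOrder(tree).Where(v => v % 2 == 0).ToList();
```

Nested iterators like this are simple but cost O(depth) work per yielded item; an explicit stack avoids that for deep trees.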
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement IEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.
I have a dictionary data structure that must be passed around using WCF. To do that I created a member property with get and set methods. I can basically achieve the same functionality with this property being either a:
IDictionary<keyType, valueType>
or a
IList<KeyValuePair<keyType, valueType>>
I can see no strong reason for choosing one over the other. One mild reason I could think of is:
IDictionary - People reading the code will think that IDictionary makes more sense, since the data structure is a dictionary, but in terms of what is passed through WCF they are all the same.
Can anyone think of a reason to choose IList? If there is none I'll just go with IDictionary.
Design your interfaces based on use, not on implementation.
If the consumer of a class needs to iterate through the entire set, use IEnumerable. If they should be able to modify the result, and need index-based access, return IList. If they want specific items, and there is a single useful key value, return IDictionary.
Write your internal code this way, too :)
It depends on your consumers. I would cater for the most likely use case and make their API as simple as possible. Edge cases can always iterate the dictionary via the Values collection.
Don't make them think about it. If "dictionary" is how they'd think about the result of the operation, then a type with that name is a very useful thing to use.
If the collection of KeyValuePairs requires unique keys, you can use a dictionary.
If the same key can appear in more than one KeyValuePair, use IList/IEnumerable.
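A minimal sketch of the difference in guarantees (the data is arbitrary): a list of pairs accepts a repeated key, whereas Dictionary<TKey,TValue>.Add throws ArgumentException for one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A list of pairs accepts a repeated key...
IList<KeyValuePair<string, int>> pairs = new List<KeyValuePair<string, int>>
{
    new KeyValuePair<string, int>("x", 1),
    new KeyValuePair<string, int>("x", 2),
};

// ...but a dictionary enforces uniqueness: Add throws on a duplicate key.
IDictionary<string, int> dict = new Dictionary<string, int>();
bool duplicateRejected = false;
try
{
    foreach (var pair in pairs) dict.Add(pair.Key, pair.Value);
}
catch (ArgumentException) { duplicateRejected = true; }

// A dictionary also gives direct lookup by key; the list needs a search.
int fromList = pairs.First(p => p.Key == "x").Value;
```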
I just realized that maybe I was mistaken all along in exposing T[] to my views, instead of IEnumerable<T>.
Usually, for this kind of code:
foreach (var item in items) {}
should items be T[] or IEnumerable<T>?
Then, if I need to get the count of the items, would the array's Length be faster than IEnumerable<T>.Count()?
IEnumerable<T> is generally a better choice here, for the reasons listed elsewhere. However, I want to bring up one point about Count(). Quintin is incorrect when he says that the type itself implements Count(). It's actually implemented in Enumerable.Count() as an extension method, which means other types don't get to override it to provide more efficient implementations.
By default, Count() has to iterate over the whole sequence to count the items. However, it does know about ICollection<T> and ICollection, and is optimised for those cases. (In .NET 3.5 IIRC it's only optimised for ICollection<T>.) Now the array does implement that, so Enumerable.Count() defers to ICollection<T>.Count and avoids iterating over the whole sequence. It's still going to be slightly slower than calling Length directly, because Count() has to discover that it implements ICollection<T> to start with - but at least it's still O(1).
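A small sketch of the behavior described above. The counter in the hypothetical Generated iterator shows that Count() has to walk a plain sequence, while for an array it can use the ICollection<T> count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] array = { 10, 20, 30 };
int iterations = 0;

// A plain iterator: Count() can only find the size by walking it.
IEnumerable<int> Generated()
{
    foreach (int x in array)
    {
        iterations++;
        yield return x;
    }
}

int lengthDirect = array.Length;                        // no interface check at all
int countViaLinq = ((IEnumerable<int>)array).Count();   // detects ICollection<T>, O(1)
int countIterator = Generated().Count();                // enumerates all 3 items
Console.WriteLine($"{lengthDirect} {countViaLinq} {countIterator} {iterations}");
```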
The same kind of thing is true for performance in general: the JITted code may well be somewhat tighter when iterating over an array rather than a general sequence. You'd basically be giving the JIT more information to play with, and even the C# compiler itself treats arrays differently for iteration (using the indexer directly).
However, these performance differences are going to be inconsequential for most applications - I'd definitely go with the more general interface until I had good reason not to.
It's partially inconsequential, but standard theory would dictate "Program against an interface, not an implementation". With the interface model, you can change the actual data type being passed without affecting the caller, as long as it conforms to the same interface.
The contrast to that is that you might have a reason for exposing an array specifically, in which case you would want to express that.
For your example, I think IEnumerable<T> would be desirable. It's also worth noting that, for testing purposes, using an interface could reduce the headache you would incur if you had to re-create particular classes all the time. Collections generally aren't as bad, but having an interface contract you can mock easily is very nice.
Added for edit:
This is even more inconsequential: for an array, Count() ends up using the known length, so I would not worry about any perceived overhead of the method.
See Jon Skeet's answer for an explanation of the Count() implementation.
T[] (single-dimensional, zero-based) also implements ICollection<T> and IList<T> along with IEnumerable<T>.
Therefore, if you want looser coupling in your application, IEnumerable<T> is preferable, unless you need indexed access inside the foreach.
Since Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces, I would use IEnumerable, unless you need to use these interfaces.
http://msdn.microsoft.com/en-us/library/system.array.aspx
Your gut feeling is correct: if all the view cares about, or should care about, is having an enumerable, that's all it should demand in its interfaces.
What is it logically (conceptually) from the outside?
If it's an array, then return the array. If the only point is to enumerate, then return IEnumerable. Otherwise IList or ICollection may be the way to go.
If you want to offer lots of functionality but not allow it to be modified, then perhaps use a List internally and return the ReadOnlyCollection<T> returned from its .AsReadOnly() method.
Given that changing the code from an array to IEnumerable at a later date is easy, but changing it the other way is not, I would go with IEnumerable until you know you need the small speed benefit of returning an array.
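A short sketch of that approach: AsReadOnly() returns a ReadOnlyCollection<T> that wraps, rather than copies, the list, so the owner's later changes are visible and callers cannot modify it.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var backing = new List<string> { "a", "b" };

// AsReadOnly() returns a ReadOnlyCollection<T> wrapper, not a copy.
ReadOnlyCollection<string> view = backing.AsReadOnly();

// Callers can index, count and enumerate; the wrapper has no public Add.
string first = view[0];

// Changes made by the owner through the backing list show up in the view.
backing.Add("c");
int visibleCount = view.Count;

// Mutating through the IList<T> interface is rejected at run time.
bool addThrew = false;
try { ((IList<string>)view).Add("d"); }
catch (NotSupportedException) { addThrew = true; }
```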