I have a large amount of code that is dependent on a list of objects. the list is modified a lot while being passed around as a parameter to various methods.
Even though I understand the workings of this code, I feel uneasy letting such an easy opportunity to make a mistake exist. Is there a way to handle this situation in c# outside of a goofy comment or refactoring?
If you are passing a List<Something> around in your code, then it is "mutable" by default, and there is no way to signal this explicitly.
If this is a language background issue (Haskell?), then in C# you should looks things from a different perspective: if you wanted to pass around an immutable collection, you would need to use some different type (maybe an IEnumerable<Something>, even if it's not the same as a list); if you're passing around a List, instead, it can be modified by every method that receives it.
Maybe you can give that list a special type:
class MyCustomMutableList : List<int>
You could even not give it any base class to make sure that any usage site must use this special type in order to be able to access list data.
I would normally consider this a misuse of inheritance. If this is an implementation detail and does not leak out to consumers of your API it's probably good enough. Otherwise, create an IList<int> derived class through composition. R# has a feature to delegate all virtual methods to an instance field. That generates all that code.
You also could create a wrapper class that just exposes the required methods to perform the required mutations:
class DataCollector {
public void Add(int item) { ... }
}
Since all this object allows to do is mutation it is pretty clear that mutation is going on.
Related
I have an abstract class that is supposed to become the base of a huge hierarchy of classes. Among other things, it has a certain abstract method. During the execution, it will be called on all objects of this class constantly by a system that knows only about this abstract class and not it's children:
foreach( AbstractClass object in allTheObjects )
object.DoStuff();
However, a lot of it's children in fact don't override this method (which is empty in the base class); ones that use it and ones that don't a distributed among the class hierarchy. Will the C# take care of this empty method, or will I have to optimize it in some way?
P.S.: I know that premature optimization is evil, but it just got me really curious.
I wouldn't worry about that. In C#, method calls are really cheap. And even if you would optizmizie this, I doubt that you will see any differnce. See this post for reference.
I'd worry more about if you could avoid looping through everything, or what data strucutre allTheObjects is. I'd say there's much more potential for optimization there.
Also you might think about if you really need a big inheritance structure, or if you can achieve your goals with composition or interfaces.
You will also find more information here (interface methods vs. delegates vs. normal method calls)
However, a lot of it's children in fact will have this method empty
Maybe you need to declare this method as virtual instead of abstract? Thus you can provide default empty implementation.
I never seem to understand why we need delegates?
I know they are immutable reference types that hold reference of a method but why can't we just call the method directly, instead of calling it via a delegate?
Thanks
Simple answer: the code needing to perform the action doesn't know the method to call when it's written. You can only call the method directly if you know at compile-time which method to call, right? So if you want to abstract out the idea of "perform action X at the appropriate time" you need some representation of the action, so that the method calling the action doesn't need to know the exact implementation ahead of time.
For example:
Enumerable.Select in LINQ can't know the projection you want to use unless you tell it
The author of Button didn't know what you want the action to be when the user clicks on it
If a new Thread only ever did one thing, it would be pretty boring...
It may help you to think of delegates as being like single-method interfaces, but with a lot of language syntax to make them easy to use, and funky support for asynchronous execution and multicasting.
Of course you can call method directly on the object but consider following scenarios:
You want to call series of method by using single delegate without writing lot of method calls.
You want to implement event based system elegantly.
You want to call two methods same in signature but reside in different classes.
You want to pass method as a parameter.
You don't want to write lot of polymorphic code like in LINQ , you can provide lot of implementation to the Select method.
Because you may not have the method written yet, or you have designed your class in such a way that a user of it can decide what method (that user wrote) the user wants your class to execute.
They also make certain designs cleaner (for example, instead of a switch statement where you call different methods, you call the delegate passed in) and easier to understand and allow for extending your code without changing it (think OCP).
Delegates are also the basis of the eventing system - writing and registering event handlers without delegates would be much harder than it is with them.
See the different Action and Func delegates in Linq - it would hardly be as useful without them.
Having said that, no one forces you to use delegates.
Delegates supports Events
Delegates give your program a way to execute methods without having to know precisely what those methods are at compile time
Anything that can be done with delegates can be done without them, but delegates provide a much cleaner way of doing them. If one didn't have delegates, one would have to define an interface or abstract base class for every possible function signature containing a function Invoke(appropriate parameters), and define a class for each function which was to be callable by pseudo-delegates. That class would inherit the appropriate interface for the function's signature, would contain a reference to the class containing the function it was supposed to represent, and a method implementing Invoke(appropriate parameters) which would call the appropriate function in the class to which it holds a reference. If class Foo has two methods Foo1 and Foo2, both taking a single parameter, both of which can be called by pseudo-delegates, there would be two extra classes created, one for each method.
Without compiler support for this technique, the source code would have to be pretty heinous. If the compiler could auto-generate the proper nested classes, though, things could be pretty clean. Dispatch speed for pseudo-delegates would probably generally be slower than with conventional delegates, but if pseudo-delegates were an interface rather than an abstract base class, a class which only needs to make a pseudo-delegate for one method of a given signature could implement the appropriate pseudo-delegate interface itself; the class instance could then be passed to any code expecting a pseudo-delegate of that signature, avoiding any need to create an extra object. Further, while the number of classes one would need when using pseudo-delegates would be greater than when using "real" delegates, each pseudo-delegate would only need to hold a single object instance.
Think of C/C++ function pointers, and how you treat javascript event-handling functions as "data" and pass them around. In Delphi language also there is procedural type.
Behind the scenes, C# delegate and lambda expressions, and all those things are essentially the same idea: code as data. And this constitutes the very basis for functional programming.
You asked for an example of why you would pass a function as a parameter, I have a perfect one and thought it might help you understand, it is pretty academic but shows a use. Say you have a ListResults() method and getFromCache() method. Rather then have lots of checks if the cache is null etc. you can just pass any method to getCache and then only invoke it if the cache is empty inside the getFromCache method:
_cacher.GetFromCache(delegate { return ListResults(); }, "ListResults");
public IEnumerable<T> GetFromCache(MethodForCache item, string key, int minutesToCache = 5)
{
var cache = _cacheProvider.GetCachedItem(key);
//you could even have a UseCache bool here for central control
if (cache == null)
{
//you could put timings, logging etc. here instead of all over your code
cache = item.Invoke();
_cacheProvider.AddCachedItem(cache, key, minutesToCache);
}
return cache;
}
You can think of them as a construct similar with pointers to functions in C/C++. But they are more than that in C#. Details.
My code is littered with collections - not an unusual thing, I suppose. However, usage of the various collection types isn't obvious nor trivial. Generally, I'd like to use the type that's exposes the "best" API, and has the least syntactic noise. (See Best practice when returning an array of values, Using list arrays - Best practices for comparable questions). There are guidelines suggesting what types to use in an API, but these are impractical in normal (non-API) code.
For instance:
new ReadOnlyCollection<Tuple<string,int>>(
new List<Tuple<string,int>> {
Tuple.Create("abc",3),
Tuple.Create("def",37)
}
)
List's are a very common datastructure, but creating them in this fashion involves quite a bit of syntactic noise - and it can easily get even worse (e.g. dictionaries). As it turns out, many lists are never changed, or at least never extended. Of course ReadOnlyCollection introduces yet more syntactic noise, and it doesn't even convey quite what I mean; after all ReadOnlyCollection may wrap a mutating collection. Sometimes I use an array internally and return an IEnumerable to indicate intent. But most of these approaches have a very low signal-to-noise ratio; and that's absolutely critical to understanding code.
For the 99% of all code that is not a public API, it's not necessary to follow Framework Guidelines: however, I still want a comprehensible code and a type that communicates intent.
So, what's the best-practice way to deal with the bog-standard task of making small collections to pass around values? Should array be preferred over List where possible? Something else entirely? What's the best way - clean, readable, reasonably efficient - of passing around such small collections? In particular, code should be obvious to future maintainers that have not read this question and don't want to read swathes of API docs yet still understand what the intent is. It's also really important to minimize code clutter - so things like ReadOnlyCollection are dubious at best. Nothing wrong with wordy types for major API's with small surfaces, but not as a general practice inside a large codebase.
What's the best way to pass around lists of values without lots of code clutter (such as explicit type parameters) but that still communicates intent clearly?
Edit: clarified that this is about making short, clear code, not about public API's.
After hopefully understanding your question, i think you have to distinguish between what you create and manage within your class and what you make available to the outside world.
Within your class you can use whatever best fits your current task (pro/cons of List vs. Array vs. Dictionary vs. LinkedList vs. etc.). But this has maybe nothing to do about what you provide in your public properties or functions.
Within your public contract (properties and functions) you should give back the least type (or even better interface) that is needed. So just an IList, ICollection, IDictionary, IEnumerable of some public type. Thous leads that your consumer classes are just awaiting interfaces instead of concrete classes and so you can change the concrete implementation at a later stage without breaking your public contract (due to performance reasons use an List<> instead of a LinkedList<> or vice versa).
Update:
So, this isn't strictly speaking new; but this question convinced me to go ahead and announce an open source project I've had in the works for a while (still a work in progress, but there's some useful stuff in there), which includes an IArray<T> interface (and implementations, naturally) that I think captures exactly what you want here: an indexed, read-only, even covariant (bonus!) interface.
Some benefits:
It's not a concrete type like ReadOnlyCollection<T>, so it doesn't tie you down to a specific implementation.
It's not just a wrapper (like ReadOnlyCollection<T>), so it "really is" read-only.
It clears the way for some really nice extension methods. So far the Tao.NET library only has two (I know, weak), but more are on the way. And you can easily make your own, too—just derive from ArrayBase<T> (also in the library) and override the this[int] and Count properties and you're done.
If this sounds promising to you, feel free to check it out and let me know what you think.
It's not 100% clear to me where you're worried about this "syntactic noise": in your code or in calling code?
If you're tolerant of some "noise" in your own encapsulated code then I would suggest wrapping a T[] array and exposing an IList<T> which happens to be a ReadOnlyCollection<T>:
class ThingsCollection
{
ReadOnlyCollection<Thing> _things;
public ThingsCollection()
{
Thing[] things = CreateThings();
_things = Array.AsReadOnly(things);
}
public IList<Thing> Things
{
get { return _things; }
}
protected virtual Thing[] CreateThings()
{
// Whatever you want, obviously.
return new Thing[0];
}
}
Yes there is some noise on your end, but it's not bad. And the interface you expose is quite clean.
Another option is to make your own interface, something like IArray<T>, which wraps a T[] and provides a get-only indexer. Then expose that. This is basically as clean as exposing a T[] but without falsely conveying the idea that items can be set by index.
I do not pass around Listss if I can possibly help it. Generally I have something else that is managing the collection in question, which exposes the collection, for example:
public class SomeCollection
{
private List<SomeObject> m_Objects = new List<SomeObject>();
// ctor
public SomeCollection()
{
// Initialise list here, or wot-not/
} // eo ctor
public List<SomeObject> Objects { get { return m_Objects; } }
} // eo class SomeCollection
And so this would be the object passed around:
public void SomeFunction(SomeCollection _collection)
{
// work with _collection.Objects
} // eo SomeFunction
I like this approach, because:
1) I can populate my values in the ctor. They're there the momeny anyone news SomeCollection.
2) I can restrict access, if I want, to the underlying list. In my example I exposed it all, but you don't have to do this. You can make it read-only if you want, or validate additions to the list, prior to adding them.
3) It's clean. Far easier to read SomeCollection than List<SomeObject> everywhere.
4) If you suddenly realise that your collection of choice is inefficient, you can change the underlying collection type without having to go and change all the places where it got passed as a parameter (can you imagine the trouble you might have with, say, List<String>?)
I agree. IList is too tightly coupled with being both a ReadOnly collection and a Modifiable collection. IList should have inherited from an IReadOnlyList.
Casting back to IReadOnlyList wouldn't require a explicit cast. Casting forward would.
1.
Define your own class which implements IEnumerator, takes an IList in the new constructor, has a read only default item property taking an index, and does not include any properties/methods that could otherwise allow your list to me manipulated.
If you later want to allow modifying the ReadOnly wrapper like IReadOnlyCollection does, you can make another class which is a wrapper around your custom ReadOnly Collection and has the Insert/Add/Remove/RemoveAt/Clear/...implemented and cache those changes.
2.
Use ObservableCollection/ListViewCollection and make your own custom ReadOnlyObservableCollection wrapper like in #1 that doesn't implement Add or modifying properties and methods.
ObservableCollection can bind to ListViewCollection in such a way that changes to ListViewCollection do not get pushed back into ObservableCollection. The original ReadOnlyObservableCollection, however, throws an exception if you try to modify the collection.
If you need backwards/forwards compatibility, make two new classes inheriting from these. Then Implement IBindingList and handle/translate CollectionChanged Event (INotifyCollectionChanged event) to the appropriate IBindingList events.
Then you can bind it to older DataGridView and WinForm controls, as well as WPF/Silverlight controls.
Microsoft has created a Guidelines for Collections document which is a very informative list of DOs and DON'Ts that address most of your question.
It's a long list so here are the most relevant ones:
DO prefer collections over arrays.
DO NOT use ArrayList or List in public APIs. (public properties, public parameters and return types of public methods)
DO NOT use Hashtable or Dictionary in public APIs.
DO NOT use weakly typed collections in public APIs.
DO use the least-specialized type possible as a parameter type. Most members taking collections as parameters use the IEnumerable interface.
AVOID using ICollection or ICollection as a parameter just to access the Count property.
DO use ReadOnlyCollection, a subclass of ReadOnlyCollection, or in rare cases IEnumerable for properties or return values representing read-only collections.
As the last point states, you shouldn't avoid ReadOnlyCollection like you were suggesting. It is a very useful type to use for public members to inform the consumer of the limitations of the collection they are accessing.
A part of my (C# 3.0 .NET 3.5) application requires several lists of strings to be maintained. I declare them, unsurprisingly, as List<string> and everything works, which is nice.
The strings in these Lists are actually (and always) Fund IDs. I'm wondering if it might be more intention-revealing to be more explicit, e.g.:
public class FundIdList : List<string> { }
... and this works as well. Are there any obvious drawbacks to this, either technically or philosophically?
I would start by going in the other direction: wrapping the string up into a class/struct called FundId. The advantage of doing so, I think, is greater than the generic list versus specialised list.
You code becomes type-safe: there is a lot less scope for you to pass a string representing something else into a method that expects a fund identifier.
You can constrain the strings that are valid in the constructor to FundId, i.e. enforce a maximum length, check that the code is in the expected format, &c.
You have a place to add methods/functions relating to that type. For example, if fund codes starting 'I' are internal funds you could add a property called IsInternal that formalises that.
As for FundIdList, the advantage to having such a class is similar to point 3 above for the FundId: you have a place to hook in methods/functions that operate on the list of FundIds (i.e. aggregate functions). Without such a place, you'll find that static helper methods start to crop up throughout the code or, in some static helper class.
List<> has no virtual or protected members - such classes should almost never be subclassed. Also, although it's possible you need the full functionality of List<string>, if you do - is there much point to making such a subclass?
Subclassing has a variety of downsides. If you declare your local type to be FundIdList, then you won't be able to assign to it by e.g. using linq and .ToList since your type is more specific. I've seen people decide they need extra functionality in such lists, and then add it to the subclassed list class. This is problematic, because the List implementation ignores such extra bits and may violate your constraints - e.g. if you demand uniqueness and declare a new Add method, anyone that simply (legally) upcasts to List<string> for instance by passing the list as a parameter typed as such will use the default list Add, not your new Add. You can only add functionality, never remove it - and there are no protected or virtual members that require subclassing to exploit.
So you can't really add any functionality you couldn't with an extension method, and your types aren't fully compatible anymore which limits what you can do with your list.
I prefer declaring a struct FundId containing a string and implementing whatever guarantees concerning that string you need there, and then working with a List<FundId> rather than a List<string>.
Finally, do you really mean List<>? I see many people use List<> for things for which IEnumerable<> or plain arrays are more suitable. Exposing your internal List in an api is particularly tricky since that means any API user can add/remove/change items. Even if you copy your list first, such a return value is still misleading, since people might expect to be able to add/remove/change items. And if you're not exposing the List in an API but merely using it for internal bookkeeping, then it's not nearly as interesting to declare and use a type that adds no functionality, only documentation.
Conclusion
Only use List<> for internals, and don't subclass it if you do. If you want some explicit type-safety, wrap string in a struct (not a class, since a struct is more efficient here and has better semantics: there's no confusion between a null FundId and a null string, and object equality and hashcode work as expected with structs but need to be manually specified for classes). Finally, expose IEnumerable<> if you need to support enumeration, or if you need indexing as well use the simple ReadOnlyCollection<> wrapper around your list rather than let the API client fiddle with internal bits. If you really need a mutatable list API, ObservableCollection<> at least lets you react to changes the client makes.
Personally I would leave it as a List<string>, or possibly create a FundId class that wraps a string and then store a List<FundId>.
The List<FundId> option would enforce type correct-ness and allow you to put some validation on FundIds.
Just leave it as a List<string>, you variable name is enough to tell others that it's storing FundIDs.
var fundIDList = new List<string>();
When do I need to inherit List<T>?
Inherit it if you have really special actions/operations to do to a fund id list.
public class FundIdList : List<string>
{
public void SpecialAction()
{
//can only do with a fund id list
//sorry I can't give an example :(
}
}
Unless I was going to want someone to do everything they could to List<string>, without any intervention on the part of FundIdList I would prefer to implement IList<string> (or an interface higher up the hierarchy if I didn't care about most of that interface's members) and delegate calls to a private List<string> when appropriate.
And if I did want someone to have that degree of control, I'd probably just given them a List<string> in the first place. Presumably you have something to make sure such strings actually are "Fund IDs", which you can't guarantee any more when you publicly use inheritance.
Actually, this sounds (and often does with List<T>) like a natural case for private inheritance. Alas, C# doesn't have private inheritance, so composition is the way to go.
In a response to this question runefs suggested that "unless you have a very specific reason for using IList you should considere IEnumerable". Which do you use and why?
IEnumberable<T> is read-only, you have to reconstruct the collection to make changes to it. On the other hand, IList<T> is read-write. So if you expect a lot of changes to the collection, expose IList<T> but if it's safe to assume that you won't modify it, go with IEnumerable<T>.
Always use the most restrictive interface that provides the features you need, because that gives you the most flexibility to change the implementation later. So, if IEnumerable<T> is enough, then use that... if you need list-features, use IList<T>.
And preferably use the strongly typed generic versions.
The principle I follow is one I read a while back:
"Consume the simplest and expose the
most complex"
(I'm sure this is really common and I'm mis-quoting it, if anyone can knows the source can you leave a comment or edit please...)
(Edited to add - well, here I am a couple of weeks later and I've just ran into this quote in a completely different context, it looks like it started as the Robustness Principle or Postel's Law -
Be conservative in what you do; be liberal in what you accept from others.
The original definition is for network communication over the internet, but I'm sure I've seen it repurposed for defining class contracts in OO.)
Basically, if you're defining a method for external consumption then the parameters should be the most basic type that gives you the functionality you require - in the case of your example this may mean taking in an IEnumerable instead of an IList. This gives client code the most flexibility over what they can pass in. On the other hand if you are exposing a property for external consumption then do so with the most complex type (IList or ICollection instead of IEnumerable) since this gives the client the most flexibility in the way they use the object.
I'm finding the conversation between myself and DrJokepu in the comments fascinating, but I also appreciate that this isn't supposed to be a discussion forum so I'll edit my answer to further outline the reasons behind my choice to buck the trend and suggest that you expose it as an IList (well a List actually, as you'll see). Let's say this is the class we are talking about:
public class Example
{
private List<int> _internal = new List<int>();
public /*To be decided*/ External
{
get { return _internal; }
}
}
So first of all let's assume that we are exposing External as an IEnumerable. The reasons I have seen for doing this in the other answers and comments are currently:
IEnumerable is read-only
IEnumerable is the defacto standard
It reduces coupling so you can change the implementation
You don't expose implementation details
While IEnumerable only exposes readonly functionality that doesn't make it readonly. The real type you are returning is easily found out by reflection or simply pausing a debugger and looking, so it is trivial to cast it back to List<>, the same applies to the FileStream example in the comments below. If you are trying to protect the member then pretending it is something else isn't the way to do it.
I don't believe it is the defacto standard. I can't find any code in the .NET 3.5 library where Microsoft had the option to return an concrete collection and returned an interface instead. IEnumerable is more common in 3.5 thanks to LINQ, but that is because they can't expose a more derived type, not because they don't want to.
Now, reducing coupling I agree with - to a point - basically this argument seems to be that you are telling the client that while the object you return is a list I want you to treat is as an IEnumerable just in case you decide to change the internal code later. Ignoring the problems with this and the fact that the real type is exposed anyway it still leaves the question of "why stop at IEnumerable?" if you return it as an object type then you'll have complete freedom to change the implementation to anything! This means that you must have made a decision about how much functionality the client code requires so either you are writing the client code, or the decision is based around arbitrary metrics.
Finally, as previously discussed, you can assume that implementation details are always exposed in .NET, especially if you are publicly exposing the object.
So, after that long diatribe there is one question left - Why expose it as a List<>? Well, why not? You haven't gained anything by exposing it as a IEnumerable, so why artificially limit the client code's ability to work with the object? Imagine if Microsoft had decided that the Controls collection of a Winforms Control should appear as IEnumerable instead of ControlCollection - you'd no longer be able to pass it to methods that require an IList, work on items at an arbitrary index, or see if it contains a particular control unless you cast it. Microsoft wouldn't really have gained anything, and it would just inconvenience you.
When I only need to enumerate the children, I use IEnumerable. If I happen to need Count, I use ICollection. I try to avoid that, though, because it exposes implementation details.
IList<T> is indeed a very beefy interface. I prefer to expose Collection<T>-derived types. This is pretty much in line with what Jeffrey Richter suggests doing (don't have a book nearby, so can't specify page/chapter number): methods should accept the most common types as parameters and return the most "derived" types as return values.
If a read-only collection is exposed via a property, then the conventional way to do it is to expose it as ReadOnlyCollection (or a derived class of that) wrapping whatever you have there. It exposes full capabilities of IList to the client, and yet it makes it very clear that it is read-only.