Are there drawbacks to creating a class that encapsulates Generic Collection? - c#

A part of my (C# 3.0 .NET 3.5) application requires several lists of strings to be maintained. I declare them, unsurprisingly, as List<string> and everything works, which is nice.
The strings in these Lists are actually (and always) Fund IDs. I'm wondering if it might be more intention-revealing to be more explicit, e.g.:
public class FundIdList : List<string> { }
... and this works as well. Are there any obvious drawbacks to this, either technically or philosophically?

I would start by going in the other direction: wrapping the string up into a class/struct called FundId. The advantage of doing so, I think, is greater than the generic list versus specialised list.
1. Your code becomes more type-safe: there is far less scope for passing a string that represents something else into a method that expects a fund identifier.
2. You can constrain which strings are valid in the FundId constructor, e.g. enforce a maximum length or check that the code is in the expected format.
3. You have a place to add methods and properties relating to that type. For example, if fund codes starting with 'I' are internal funds, you could add a property called IsInternal that formalises that.
As for FundIdList, the advantage of having such a class is similar to point 3 above for FundId: you have a place to hook in methods that operate on the list of FundIds (e.g. aggregate functions). Without such a place, you'll find that static helper methods start to crop up throughout the code or in some static helper class.
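The three points above can be sketched roughly as follows. This is only an illustration: the maximum length, the member names other than FundId and IsInternal, and the use of syntax newer than C# 3.0 are all assumptions, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type wrapping a fund identifier string.
public readonly struct FundId : IEquatable<FundId>
{
    private const int MaxLength = 10; // assumed limit, for illustration only

    private readonly string value;

    public FundId(string value)
    {
        // Point 2: validation lives in one place, the constructor.
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("A fund ID must not be empty.", nameof(value));
        if (value.Length > MaxLength)
            throw new ArgumentException("A fund ID is too long.", nameof(value));
        this.value = value;
    }

    // Point 3: assumed convention that codes starting with 'I' are internal funds.
    // default(FundId) carries a null string, hence the null check.
    public bool IsInternal => value != null && value[0] == 'I';

    public bool Equals(FundId other) => string.Equals(value, other.value, StringComparison.Ordinal);
    public override bool Equals(object obj) => obj is FundId other && Equals(other);
    public override int GetHashCode() => value == null ? 0 : StringComparer.Ordinal.GetHashCode(value);
    public override string ToString() => value ?? string.Empty;
}

public static class Demo
{
    public static void Main()
    {
        // Point 1: a List<FundId> cannot accidentally receive an arbitrary string.
        var funds = new List<FundId> { new FundId("I1001"), new FundId("X2002") };
        foreach (FundId id in funds)
            Console.WriteLine(id + " internal: " + id.IsInternal);
    }
}
```

A method declared as taking a FundId then rejects a plain string at compile time, which is the type-safety benefit from point 1.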

List<> has no virtual or protected members - such classes should almost never be subclassed. Also, although it's possible you need the full functionality of List<string>, if you do - is there much point to making such a subclass?
Subclassing has a variety of downsides. If you declare your local variable as FundIdList, you can't assign the result of a LINQ query and .ToList() to it, since your type is more specific. I've seen people decide they need extra functionality in such lists and then add it to the subclassed list class. This is problematic, because the List implementation ignores such extra bits and may violate your constraints. For example, if you demand uniqueness and declare a new Add method, anyone who (legally) upcasts to List<string>, for instance by passing the list as a parameter typed as such, will use the default list Add, not your new Add. You can only add functionality, never remove it, and there are no protected or virtual members that require subclassing to exploit.
So you can't really add any functionality you couldn't with an extension method, and your types aren't fully compatible anymore which limits what you can do with your list.
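To make the upcast problem concrete, here is a small sketch. UniqueFundIdList and AddTwice are hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subclass that tries to enforce uniqueness by hiding Add.
public class UniqueFundIdList : List<string>
{
    public new void Add(string fundId)
    {
        if (!Contains(fundId))
            base.Add(fundId);
    }
}

public static class Demo
{
    // Any method typed against the base class binds to List<string>.Add,
    // because List<string>.Add is not virtual and cannot be overridden.
    public static void AddTwice(List<string> list, string fundId)
    {
        list.Add(fundId);
        list.Add(fundId);
    }

    public static void Main()
    {
        var ids = new UniqueFundIdList();
        ids.Add("F1");
        ids.Add("F1");                // rejected by the hiding Add
        Console.WriteLine(ids.Count); // 1

        AddTwice(ids, "F2");          // implicit upcast: the uniqueness check is bypassed
        Console.WriteLine(ids.Count); // 3
    }
}
```

The same bypass applies to AddRange, Insert and the indexer setter, none of which the subclass can intercept.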
I prefer declaring a struct FundId containing a string and implementing whatever guarantees concerning that string you need there, and then working with a List<FundId> rather than a List<string>.
Finally, do you really mean List<>? I see many people use List<> for things for which IEnumerable<> or plain arrays are more suitable. Exposing your internal List in an api is particularly tricky since that means any API user can add/remove/change items. Even if you copy your list first, such a return value is still misleading, since people might expect to be able to add/remove/change items. And if you're not exposing the List in an API but merely using it for internal bookkeeping, then it's not nearly as interesting to declare and use a type that adds no functionality, only documentation.
Conclusion
Only use List<> for internals, and don't subclass it if you do. If you want some explicit type-safety, wrap string in a struct (not a class, since a struct is more efficient here and has better semantics: there's no confusion between a null FundId and a null string, and object equality and hashcode work as expected with structs but need to be manually specified for classes). Finally, expose IEnumerable<> if you need to support enumeration, or if you need indexing as well, use the simple ReadOnlyCollection<> wrapper around your list rather than letting the API client fiddle with internal bits. If you really need a mutable list API, ObservableCollection<> at least lets you react to changes the client makes.

Personally I would leave it as a List<string>, or possibly create a FundId class that wraps a string and then store a List<FundId>.
The List<FundId> option would enforce type correctness and allow you to put some validation on FundIds.

Just leave it as a List<string>; your variable name is enough to tell others that it's storing fund IDs.
var fundIDList = new List<string>();
When do I need to inherit List<T>?
Inherit it if you have really special actions/operations to do to a fund id list.
public class FundIdList : List<string>
{
public void SpecialAction()
{
//can only do with a fund id list
//sorry I can't give an example :(
}
}

Unless I wanted someone to be able to do everything they could do to a List<string>, without any intervention on the part of FundIdList, I would prefer to implement IList<string> (or an interface higher up the hierarchy if I didn't care about most of that interface's members) and delegate calls to a private List<string> when appropriate.
And if I did want someone to have that degree of control, I'd probably just give them a List<string> in the first place. Presumably you have something to make sure such strings actually are "Fund IDs", which you can't guarantee any more when you publicly use inheritance.
Actually, this sounds (and often does with List<T>) like a natural case for private inheritance. Alas, C# doesn't have private inheritance, so composition is the way to go.
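A minimal sketch of that composition approach, implementing ICollection<string> (one step up the hierarchy from IList<string>) and delegating to a private list. The class name FundIdCollection and the validation rule are placeholders, since the thread never says what makes a fund ID valid.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that owns a private List<string> and validates every addition.
public class FundIdCollection : ICollection<string>
{
    private readonly List<string> items = new List<string>();

    public int Count { get { return items.Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(string fundId)
    {
        // Placeholder rule: the real check for a valid fund ID is an assumption.
        if (string.IsNullOrEmpty(fundId))
            throw new ArgumentException("Not a valid fund ID.", "fundId");
        items.Add(fundId);
    }

    // Everything else is delegated unchanged to the private list.
    public void Clear() { items.Clear(); }
    public bool Contains(string fundId) { return items.Contains(fundId); }
    public void CopyTo(string[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }
    public bool Remove(string fundId) { return items.Remove(fundId); }
    public IEnumerator<string> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Demo
{
    public static void Main()
    {
        var ids = new FundIdCollection { "F1", "F2" };
        Console.WriteLine(ids.Count); // 2

        try { ids.Add(""); }
        catch (ArgumentException) { Console.WriteLine("rejected"); }
    }
}
```

Because FundIdCollection is not a List<string>, there is no upcast through which a caller could skip the check in Add.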

Related

Force function input parameters to be immutable?

I've just spent the best part of 2 days trying to track down a bug. It turns out I was accidentally mutating the values that were provided as input to a function.
IEnumerable<DataLog>
FilterIIR(
IEnumerable<DataLog> buffer
) {
double notFilter = 1.0 - FilterStrength;
var filteredVal = buffer.FirstOrDefault()?.oilTemp ?? 0.0;
foreach (var item in buffer)
{
filteredVal = (item.oilTemp * notFilter) + (filteredVal * FilterStrength);
/* Mistake here! This overwrites the caller's object.
item.oilTemp = (float)filteredVal;
yield return item;
*/
// Correct version!
yield return new DataLog()
{
oilTemp = (float)filteredVal,
ambTemp = item.ambTemp,
oilCond = item.oilCond,
logTime = item.logTime
};
}
}
My programming language of preference is usually C# or C++ depending on what I think suits the requirements better (this is part of a larger program that suits C# better)...
Now in C++ I would have been able to guard against such a mistake by accepting constant iterators which prevent you from being able to modify the values as you retrieve them (though I might need to build a new container for the return value). I've done a little searching and can't find any simple way to do this in C#, does anyone know different?
I was thinking I could make an IReadOnlyEnumerable<T> class which takes an IEnumerable as a constructor, but then I realized that unless it makes a copy of the values as you retrieve them it won't actually have any effect, because the underlying value can still be modified.
Is there any way I might be able to protect against such errors in future? Some wrapper class, or even if it's a small code snippet at the top of each function I want to protect, anything would be fine really.
The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class.
The problem here isn't really about the IEnumerable. IEnumerable<T> is already a read-only interface: you can't add or remove items through it. What's mutable is your DataLog class.
Because DataLog is a reference type, item holds a reference to the original object, instead of a copy of the object. This, plus the fact that DataLog is mutable, allows you to mutate the parameters passed in.
So on a high level, you can either:
make a copy of DataLog, or;
make DataLog immutable
or both...
What you are doing now is "making a copy of DataLog". Another way of doing this is changing DataLog from a class to a struct. This way, you'll always create a copy of it when passing it to methods (unless you mark the parameter with ref). So be careful when using this method because it might silently break existing methods that assume a pass-by-reference semantic.
You can also make DataLog immutable. This means removing all the setters. Optionally, you can add methods named WithXXX that return a copy of the object with only one property changed. If you choose to do this, the yield statement in your FilterIIR would look like:
yield return item.WithOilTemp(filteredVal);
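One way such an immutable DataLog could look is sketched below. The property types are guesses based on the question's code (only the float cast on oilTemp is actually shown there), and the constructor is an assumption.

```csharp
using System;

// Assumed shape of DataLog, inferred from the members used in the question.
public sealed class DataLog
{
    public DataLog(float oilTemp, float ambTemp, float oilCond, DateTime logTime)
    {
        this.oilTemp = oilTemp;
        this.ambTemp = ambTemp;
        this.oilCond = oilCond;
        this.logTime = logTime;
    }

    // Getter-only properties: nothing can change after construction.
    public float oilTemp { get; }
    public float ambTemp { get; }
    public float oilCond { get; }
    public DateTime logTime { get; }

    // Returns a copy that differs only in oilTemp; the original is untouched.
    public DataLog WithOilTemp(double newOilTemp)
    {
        return new DataLog((float)newOilTemp, ambTemp, oilCond, logTime);
    }
}

public static class Demo
{
    public static void Main()
    {
        var original = new DataLog(80f, 20f, 1f, new DateTime(2020, 1, 1));
        DataLog filtered = original.WithOilTemp(75.0);

        Console.WriteLine(original.oilTemp); // still the original reading
        Console.WriteLine(filtered.oilTemp); // the filtered reading
    }
}
```

With this shape, the mistaken assignment from the question no longer compiles, because oilTemp has no setter.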
The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class.
You don't actually need to do this. Notice how List<T> implements IReadOnlyList<T>, even though List<T> is clearly mutable. You could write an interface called IReadOnlyDataLog. This interface would only have the getters of DataLog. Then, have FilterIIR accept an IEnumerable<IReadOnlyDataLog> and have DataLog implement IReadOnlyDataLog. This way, you will not accidentally mutate the DataLog objects in FilterIIR.
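A rough sketch of the interface approach, trimmed to two members to keep it short. The static Filters class and the filterStrength parameter are assumptions made so the example stands alone; in the question, FilterStrength is a member of the enclosing class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Read-only view: getters only.
public interface IReadOnlyDataLog
{
    float oilTemp { get; }
    DateTime logTime { get; }
}

// The mutable class keeps its setters and also implements the read-only view.
public class DataLog : IReadOnlyDataLog
{
    public float oilTemp { get; set; }
    public DateTime logTime { get; set; }
}

public static class Filters
{
    public static IEnumerable<DataLog> FilterIIR(IEnumerable<IReadOnlyDataLog> buffer, double filterStrength)
    {
        double notFilter = 1.0 - filterStrength;
        IReadOnlyDataLog first = buffer.FirstOrDefault();
        double filteredVal = first != null ? first.oilTemp : 0.0;
        foreach (IReadOnlyDataLog item in buffer)
        {
            filteredVal = (item.oilTemp * notFilter) + (filteredVal * filterStrength);
            // item.oilTemp = (float)filteredVal;  // does not compile: the interface has no setter
            yield return new DataLog { oilTemp = (float)filteredVal, logTime = item.logTime };
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var logs = new List<DataLog>
        {
            new DataLog { oilTemp = 10f, logTime = new DateTime(2020, 1, 1) },
            new DataLog { oilTemp = 20f, logTime = new DateTime(2020, 1, 2) }
        };

        // List<DataLog> converts to IEnumerable<IReadOnlyDataLog> through interface covariance.
        foreach (DataLog filtered in Filters.FilterIIR(logs, 0.5))
            Console.WriteLine(filtered.oilTemp);
    }
}
```

This guards against accidental mutation only. A caller can still cast an item back to DataLog, so it is a compile-time aid rather than a guarantee.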

explicitly mark parameter as mutating in c#

I have a large amount of code that is dependent on a list of objects. The list is modified a lot while being passed around as a parameter to various methods.
Even though I understand the workings of this code, I feel uneasy letting such an easy opportunity to make a mistake exist. Is there a way to handle this situation in c# outside of a goofy comment or refactoring?
If you are passing a List<Something> around in your code, then it is "mutable" by default, and there is no way to signal this explicitly.
If this is a language background issue (Haskell?), then in C# you should look at things from a different perspective: if you wanted to pass around an immutable collection, you would need to use a different type (maybe an IEnumerable<Something>, even if it's not the same as a list); if you're passing around a List instead, it can be modified by every method that receives it.
Maybe you can give that list a special type:
class MyCustomMutableList : List<int>
You could even not give it any base class to make sure that any usage site must use this special type in order to be able to access list data.
I would normally consider this a misuse of inheritance. If this is an implementation detail and does not leak out to consumers of your API it's probably good enough. Otherwise, create an IList<int> derived class through composition. R# has a feature to delegate all virtual methods to an instance field. That generates all that code.
You also could create a wrapper class that just exposes the required methods to perform the required mutations:
class DataCollector {
public void Add(int item) { ... }
}
Since all this object allows you to do is mutate, it is pretty clear that mutation is going on.

'Don't expose generic list', why to use collection<T> instead of list<T> in method parameter

I am using FxCop, and it shows the warning "Don't expose generic list", which suggests using Collection<T> instead of List<T>. I know the reasons why it is preferred, as covered in this SO post, on MSDN, and in the many other articles I have been through.
But my question is this: I have a few methods that do heavy calculation and accept List<T> parameters, which is supposed to be faster and better in terms of performance. FxCop issues the warning for these as well. So one option is to declare the parameter as Collection<T>, then call ToList() inside the method and work with the result.
So which one is optimized?
"Suppress the warning for this case" OR "use Collection<T> in parameter and then use ToList() inside the method itself".
The code analysis/FxCop rules have been written to support framework creators (Microsoft creates a lot of frameworks). A framework is consumed by external parties, and you should be careful when you design the public interface. Provided that you are not writing a framework to be consumed by external parties, you can simply ignore rules that don't provide value to you.
However, one of the reasons that this rule exists is that exposing collections on a class is somewhat difficult. Often the elements in the collection are owned by the containing class and in that case you violate encapsulation if you allow clients to modify the collection used to store the aggregated items. By returning List<T> you allow the clients to modify the collection in many different ways. But often you want to keep track of the items in the collection. E.g. adding a new element might require some additional bookkeeping in the containing class etc. You lose this kind of control when you return a List<T> unless of course you make a copy when you return it (but then the client should understand that they only get a copy of collection and modifications will be ignored).
All in all you can probably improve your class design by avoiding exposing classes like List<T> and being more explicit about how aggregated elements can be added, modified and removed. But if you are in a hurry and just want to crank out some code then using List<T> may be exactly what you need to get the job done.
Don't worry about using generic lists in public properties as long as you are not coding a framework that somebody else wants to extend in the near future.
I suggest to suppress the warning. You can refactor your classes later if requirements change.
IMHO, your interpretation that "Don't expose generic list" means "use a collection instead of a list" is invalid.
The critical difference between a collection and a list is that the elements in a list are ordered. Some methods may require that the passed elements have an order; then the parameter must be a list.
The key to understanding the warning is that you should use the interface IList<T> instead of the concrete class List<T>.
As the method operates on the list, it is not so important what kind of list it is. The key factor is that it is a list.
In conclusion, method parameters should be as abstract as possible.
You should use the type that is most appropriate for your purposes (and suppress the warning if appropriate). If you're passing a bunch of items, and order and uniqueness don't matter, use a collection. If you're passing an ordered collection of items, use a list. If you're passing data such that every item is unique but order doesn't matter, use a set. Use the type that has the semantic meaning appropriate for the exchange. In a few cases where the semantics and the methods that you need don't necessarily align (suppose you need AddRange), make an exception, or use the conversion methods.
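As an illustration of picking the parameter type by its semantic meaning, the sketch below declares three methods, each asking only for what it needs. The Stats class and its methods are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Stats
{
    // Order and uniqueness don't matter: a plain collection is enough.
    public static double Average(ICollection<double> values)
    {
        return values.Sum() / values.Count;
    }

    // Position matters, so ask for a list (the interface, not List<T>).
    // Assumes the caller passes values already sorted ascending.
    public static double Median(IList<double> sortedValues)
    {
        int n = sortedValues.Count;
        if (n == 0) throw new ArgumentException("No values.", "sortedValues");
        int mid = n / 2;
        return n % 2 == 1
            ? sortedValues[mid]
            : (sortedValues[mid - 1] + sortedValues[mid]) / 2.0;
    }

    // Uniqueness matters but order doesn't: ask for a set.
    public static bool SharesAny(ISet<string> known, IEnumerable<string> candidates)
    {
        return known.Overlaps(candidates);
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(Stats.Average(new[] { 1.0, 2.0, 3.0 }));         // 2
        Console.WriteLine(Stats.Median(new List<double> { 1, 2, 3, 4 }));  // 2.5 (decimal separator depends on culture)
        Console.WriteLine(Stats.SharesAny(new HashSet<string> { "a", "b" }, new[] { "b", "c" })); // True
    }
}
```

Arrays, List<T> and HashSet<T> all satisfy these signatures without any conversion, so callers rarely need ToList() just to make a call.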

Why refactor argument of List<Term> to IEnumerable<Term>?

I have a method that looks like this:
public void UpdateTermInfo(List<Term> termInfoList)
{
foreach (Term termInfo in termInfoList)
{
UpdateTermInfo(termInfo);
}
m_xdoc.Save(FileName.FullName);
}
Resharper advises me to change the method signature to IEnumerable<Term> instead of List<Term>. What is the benefit of doing this?
The other answers point out that by choosing a "larger" type you permit a broader set of callers to call you. Which is a good enough reason in itself to make this change. However, there are other reasons. I would recommend that you make this change because when I see a method that takes a list or an array, the first thing I think is "what if that method tries to change an item in my list/array?"
You want the contents of a bucket, but you are requiring not just the bucket but also the ability to change its contents. Why would you require that if you're not going to use that ability? When you say "this method cannot take any old sequence; it has to take a mutable list that is indexed by integers" I think that you're making that requirement on the caller because you're going to take advantage of that power.
If "I'm planning on messing up your data structure" is not what you intend to communicate to the caller of the method then don't communicate that. A method that takes a sequence communicates "The most I'm going to do is read from this sequence in order".
Simply put, accepting an enumerable allows your function to be compatible with a broader scope of input arguments, such as arrays and LINQ queries.
To expound on accepting LINQ queries, one could do:
UpdateTermInfo(myTermList.Where(x => somefilter));
Additionally, specifying an interface rather than a concrete class allows others to provide their own implementation of that interface. In this way, you are being "subscriptive" rather than "proscriptive." (Yes, I did just make up a word.)
In general (with many exceptions relating to what sort of abilities you want to reserve for potential later modifications), it is a best-practice to implement functions using arguments that are the most general that they can be. This gives maximum flexibility to the consumer of your function.
As a result, if you are dead-set on using a list for this function (perhaps because at some later date you expect you might want to use properties such as Count or the index operator), I would strongly urge you to consider using IList<Term> instead of List<Term> for the reasons mentioned above.
List implements IEnumerable, so using it makes things more flexible. If a situation came along where you didn't want to use a List and wanted a different collection type, you could pass it as an IEnumerable with ease.
For instance, IEnumerable allows you to use arrays and many other types, as opposed to always using a List.
IEnumerable is simply a sequence of items that you can iterate over (for example with foreach), unlike a List, which also lets you add, remove, sort, index, and read Count.
The main idea behind that refactor is that you make the method more general. You don't say what data structure you want, only what you need from it: that you can iterate through its elements.
So later, when you decide that O(n) search is not good enough for you, you only have to change one line and move along.
If you use List then you are confining yourself to only use a concrete implementation of List where as with IEnumerable you can pass in Arrays, Lists, Collections as they all implement that interface.
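The flexibility these answers describe can be seen in a small sketch. The Term class here is a stand-in with one assumed property, and the method only counts items rather than doing the original XML update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's Term class; the Name property is an assumption.
public class Term
{
    public string Name { get; set; }
}

public static class TermUpdater
{
    // Accepting IEnumerable<Term> promises callers that the method only reads the sequence.
    public static int UpdateTermInfo(IEnumerable<Term> terms)
    {
        int updated = 0;
        foreach (Term term in terms)
        {
            Console.WriteLine("Updating " + term.Name);
            updated++;
        }
        return updated;
    }
}

public static class Demo
{
    public static void Main()
    {
        var list = new List<Term> { new Term { Name = "alpha" }, new Term { Name = "beta" } };
        Term[] array = { new Term { Name = "gamma" } };

        TermUpdater.UpdateTermInfo(list);                                      // a List<Term>
        TermUpdater.UpdateTermInfo(array);                                     // an array
        TermUpdater.UpdateTermInfo(list.Where(t => t.Name.StartsWith("a")));   // a lazy LINQ query
    }
}
```

With a List<Term> parameter, the second and third calls would each need a ToList() and an extra copy.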

Best Practice List/Array/ReadOnlyCollection creation (and usage)

My code is littered with collections - not an unusual thing, I suppose. However, usage of the various collection types is neither obvious nor trivial. Generally, I'd like to use the type that exposes the "best" API and has the least syntactic noise. (See Best practice when returning an array of values, Using list arrays - Best practices for comparable questions). There are guidelines suggesting what types to use in an API, but these are impractical in normal (non-API) code.
For instance:
new ReadOnlyCollection<Tuple<string,int>>(
new List<Tuple<string,int>> {
Tuple.Create("abc",3),
Tuple.Create("def",37)
}
)
Lists are a very common data structure, but creating them in this fashion involves quite a bit of syntactic noise - and it can easily get even worse (e.g. dictionaries). As it turns out, many lists are never changed, or at least never extended. Of course ReadOnlyCollection introduces yet more syntactic noise, and it doesn't even convey quite what I mean; after all, ReadOnlyCollection may wrap a mutating collection. Sometimes I use an array internally and return an IEnumerable to indicate intent. But most of these approaches have a very low signal-to-noise ratio, and that ratio is absolutely critical to understanding code.
For the 99% of all code that is not a public API, it's not necessary to follow Framework Guidelines: however, I still want a comprehensible code and a type that communicates intent.
So, what's the best-practice way to deal with the bog-standard task of making small collections to pass around values? Should array be preferred over List where possible? Something else entirely? What's the best way - clean, readable, reasonably efficient - of passing around such small collections? In particular, code should be obvious to future maintainers that have not read this question and don't want to read swathes of API docs yet still understand what the intent is. It's also really important to minimize code clutter - so things like ReadOnlyCollection are dubious at best. Nothing wrong with wordy types for major API's with small surfaces, but not as a general practice inside a large codebase.
What's the best way to pass around lists of values without lots of code clutter (such as explicit type parameters) but that still communicates intent clearly?
Edit: clarified that this is about making short, clear code, not about public API's.
If I understand your question correctly, I think you have to distinguish between what you create and manage within your class and what you make available to the outside world.
Within your class you can use whatever best fits your current task (pros/cons of List vs. Array vs. Dictionary vs. LinkedList etc.). But this may have nothing to do with what you provide in your public properties or functions.
Within your public contract (properties and functions) you should return the least specific type (or, even better, interface) that is needed: an IList, ICollection, IDictionary, or IEnumerable of some public type. This means your consumer classes depend only on interfaces instead of concrete classes, so you can change the concrete implementation at a later stage without breaking your public contract (for example, switching from a LinkedList<> to a List<> or vice versa for performance reasons).
Update:
So, this isn't strictly speaking new; but this question convinced me to go ahead and announce an open source project I've had in the works for a while (still a work in progress, but there's some useful stuff in there), which includes an IArray<T> interface (and implementations, naturally) that I think captures exactly what you want here: an indexed, read-only, even covariant (bonus!) interface.
Some benefits:
It's not a concrete type like ReadOnlyCollection<T>, so it doesn't tie you down to a specific implementation.
It's not just a wrapper (like ReadOnlyCollection<T>), so it "really is" read-only.
It clears the way for some really nice extension methods. So far the Tao.NET library only has two (I know, weak), but more are on the way. And you can easily make your own, too—just derive from ArrayBase<T> (also in the library) and override the this[int] and Count properties and you're done.
If this sounds promising to you, feel free to check it out and let me know what you think.
It's not 100% clear to me where you're worried about this "syntactic noise": in your code or in calling code?
If you're tolerant of some "noise" in your own encapsulated code then I would suggest wrapping a T[] array and exposing an IList<T> which happens to be a ReadOnlyCollection<T>:
class ThingsCollection
{
ReadOnlyCollection<Thing> _things;
public ThingsCollection()
{
Thing[] things = CreateThings();
_things = Array.AsReadOnly(things);
}
public IList<Thing> Things
{
get { return _things; }
}
protected virtual Thing[] CreateThings()
{
// Whatever you want, obviously.
return new Thing[0];
}
}
Yes there is some noise on your end, but it's not bad. And the interface you expose is quite clean.
Another option is to make your own interface, something like IArray<T>, which wraps a T[] and provides a get-only indexer. Then expose that. This is basically as clean as exposing a T[] but without falsely conveying the idea that items can be set by index.
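A minimal sketch of such an interface and one wrapper follows. It is not the API of the library mentioned in the earlier answer: the member set, the ArrayWrapper<T> name, and the defensive copy in the constructor are all assumptions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only, indexed, covariant view.
public interface IArray<out T> : IEnumerable<T>
{
    int Count { get; }
    T this[int index] { get; }   // get-only: callers cannot set items by index
}

public sealed class ArrayWrapper<T> : IArray<T>
{
    private readonly T[] items;

    public ArrayWrapper(T[] items)
    {
        if (items == null) throw new ArgumentNullException("items");
        // Copy, so later changes to the caller's array are not visible through the wrapper.
        this.items = (T[])items.Clone();
    }

    public int Count { get { return items.Length; } }
    public T this[int index] { get { return items[index]; } }

    public IEnumerator<T> GetEnumerator() { return ((IEnumerable<T>)items).GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Demo
{
    public static void Main()
    {
        IArray<string> names = new ArrayWrapper<string>(new[] { "abc", "def" });
        IArray<object> objects = names;   // allowed because T is declared 'out'

        Console.WriteLine(names[1]);      // def
        Console.WriteLine(objects.Count); // 2
        // names[0] = "x";                // does not compile: the indexer has no setter
    }
}
```

Unlike IList<T> backed by a ReadOnlyCollection<T>, the interface itself has no Add or set members, so misuse shows up at compile time instead of as a NotSupportedException.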
I do not pass around Lists if I can possibly help it. Generally I have something else that is managing the collection in question, which exposes the collection, for example:
public class SomeCollection
{
private List<SomeObject> m_Objects = new List<SomeObject>();
// ctor
public SomeCollection()
{
// Initialise list here, or whatnot
} // eo ctor
public List<SomeObject> Objects { get { return m_Objects; } }
} // eo class SomeCollection
And so this would be the object passed around:
public void SomeFunction(SomeCollection _collection)
{
// work with _collection.Objects
} // eo SomeFunction
I like this approach, because:
1) I can populate my values in the ctor. They're there the moment anyone news up a SomeCollection.
2) I can restrict access, if I want, to the underlying list. In my example I exposed it all, but you don't have to do this. You can make it read-only if you want, or validate additions to the list, prior to adding them.
3) It's clean. Far easier to read SomeCollection than List<SomeObject> everywhere.
4) If you suddenly realise that your collection of choice is inefficient, you can change the underlying collection type without having to go and change all the places where it got passed as a parameter (can you imagine the trouble you might have with, say, List<String>?)
I agree. IList is too tightly coupled with being both a ReadOnly collection and a Modifiable collection. IList should have inherited from an IReadOnlyList.
Casting back to IReadOnlyList wouldn't require a explicit cast. Casting forward would.
1.
Define your own class which implements IEnumerator, takes an IList in its constructor, has a read-only default item property taking an index, and does not include any properties/methods that could otherwise allow your list to be manipulated.
If you later want to allow modifying the ReadOnly wrapper like IReadOnlyCollection does, you can make another class which is a wrapper around your custom ReadOnly Collection and has the Insert/Add/Remove/RemoveAt/Clear/...implemented and cache those changes.
2.
Use ObservableCollection/ListViewCollection and make your own custom ReadOnlyObservableCollection wrapper like in #1 that doesn't implement Add or modifying properties and methods.
ObservableCollection can bind to ListViewCollection in such a way that changes to ListViewCollection do not get pushed back into ObservableCollection. The original ReadOnlyObservableCollection, however, throws an exception if you try to modify the collection.
If you need backwards/forwards compatibility, make two new classes inheriting from these. Then Implement IBindingList and handle/translate CollectionChanged Event (INotifyCollectionChanged event) to the appropriate IBindingList events.
Then you can bind it to older DataGridView and WinForm controls, as well as WPF/Silverlight controls.
Microsoft has created a Guidelines for Collections document which is a very informative list of DOs and DON'Ts that address most of your question.
It's a long list so here are the most relevant ones:
DO prefer collections over arrays.
DO NOT use ArrayList or List<T> in public APIs. (public properties, public parameters and return types of public methods)
DO NOT use Hashtable or Dictionary<TKey,TValue> in public APIs.
DO NOT use weakly typed collections in public APIs.
DO use the least-specialized type possible as a parameter type. Most members taking collections as parameters use the IEnumerable<T> interface.
AVOID using ICollection<T> or ICollection as a parameter just to access the Count property.
DO use ReadOnlyCollection<T>, a subclass of ReadOnlyCollection<T>, or in rare cases IEnumerable<T> for properties or return values representing read-only collections.
As the last point states, you shouldn't avoid ReadOnlyCollection like you were suggesting. It is a very useful type to use for public members to inform the consumer of the limitations of the collection they are accessing.
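A short sketch of two of these guidelines together: take the least-specialized parameter type and return a ReadOnlyCollection<T>. The Catalog class and its members are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Catalog
{
    private readonly List<string> items = new List<string>();
    private readonly ReadOnlyCollection<string> readOnlyItems;

    // Least-specialized parameter type: any sequence of strings will do.
    public Catalog(IEnumerable<string> initialItems)
    {
        items.AddRange(initialItems);
        readOnlyItems = items.AsReadOnly(); // a live, read-only view over the private list
    }

    // Consumers can index and enumerate, but not modify.
    public ReadOnlyCollection<string> Items { get { return readOnlyItems; } }

    // Mutation goes through the owning class, which can do its own bookkeeping.
    public void AddItem(string item) { items.Add(item); }
}

public static class Demo
{
    public static void Main()
    {
        var catalog = new Catalog(new[] { "abc", "def" });
        catalog.AddItem("ghi");
        Console.WriteLine(catalog.Items.Count); // 3, the view reflects the owner's changes

        try
        {
            // The wrapper implements IList<string>, but modifying through it throws.
            ((IList<string>)catalog.Items).Add("xyz");
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("read-only");
        }
    }
}
```

The view is a wrapper, not a snapshot, which is the caveat raised in the question: the owner can still change what the consumer sees.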
