Force function input parameters to be immutable? - c#

I've just spent the best part of 2 days trying to track down a bug, it turns out I was accidentally mutating the values that were provided as input to a function.
IEnumerable<DataLog>
FilterIIR(
IEnumerable<DataLog> buffer
) {
double notFilter = 1.0 - FilterStrength;
var filteredVal = buffer.FirstOrDefault()?.oilTemp ?? 0.0;
foreach (var item in buffer)
{
filteredVal = (item.oilTemp * notFilter) + (filteredVal * FilterStrength);
/* Mistake here!
item.oilTemp = filteredValue;
yield return item;
*/
// Correct version!
yield return new DataLog()
{
oilTemp = (float)filteredVal,
ambTemp = item.ambTemp,
oilCond = item.oilCond,
logTime = item.logTime
};
}
}
My programming language of preference is usually C# or C++ depending on what I think suits the requirements better (this is part of a larger program that suits C# better)...
Now in C++ I would have been able to guard against such a mistake by accepting constant iterators which prevent you from being able to modify the values as you retrieve them (though I might need to build a new container for the return value). I've done a little searching and can't find any simple way to do this in C#, does anyone know different?
I was thinking I could make an IReadOnlyEnumerable<T> class which takes an IEnumerable as a constructor, but then I realized that unless it makes a copy of the values as you retrieve them it won't actually have any effect, because the underlying value can still be modified.
Is there any way I might be able to protect against such errors in future? Some wrapper class, or even if it's a small code snippet at the top of each function I want to protect, anything would be fine really.
The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class.

The problem is here isn't really about the IEnumerable. IEnumerables are actually immutable. You can't add or remove things from them. What's mutable is your DataLog class.
Because DataLog is a reference type, item holds a reference to the original object, instead of a copy of the object. This, plus the fact that DataLog is mutable, allows you to mutate the parameters passed in.
So on a high level, you can either:
make a copy of DataLog, or;
make DataLog immutable
or both...
What you are doing now is "making a copy of DataLog". Another way of doing this is changing DataLog from a class to a struct. This way, you'll always create a copy of it when passing it to methods (unless you mark the parameter with ref). So be careful when using this method because it might silently break existing methods that assume a pass-by-reference semantic.
You can also make DataLog immutable. This means removing all the setters. Optionally, you can add methods named WithXXX that returns a copy of the object with only one property different. If you chose to do this, your FilterIIR would look like:
yield return item.WithOilTemp(filteredVal);
The only sort of reasonable approach I can think of at the moment that'll work is to define a ReadOnly version of every class I need, then have a non-readonly version that inherits and overloads the properties and adds functions to provide a mutable version of the same class.
You don't actually need to do this. Notice how List<T> implements IReadOnlyList<T>, even though List<T> is clearly mutable. You could write an interface called IReadOnlyDataLog. This interface would only have the getters of DataLog. Then, have FilterIIR accept a IEnumerable<IReadOnlyDataLog> and DataLog implement IReadOnlyDataLog. This way, you will not accidentally mutate the DataLog objects in FilterIIR.

Related

How to "change" immutable objects effectively?

Is there any clever design pattern or sort of generic method for "modification" of immutable objects?
Background:
Let's have set of different (not having common base) immutable objects, each having different set of {get; private set;} properties and one public constructor (accepting set of property values). These objects are and should remain immutable, but from time to time, in sort of special mode, their values are required to "change".
Such kind of modification means just creation of new object having the same values as the original one except the updated properties (like o = new c(o.a, updateB, updateC, o.d, ...)).
I can imagine calling the constructors in place, or defining an (e.g. extension) method returning new instance, accepting some parameters to identify the updated property/ies and modifying it/them via reflection, but everything seems to be very specific and not elegant to be used system wide for "any" immutable. I like the way linq handles e.g. the IEnumerables and the chain you can produce and completely re-transform the input collection. Any ideas?
One example of something similar I'm aware of is Roslyn, Microsoft's current C# compiler, which uses immutable data structures extensively. The syntax tree classes have convenience methods for every property for returning a new instance with just one property changed, e.g.
var cu = Syntax.CompilationUnit()
.AddMembers(
Syntax.NamespaceDeclaration(Syntax.IdentifierName("ACO"))
.AddMembers(
Syntax.ClassDeclaration("MainForm")
.AddBaseListTypes(Syntax.ParseTypeName("System.Windows.Forms.Form"))
.WithModifiers(Syntax.Token(SyntaxKind.PublicKeyword))
.AddMembers(
Syntax.PropertyDeclaration(Syntax.ParseTypeName("System.Windows.Forms.Timer"), "Ticker")
.AddAccessorListAccessors(
Syntax.AccessorDeclaration(SyntaxKind.GetAccessorDeclaration).WithSemicolonToken(Syntax.Token(SyntaxKind.SemicolonToken)),
Syntax.AccessorDeclaration(SyntaxKind.SetAccessorDeclaration).WithSemicolonToken(Syntax.Token(SyntaxKind.SemicolonToken))),
Syntax.MethodDeclaration(Syntax.ParseTypeName("void"), "Main")
.AddModifiers(Syntax.Token(SyntaxKind.PublicKeyword))
.AddAttributes(Syntax.AttributeDeclaration().AddAttributes(Syntax.Attribute(Syntax.IdentifierName("STAThread"))))
.WithBody(Syntax.Block())
)
)
);
or in your case:
o = o.WithB(updateB).WithC(updateC);
which I think reads quite nicely for the intended use case (updating only a few properties while keeping everything else the same). It especially beats the »always have to call the ctor« approach when there are many properties.
In Roslyn that's all auto-generated, I think. You can probably do something similar with T4, if warranted.

explicitly mark parameter as mutating in c#

I have a large amount of code that is dependent on a list of objects. the list is modified a lot while being passed around as a parameter to various methods.
Even though I understand the workings of this code, I feel uneasy letting such an easy opportunity to make a mistake exist. Is there a way to handle this situation in c# outside of a goofy comment or refactoring?
If you are passing a List<Something> around in your code, then it is "mutable" by default, and there is no way to signal this explicitly.
If this is a language background issue (Haskell?), then in C# you should looks things from a different perspective: if you wanted to pass around an immutable collection, you would need to use some different type (maybe an IEnumerable<Something>, even if it's not the same as a list); if you're passing around a List, instead, it can be modified by every method that receives it.
Maybe you can give that list a special type:
class MyCustomMutableList : List<int>
You could even not give it any base class to make sure that any usage site must use this special type in order to be able to access list data.
I would normally consider this a misuse of inheritance. If this is an implementation detail and does not leak out to consumers of your API it's probably good enough. Otherwise, create an IList<int> derived class through composition. R# has a feature to delegate all virtual methods to an instance field. That generates all that code.
You also could create a wrapper class that just exposes the required methods to perform the required mutations:
class DataCollector {
public void Add(int item) { ... }
}
Since all this object allows to do is mutation it is pretty clear that mutation is going on.

Best Practice List/Array/ReadOnlyCollection creation (and usage)

My code is littered with collections - not an unusual thing, I suppose. However, usage of the various collection types isn't obvious nor trivial. Generally, I'd like to use the type that's exposes the "best" API, and has the least syntactic noise. (See Best practice when returning an array of values, Using list arrays - Best practices for comparable questions). There are guidelines suggesting what types to use in an API, but these are impractical in normal (non-API) code.
For instance:
new ReadOnlyCollection<Tuple<string,int>>(
new List<Tuple<string,int>> {
Tuple.Create("abc",3),
Tuple.Create("def",37)
}
)
List's are a very common datastructure, but creating them in this fashion involves quite a bit of syntactic noise - and it can easily get even worse (e.g. dictionaries). As it turns out, many lists are never changed, or at least never extended. Of course ReadOnlyCollection introduces yet more syntactic noise, and it doesn't even convey quite what I mean; after all ReadOnlyCollection may wrap a mutating collection. Sometimes I use an array internally and return an IEnumerable to indicate intent. But most of these approaches have a very low signal-to-noise ratio; and that's absolutely critical to understanding code.
For the 99% of all code that is not a public API, it's not necessary to follow Framework Guidelines: however, I still want a comprehensible code and a type that communicates intent.
So, what's the best-practice way to deal with the bog-standard task of making small collections to pass around values? Should array be preferred over List where possible? Something else entirely? What's the best way - clean, readable, reasonably efficient - of passing around such small collections? In particular, code should be obvious to future maintainers that have not read this question and don't want to read swathes of API docs yet still understand what the intent is. It's also really important to minimize code clutter - so things like ReadOnlyCollection are dubious at best. Nothing wrong with wordy types for major API's with small surfaces, but not as a general practice inside a large codebase.
What's the best way to pass around lists of values without lots of code clutter (such as explicit type parameters) but that still communicates intent clearly?
Edit: clarified that this is about making short, clear code, not about public API's.
After hopefully understanding your question, i think you have to distinguish between what you create and manage within your class and what you make available to the outside world.
Within your class you can use whatever best fits your current task (pro/cons of List vs. Array vs. Dictionary vs. LinkedList vs. etc.). But this has maybe nothing to do about what you provide in your public properties or functions.
Within your public contract (properties and functions) you should give back the least type (or even better interface) that is needed. So just an IList, ICollection, IDictionary, IEnumerable of some public type. Thous leads that your consumer classes are just awaiting interfaces instead of concrete classes and so you can change the concrete implementation at a later stage without breaking your public contract (due to performance reasons use an List<> instead of a LinkedList<> or vice versa).
Update:
So, this isn't strictly speaking new; but this question convinced me to go ahead and announce an open source project I've had in the works for a while (still a work in progress, but there's some useful stuff in there), which includes an IArray<T> interface (and implementations, naturally) that I think captures exactly what you want here: an indexed, read-only, even covariant (bonus!) interface.
Some benefits:
It's not a concrete type like ReadOnlyCollection<T>, so it doesn't tie you down to a specific implementation.
It's not just a wrapper (like ReadOnlyCollection<T>), so it "really is" read-only.
It clears the way for some really nice extension methods. So far the Tao.NET library only has two (I know, weak), but more are on the way. And you can easily make your own, too—just derive from ArrayBase<T> (also in the library) and override the this[int] and Count properties and you're done.
If this sounds promising to you, feel free to check it out and let me know what you think.
It's not 100% clear to me where you're worried about this "syntactic noise": in your code or in calling code?
If you're tolerant of some "noise" in your own encapsulated code then I would suggest wrapping a T[] array and exposing an IList<T> which happens to be a ReadOnlyCollection<T>:
class ThingsCollection
{
ReadOnlyCollection<Thing> _things;
public ThingsCollection()
{
Thing[] things = CreateThings();
_things = Array.AsReadOnly(things);
}
public IList<Thing> Things
{
get { return _things; }
}
protected virtual Thing[] CreateThings()
{
// Whatever you want, obviously.
return new Thing[0];
}
}
Yes there is some noise on your end, but it's not bad. And the interface you expose is quite clean.
Another option is to make your own interface, something like IArray<T>, which wraps a T[] and provides a get-only indexer. Then expose that. This is basically as clean as exposing a T[] but without falsely conveying the idea that items can be set by index.
I do not pass around Listss if I can possibly help it. Generally I have something else that is managing the collection in question, which exposes the collection, for example:
public class SomeCollection
{
private List<SomeObject> m_Objects = new List<SomeObject>();
// ctor
public SomeCollection()
{
// Initialise list here, or wot-not/
} // eo ctor
public List<SomeObject> Objects { get { return m_Objects; } }
} // eo class SomeCollection
And so this would be the object passed around:
public void SomeFunction(SomeCollection _collection)
{
// work with _collection.Objects
} // eo SomeFunction
I like this approach, because:
1) I can populate my values in the ctor. They're there the momeny anyone news SomeCollection.
2) I can restrict access, if I want, to the underlying list. In my example I exposed it all, but you don't have to do this. You can make it read-only if you want, or validate additions to the list, prior to adding them.
3) It's clean. Far easier to read SomeCollection than List<SomeObject> everywhere.
4) If you suddenly realise that your collection of choice is inefficient, you can change the underlying collection type without having to go and change all the places where it got passed as a parameter (can you imagine the trouble you might have with, say, List<String>?)
I agree. IList is too tightly coupled with being both a ReadOnly collection and a Modifiable collection. IList should have inherited from an IReadOnlyList.
Casting back to IReadOnlyList wouldn't require a explicit cast. Casting forward would.
1.
Define your own class which implements IEnumerator, takes an IList in the new constructor, has a read only default item property taking an index, and does not include any properties/methods that could otherwise allow your list to me manipulated.
If you later want to allow modifying the ReadOnly wrapper like IReadOnlyCollection does, you can make another class which is a wrapper around your custom ReadOnly Collection and has the Insert/Add/Remove/RemoveAt/Clear/...implemented and cache those changes.
2.
Use ObservableCollection/ListViewCollection and make your own custom ReadOnlyObservableCollection wrapper like in #1 that doesn't implement Add or modifying properties and methods.
ObservableCollection can bind to ListViewCollection in such a way that changes to ListViewCollection do not get pushed back into ObservableCollection. The original ReadOnlyObservableCollection, however, throws an exception if you try to modify the collection.
If you need backwards/forwards compatibility, make two new classes inheriting from these. Then Implement IBindingList and handle/translate CollectionChanged Event (INotifyCollectionChanged event) to the appropriate IBindingList events.
Then you can bind it to older DataGridView and WinForm controls, as well as WPF/Silverlight controls.
Microsoft has created a Guidelines for Collections document which is a very informative list of DOs and DON'Ts that address most of your question.
It's a long list so here are the most relevant ones:
DO prefer collections over arrays.
DO NOT use ArrayList or List in public APIs. (public properties, public parameters and return types of public methods)
DO NOT use Hashtable or Dictionary in public APIs.
DO NOT use weakly typed collections in public APIs.
DO use the least-specialized type possible as a parameter type. Most members taking collections as parameters use the IEnumerable interface.
AVOID using ICollection or ICollection as a parameter just to access the Count property.
DO use ReadOnlyCollection, a subclass of ReadOnlyCollection, or in rare cases IEnumerable for properties or return values representing read-only collections.
As the last point states, you shouldn't avoid ReadOnlyCollection like you were suggesting. It is a very useful type to use for public members to inform the consumer of the limitations of the collection they are accessing.

Casting from interface to underlying type

Reviewing an earlier question on SO, I started thinking about the situation where a class exposes a value, such as a collection, as an interface implemented by the value's type. In the code example below, I am using a List, and exposing that list as IEnumerable.
Exposing the list through the IEnumerable interface defines the intent that the list only be enumerated over, not modified. However, since the instance can be re-cast back to a list, the list itself can of course be modified.
I also include in the sample a version of the method that prevents modification by copying the list item references to a new list each time the method is called, thereby preventing changes to the underlying list.
So my question is, should all code exposing a concrete type as an implemented interface do so by means of a copy operation? Would there be value in a language construct that explicitly indicates "I want to expose this value through an interface, and calling code should only be able to use this value through the interface"? What techniques do others use to prevent unintended side-effects like these when exposing concrete values through their interfaces.
Please note, I understand that the behavior illustrated is expected behavior. I am not claiming this behavior is wrong, just that it does allow use of functionality other than the expressed intent. Perhaps I am assigning too much significance to the interface - thinking of it as a functionality constraint. Thoughts?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TypeCastTest
{
class Program
{
static void Main(string[] args)
{
// Demonstrate casting situation
Automobile castAuto = new Automobile();
List<string> doorNamesCast = (List<string>)castAuto.GetDoorNamesUsingCast();
doorNamesCast.Add("Spare Tire");
// Would prefer this prints 4 names,
// actually prints 5 because IEnumerable<string>
// was cast back to List<string>, exposing the
// Add method of the underlying List object
// Since the list was cast to IEnumerable before being
// returned, the expressed intent is that calling code
// should only be able to enumerate over the collection,
// not modify it.
foreach (string doorName in castAuto.GetDoorNamesUsingCast())
{
Console.WriteLine(doorName);
}
Console.WriteLine();
// --------------------------------------
// Demonstrate casting defense
Automobile copyAuto = new Automobile();
List<string> doorNamesCopy = (List<string>)copyAuto.GetDoorNamesUsingCopy();
doorNamesCopy.Add("Spare Tire");
// This returns only 4 names,
// because the IEnumerable<string> that is
// returned is from a copied List<string>, so
// calling the Add method of the List object does
// not modify the underlying collection
foreach (string doorName in copyAuto.GetDoorNamesUsingCopy())
{
Console.WriteLine(doorName);
}
Console.ReadLine();
}
}
public class Automobile
{
private List<string> doors = new List<string>();
public Automobile()
{
doors.Add("Driver Front");
doors.Add("Passenger Front");
doors.Add("Driver Rear");
doors.Add("Passenger Rear");
}
public IEnumerable<string> GetDoorNamesUsingCopy()
{
return new List<string>(doors).AsEnumerable<string>();
}
public IEnumerable<string> GetDoorNamesUsingCast()
{
return doors.AsEnumerable<string>();
}
}
}
One way you can prevent this is by using AsReadOnly() to prevent any such nefariousness. I think the real answer is though, you should never be relying on anything other than the exposed interface/contract in terms of the return types, etc. Doing anything else defies encapsulation, prevents you from swapping out your implementations for others that don't use a List but instead just a T[], etc, etc.
Edit:
And down-casting like you mention is basically a violation of the Liskov Substition Principle, to get all technical and stuff.
In a situation like this, you could define your own collection class which implements IEnumerable<T>. Internally, your collection could keep a List<T> and then you could just return the enumerator of the underlying list:
public class MyList : IEnumerable<string>
{
private List<string> internalList;
// ...
IEnumerator<string> IEnumerable<string>.GetEnumerator()
{
return this.internalList.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.internalList.GetEnumerator();
}
}
An interface is a constraint on the implementation of a minimum set of things it must do (even if "doing" is no more than throwing a NotSupportedException; or even a NotImplementedException). It is not a constraint that either prevents the implementation from doing more, or on the calling code.
One thing I've learned working with .NET (and with some people who are quick to jump to a hack solution) is that if nothing else, reflection will often allow people to by pass your "protections."
Interfaces are not iron shackles of programming, they're a promise that your code makes to any other code saying "I can definitely do these things." If you "cheat" and cast the interface object into some other object because you, the programmer, know something that the program doesn't, then you're breaking that contract. The consequence is poorer maintainability and a reliance that no one ever mess up anything in that chain of execution, lest some other object get sent down that doesn't cast correctly.
Other tricks like making things readonly or hiding the actual list behind a wrapper are only stop-gaps. You could easily dig into the type using reflection to pull out the private list if you really wanted it. And I think there are attributes you can apply to types to prevent people from reflecting into them.
Likewise, readonly lists aren't really. I could probably figure out a way to modify the list itself. And I can almost certainly modify the items on the list. So a readonly isn't enough, nor is a copy or an array. You need a deep copy (clone) of the original list in order to actually protect the data, to some degree.
But the real question is, why are you fighting so hard against the contract that you wrote. Sometimes reflection hacking is a handy workaround when someone else's library is poorly designed and didn't expose something that it needs to (or a bug requires that you go digging to fix it.) But when you have control over the interface AND the consumer of the interface, there's no excuse to not make the publicly exposed interface as robust as you need it to be to get your work done.
Or in short: If you need a list, don't return IEnumerable, return a List. If you've got an IEnumerable but you actually needed a list, then its safer to make a new list from that IEnum and use that. There are very few reasons (and even fewer, maybe no, good reasons) to cast up to a type simply because "I know it's actually a list, so this will work."
Yeah, you can take steps to try and prevent people from doing that, but 1) the harder you fight people who insist on breaking the system, the harder they will try to break it and 2) they're only looking for more rope, and eventually they'll get enough to hang themselves.

Are there drawbacks to creating a class that encapsulates Generic Collection?

A part of my (C# 3.0 .NET 3.5) application requires several lists of strings to be maintained. I declare them, unsurprisingly, as List<string> and everything works, which is nice.
The strings in these Lists are actually (and always) Fund IDs. I'm wondering if it might be more intention-revealing to be more explicit, e.g.:
public class FundIdList : List<string> { }
... and this works as well. Are there any obvious drawbacks to this, either technically or philosophically?
I would start by going in the other direction: wrapping the string up into a class/struct called FundId. The advantage of doing so, I think, is greater than the generic list versus specialised list.
You code becomes type-safe: there is a lot less scope for you to pass a string representing something else into a method that expects a fund identifier.
You can constrain the strings that are valid in the constructor to FundId, i.e. enforce a maximum length, check that the code is in the expected format, &c.
You have a place to add methods/functions relating to that type. For example, if fund codes starting 'I' are internal funds you could add a property called IsInternal that formalises that.
As for FundIdList, the advantage to having such a class is similar to point 3 above for the FundId: you have a place to hook in methods/functions that operate on the list of FundIds (i.e. aggregate functions). Without such a place, you'll find that static helper methods start to crop up throughout the code or, in some static helper class.
List<> has no virtual or protected members - such classes should almost never be subclassed. Also, although it's possible you need the full functionality of List<string>, if you do - is there much point to making such a subclass?
Subclassing has a variety of downsides. If you declare your local type to be FundIdList, then you won't be able to assign to it by e.g. using linq and .ToList since your type is more specific. I've seen people decide they need extra functionality in such lists, and then add it to the subclassed list class. This is problematic, because the List implementation ignores such extra bits and may violate your constraints - e.g. if you demand uniqueness and declare a new Add method, anyone that simply (legally) upcasts to List<string> for instance by passing the list as a parameter typed as such will use the default list Add, not your new Add. You can only add functionality, never remove it - and there are no protected or virtual members that require subclassing to exploit.
So you can't really add any functionality you couldn't with an extension method, and your types aren't fully compatible anymore which limits what you can do with your list.
I prefer declaring a struct FundId containing a string and implementing whatever guarantees concerning that string you need there, and then working with a List<FundId> rather than a List<string>.
Finally, do you really mean List<>? I see many people use List<> for things for which IEnumerable<> or plain arrays are more suitable. Exposing your internal List in an api is particularly tricky since that means any API user can add/remove/change items. Even if you copy your list first, such a return value is still misleading, since people might expect to be able to add/remove/change items. And if you're not exposing the List in an API but merely using it for internal bookkeeping, then it's not nearly as interesting to declare and use a type that adds no functionality, only documentation.
Conclusion
Only use List<> for internals, and don't subclass it if you do. If you want some explicit type-safety, wrap string in a struct (not a class, since a struct is more efficient here and has better semantics: there's no confusion between a null FundId and a null string, and object equality and hashcode work as expected with structs but need to be manually specified for classes). Finally, expose IEnumerable<> if you need to support enumeration, or if you need indexing as well use the simple ReadOnlyCollection<> wrapper around your list rather than let the API client fiddle with internal bits. If you really need a mutatable list API, ObservableCollection<> at least lets you react to changes the client makes.
Personally I would leave it as a List<string>, or possibly create a FundId class that wraps a string and then store a List<FundId>.
The List<FundId> option would enforce type correct-ness and allow you to put some validation on FundIds.
Just leave it as a List<string>, you variable name is enough to tell others that it's storing FundIDs.
var fundIDList = new List<string>();
When do I need to inherit List<T>?
Inherit it if you have really special actions/operations to do to a fund id list.
public class FundIdList : List<string>
{
public void SpecialAction()
{
//can only do with a fund id list
//sorry I can't give an example :(
}
}
Unless I was going to want someone to do everything they could to List<string>, without any intervention on the part of FundIdList I would prefer to implement IList<string> (or an interface higher up the hierarchy if I didn't care about most of that interface's members) and delegate calls to a private List<string> when appropriate.
And if I did want someone to have that degree of control, I'd probably just given them a List<string> in the first place. Presumably you have something to make sure such strings actually are "Fund IDs", which you can't guarantee any more when you publicly use inheritance.
Actually, this sounds (and often does with List<T>) like a natural case for private inheritance. Alas, C# doesn't have private inheritance, so composition is the way to go.

Categories