Why aren't there genuine immutable collections in C#? [closed] - c#

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am currently learning C# and I have a situation where I have a class that contains a ISet. I don't wish clients to modify this set directly, and most clients only Add and Remove, and I provide accessors through my class to do this.
However, I have one client that wishes to know more about this set and its contents. I don't really want to muddy the wrapper class itself with lots of methods for this one client, so I would prefer to be able to return the set itself in a immutable way.
I found I can't - well, not really. The only options I seem to have are:
Return an IEnumerable (No: restrictive functionality);
ReadOnlyCollection (No: It's a LIST);
Return a copy (No: Bad form IMHO, allows clients to modify the returned collection perhaps unaware that it's not going to change the real object, plus it has performance overhead);
Implement my own ReadOnlySet (No: Would need to derive from ISet and thus meaning I need to implement mutators, probably firing exceptions, I would rather compile time errors - not runtime).
Am I missing something? Am I being unreasonable? Is my only option to provide the full set of accessors on my wrapper? Am I incorrect in my original intent to keep the wrapper clean for the vast majority of clients?
So two questions:
Why isn't there an standard C# immutable Collection interface? It seems like a fairly reasonable requirement?
Why is ReadOnlyCollection annoyingly called ReadOnlyCollection when it is really a ReadOnlyList? I was going to bite the bullet and use that until I found out it was a List (and I use a Set).

Why isn't there a standard C# immutable interface? It seems like a
fairly reasonable requirement?
A standard C# immutable¹ interface already exists: it's called IEnumerable and all containers implement it.
More powerful immutable interfaces are problematic, because there are many kinds of immutability. If the BCL team decided to pick one definition of immutability and elevate it to the immutability status it's certain that down the road people looking for a different kind of immutability would complain about the choice.
Satisfying everyone would mean not only sorting all of the immutability mess out but creating lots of interfaces (good luck picking good names for them too) and baking all these immutability concepts into the language well enough to make immutability a first-class citizen -- remember that there are no second chances here, once you ship a public class its public interface is immutable forever (pun intended). While all of this might be good to have, I 'm really skeptical about the cost/benefit ratio.
It's not difficult to define IReadOnlyList, IReadOnlySet and such if you do require them. I assume that they do not already exist because again, minus 100 points.
ReadOnlyCollection is IMHO either a concession or a class that was required internally for the BCL and exposed to the world because hey, free functionality at really low cost for the BCL team (since it would have to be implemented, documented and tested anyway). In any case I don't think that it does not live in the glamorous System.Collections.Generic neighborhood by chance.
Why is ReadOnlyCollection annoyingly called ReadOnlyCollection when it
is really a ReadOnlyList? I was going to bite the bullet and use that
until I found out it was a List (and I use a Set).
I 'm sure the BCL team would love to be able to go back in time and fix that, because it's almost certainly one of those little inconsistencies that unavoidably sneak into any library of comparable scope. Since ReadOnlyCollection implements IList it should definitely have been called ReadOnlyList.
However, given that a "list" offers more functionality than a "collection", I don't see how this would stop you. Neither is a Set, so you would have to build set-related functionality on top of them in any case (which is not a good idea; just build read-only semantics on top of Set).
¹ We 're tossing around "immutable" a lot here, but that word does not have a singular meaning. I think it would be more appropriate to use "read-only", but I 'll go with your choice of word for consistency.

This may help,
http://blogs.msdn.com/b/jaredpar/archive/2008/04/22/api-design-readonlycollection-t.aspx

I think the only way for you to provide a read-only 'copy' of the set without actually copying the data into another instance of the same or a different structure, is to go with the wrapper and implement all the item-adding-and-removing methods to throw an exception.
If your set is exposed only as an ISet anyway, consumers are only going to see the members defined on the interface, no matter what your wrapper contains - that doesn't seem like it's a bad thing.

I agree it would be nice if there were better support in .net for both immutability and read-only wrappers, though I think it's important to note that there is a huge difference between the concepts. A read-only wrapper promises its creator that consumers of it won't be able to change the underlying object, but makes no promise to consumers that the underlying object itself won't change. By contrast, an immutable object promises its creator and consumers that its values won't change.
I'm not sure why the notion that there are many different types of immunity should be a problem. If I have a generic ImmutableList<T> which takes an unqualified T my expectation would be that it will always contain the same T's as it did when it was created. The collection could in no way affect whether any of the properties of the T's could change, and thus it shouldn't be expected to.
If I had my druthers, most of the collection-related interfaces would include readable, mutable, and immutable variants (mutable and immutable would both extend from readable). I'd also add a write-only contravariant IAppendable interface, as well as an IImmutableEnumerable derived from IEnumerable (I'd add a ToImmutable method to IEnumerable (and IImmutableEnumerable); an implementation could construct an immutable collection, but in some cases that might not be the best approach. For example, a mutable object might implement IEnumerable by return a mutable number of copies of a mutable element. If the number of copies is large, converting to a simple collection could be very wasteful.

Related

Why is there no inverse to object.ToString()? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
It seems like a good design decision that the System.Object class, and hence all classes, in .NET provide a ToString() method which, unsurprisingly, returns a string representation of the object. Additionally in C# this method is implemented for native types so that they integrate nicely with the type system.
This often comes in handy when user interaction is required. For example, objects can directly be held in GUI widgets like lists and are "automatically" displayed as text.
What is the rationale in the language design to not provide a similarly general object.FromString(string) method?
Other questions and their answers discuss possible objections, but I find them not convincing.
The parse could fail, while a conversion to string is always possible.
Well, that does not keep Parse() methods from existing, does it? If exception handling is considered an undesirable design, one could still define a TryParse() method whose standard implementation for System.Object simply returns false, but which is overridden for concrete types where it makes sense (e.g. the types where this method exists today anyway).
Alternatively, at a minimum it would be nice to have an IParseable interface which declares a ParseMe() or TryParse() method, along the lines of ICloneable.
Comment by Tim Schmelter's "Roll your own": That works of course. But I cannot write general code for native types or, say, IPAddress if I must parse the values; instead I have to resort to type introspection or write wrappers which implement a self-defined interface, which is either maintenance-unfriendly or tedious and error-prone.
Comment by Damien: An interface can only declare non-static functions for reasons discussed here by Eric Lippert. This is a very valid objection. A static TryParse() method cannot be specified in an interface. A virtual ParseMe(string) method though needs a dummy object, which is a kludge at best and impossible at worst (with RAII). I almost suspect that this is the main reason such an interface doesn't exist. Instead there is the elaborate type conversion framework, one of the alternatives mentioned as solutions to the "static interface" oxymoron.
But even given the objections listed, the absence of a general parsing facility in the type system or language appears to me as an awkward asymmetry, given that a general ToString() method exists and is extremely useful.
Was that ever discussed during language/CLR design?
It seems like a good design decision that the System.object class, and hence all classes, in .NET provide a ToString() method
Maybe to you. It's always seemed like a really bad idea to me.
which, unsurprisingly, returns a string representation of the object.
Does it though? For the vast majority of types, ToString returns the name of the type. How is that a string representation of the object?
No, ToString was a bad design in the first place. It has no clear contract. There's no clear guidance on what its semantics should be, aside from having no side effects and producing a string.
Since ToString has no clear contract, there is practically nothing you can safely use it for except for debugger output. I mean really, think about it: when was the last time you called ToString on object in production code? I never have.
The better design therefore would have been methods static string ToString<T>(T) and static string ToString(object) on the Debug class. Those could have then produced "null" if the object is null, or done some reflection on T to determine if there is a debugger visualizer for that object, and so on.
So now let's consider the merits of your actual proposal, which is a general requirement that all objects be deserializable from string. Note that first, obviously this is not the inverse operation of ToString. The vast majority of implementations of ToString do not produce anything that you could use even in theory to reconstitute the object.
So is your proposal that ToString and FromString be inverses? That then requires that every object not just be "represented" as a string, but that it actually be round trip serializable to string.
Let's think of an example. I have an object representing a database table. Does ToString on that table now serialize the entire contents of the table? Does FromString deserialize it? Suppose the object is actually a wrapper around a connection that fetches the table on demand; what do we serialize and deserialize then? If the connection needs my password, does it put my password into the string?
Suppose I have an object that refers to another object, such that I cannot deserialize the first object without also having the second in hand. Is serialization recursive across objects? What about objects where the graph of references contains loops; how do we deal with those?
Serialization is difficult, and that's why there are entire libraries devoted to it. Making it a requirement that all types be serializable and deserializable is onerous.
Even supposing that we wanted to do so, why string of all things? Strings are a terrible serialization data type. They can't easily hold binary data, they have to be entirely present in memory at once, they can't be more than a billion characters tops, they have no structure to them, and so on. What you really want for serialization is a structured binary storage system.
But even given the objections listed, the absence of a general parsing facility in the type system or language appears to me as an awkward asymmetry, given that a general ToString() method exists and is extremely useful.
Those are two completely different things that have nothing to do with each other. One is a super hard problem best solved by libraries devoted to it, and the other is a trivial little debugging aid with no specification constraining its output.
Was that ever discussed during language/CLR design?
Was ToString ever discussed? Obviously it was; it got implemented. Was a generalized serialization library ever discussed? Obviously it was; it got implemented. I'm not sure what you're getting at here.
Why is there no inverse to object.ToString()?
Because object should hold the bare minimum functionality required by every object. Comparing equality and converting to string (for a lot of reasons) are two of them. Converting isn't. The problem is: how should it convert? Using JSON? Binary? XML? Something else? There isn't one uniform way to convert from a string. Hence, this would unnecessarily bloat the object class.
Alternatively, at a minimum it would be nice to have an IParseable interface
There is: IXmlSerializable for example, or one of the many alternatives.

Why should we avoid public methods? Benefits of encapsulation [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
Before down-voting let me explain my question. I have a little experience in designing architectures and try to progress. Ones, when I was fixing a bug, I came up with a conclusion that we need to make our private method to be public and than use it. That was the fastest way to make my job done, and have a bug fixed. I went to my team-leader and said it. After I've got a grimace from him, I was explained that every public method is a very expensive pleasure. I was told that every public method should be supported throughout the lifetime of a project. And much more..
I was wondering. Indeed! Why it wasn't so clearly when I was looking in the code. It wasn't also so evidently when I designed my own architectures. I remember my thoughts about it:
Ahh, I will leave this method public, who knows, maybe it will come usefull when the system grows.
I was confused, and thought that I made scaleable systems, but in fact got tons of garbage in my interfaces.
My question:
How can you explain to yourself if a method is really important and worthy to be public? Are any counterexamples for checking it? How you get trained to make private/public choise without spending hours in astral?
I suggest you read up on YAGNI http://c2.com/cgi/wiki?YouArentGonnaNeedIt
You should write code to suit actual requirements because writing code to suit imagined requirements leads to bloated code which is harder to maintain.
My favourite quote
Perfection is achieved, not when there is nothing more to add, but
when there is nothing left to take away.
-- Antoine de Saint-Exupery French writer (1900 - 1944)
This question need a deep and thorough discussion on OOP design, but my simple answer is anything with public visibility can be used by other classes. Hence if you're not building method for others to use, do not make it public.
One pitfall of unecessarily making private method public is when other classes did use it, it makes it harder for you to refactor / change the method, you have to maintain the downstream (think if this happen to hundreds of classes)
But nevertheless maybe this discussion will never end. You should spend more time reading OOP design pattern books, it will give you heaps more idea
There are a few questions you can ask yourself about the domain in which the object exists:
Does this member (method, property, etc.) need to be accessed by other objects?
Do other objects have any business accessing this member?
Encapsulation is often referred to as "data hiding" or "hiding members" which I believe leads to a lot of confusion. Inexperienced developers would rightfully ask, "Why would I want to hide anything from the rest of my code? If it's there, I should be able to use it. It's my code after all."
And while I'm not really convinced with the way in which your team leader worded his response, he has a very good point. When you have too many connection points between your objects, you end up with too many connections. Objects become more and more tightly coupled and fuse into one big unsupportable mess.
Clearly and strictly maintaining a separation of concerns throughout the architecture can significantly help prevent this. When you design your objects, think in terms of what their public interfaces would look like. What kind of outwardly-visible attributes and functionality would they have? Anything which wouldn't reasonably be expected as part of that functionality shouldn't be public.
For example, consider an object called a Customer. You would reasonably expect some attributes which describe a Customer, such as:
Name
Address
Phone Number
List of orders processed
etc.
You might also expect some functionality available:
Process Payment
Hold all Orders
etc.
Suppose you also have some technical considerations within that Customer. For example, maybe the methods on the Customer object directly access the database via a class-level connection object. Should that connection object be public? Well, in the real world, a customer doesn't have a database connection associated with it. So, clearly, no it should not be public. It's an internal implementation concern which isn't part of the outwardly-visible interface for a Customer.
This is a pretty obvious example, of course, but illustrates the point. Whenever you expose a public member, you add to the outwardly-visible "contract" of functionality for that object. What if you need to replace that object with another one which satisfies the same contract? In the above example, suppose you wanted to create a version of the system which stores data in XML files instead of a database. If other objects outside of the Customer are using its public database connection, that's a problem. You'd have to change a lot more about the overall design than just the internal implementation of the Customer.
As a general rule it's usually best to prefer the strictest member visibilities first and open them up as needed. Combine that guideline with an approach of thinking of your objects in terms of what real-world entities they represent and what functionality would be visible on those entities and you should be able to determine the correct course of action for any given situation.

Is there a practical reason for why List<T>.IndexOf(T) does not return a nullable int? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
The method List<T>.IndexOf() returns the zero-based index of the first occurrence of item within the entire List, if found; otherwise, –1.
I'm seeing a parallel between that, and something I just read in Code Complete, which is telling me to "avoid variables with hidden meanings".
For example:
The value in the
variable pageCount might represent the number of pages printed, unless
it equals -1, in which case it indicates that an error has occurred.
Well, I don't know if the meaning is "hidden", because it's documented clearly enough, but null seems to convey better meaning to me than -1, and .HasValue reads like a much better check than > -1. As far as I can tell, List and nullable types were both introduced in C# 2.0, so I don't think the reason for retuning an int has to do with backwards compatibility. So, do you know if there was a reason, or if this was just something that someone forgot to implement, and we now have to live with that mistake forever?
List was released with the 2.0 version of the run time as was nullable T. But List implements IList which existed with the 1.0 version of the runtime which not only didn't have nullable it didn't support generics. To meet the contract of the IList interface an implementer must return -1 on indexof failure. Code complete also maintains that you must meet the contracts you agree to and therefore List.Indexof must return -1.
Answer to Comment:
By the time the 2.0 runtime with generics there were already thousands of applications written against the non generic version. That code would be allowed to function most effectively be migrated by supporting the generic interface as an extension of the non-generic interfaces. Also, image having to classes with the same name and 90% same usage but in the case of a couple of methods totally different semantics.
System.Collections.Generic.List may have been introduced in .NET 2.0, along with nullable, but the usage of -1 predates that considerably. ArrayList.IndexOf(), Array.IndexOf(), String.IndexOf(), SelectedIndex of a listbox, etc. So List<T> most likely did the same for consistency with the existing library. (And in fact even classic VB has this meaning, so it even predates .NET 1.0)
As #rerun points out, it isn't just stylistic consistency, it's actually part of the interface contract. So +1 to him.
While List<T> didn't have to follow in the footsteps of ArrayList and didn't have to implement IList, doing so made it much more useful to programmers already familiar with those.
Just a guess, but I would think it has to do with historical reasons and readability. Even though -1 is a "special value", it will never be returned in any other case. The rule you quoted in my opinion is primarily to keep you from returning values that could have more than one meaning. -1 is standard in c++ and similar languages, so it has become somewhat of an idiom across languages. Also, while hasvalue might be more readable, I think it would be best to have the same style consistently across the framework, which would require starting from the ground up. Finally, IMHO, in my limited experience, because nullable types have to be unboxed in order to use them elsewhere, they can be more trouble than they're worth, but that's just me.
I think it's for backwards compatibility, as in framework 1.1 there wasn't nullables and the API was created back then.
The reason may be the simple fact that Nullable<T> (the actual type used when you write T?) involves unwanted complexity for something as simple as an index search. For T? x doing x == null is really only syntactic sugar for x.HasValue and when it turns out it's not null you'd still have to access the actual value through x.Value.
You don't really gain anything in comparison to just returning a negative value, which is not a valid index either. In fact, you make things just more complex.
This follows the same pattern as String.IndexOf, and many other implementations and variations of IndexOf methods, not only in the .NET libraries.
Using -1 as return value is a magic number, and those should normally be avoided. However, this pattern is well known, which is clearly an advantage when you implement a library method.

Why do some languages prefer static method binding rather than dynamic? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Why is the default decision in C++, C#, and Ada 95 to use static method binding, rather than dynamic method binding.?
Is the gain in implementation speed worth the loss in abstraction and re-usability?
In general, you can consider that you have todesign the base class for extensibility. If a member function (to use the C++ vocabulary) isn't designed to be overridden, there is a good chance than overriding it will in practice not be possible and for sure it won't it be possible without knowledge of what the class designer think is implementation details and will change without giving you prior notice.
Some additional considerations for two languages (I don't know C# enough to write about it):
Ada 95 would have had compatibility issues with Ada 83 if the choice was different. And considering the whole object model of Ada 95, doing it differently would have make no sense (but you can consider that compatibility was a factor in the choice of the object model).
For C++, performance was certainly a factor. The you don't pay for what you don't use principle and the possibility to use C++ just as a better C was quite instrumental in its success.
The obvious answer is because most functions shouldn't be virtual. As AProgrammer points out, unless a function has been designed explicitly to be overridden, you probably can't override it (virtual or not) without breaking class invariants. (When I work in Java, for example, I end up declaring most functions final, as a matter of good engineering. C++ and Ada make the right decision: the author must explicitly state that the function is designed to be overridden.
Also, C++ and (I think) Ada support value semantics. And value semantics doesn't work well with polymorphism; in Java, classes like java.lang.String are final, in order to simulate value semantics for them. Far to many applications programmers, however, don't bother, since it's not the default. (In a similar manner, far too many C++ programmers omit to inhibit copy and assignment when the class is polymorphic.)
Finally, even when a class is polymorphic, and designed for inheritance, the contract is still specified, and in so far as is reasonable, enforced, in the base class. In C++, typically, this means that public functions are not virtual, since it is the public functions which define and enforce the contract.
I can't speak about Ada, but for C++ two important goals for the design of C++ were:
backwards compatibility with C
you should pay nothing (to the extent possible) for features that you don't use
While neither of these would necessarily dictate that dynamic binding couldn't have been chosen to be the default, having static method binding (I assume you mean non-virtual member functions) does seem to 'fit' better with these design goals.
I'll give one of the other two thirds of Michael Burr's answer.
For Ada it was an important design goal that the language be suitable for system's programming and use on small realtime embedded devices (eg: missile and bomb CPUs). Perhaps there are now techniques that would allow dynamic languages to do such things well, but there certianly weren't back in the late 70's and early 80's when the language was first being designed. Ada95 of course could not radically deviate from the orginal language's basic underlying design, any more than C++ could from C.
That being said, both Ada and C++ (and certianly C# as well?) provide a way to do dynamic method binding ("dynamic dispatch") if you really want it. In both it is accesed via pointers, which IMHO are kind of error-prone. It can also make things a bit of a pain to debug, as it is tough to tell from sources alone exactly what is getting called. So I avoid it unless I really need it.

Are immutable objects good practice? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Should I make my classes immutable where possible?
I once read the book "Effective Java" by Joshua Bloch and he recommended to make all business objects immutable for various reasons. (for example thread safety)
Does this apply for C# too?
Do you try to make your objects immutable, so you have less problems when working with them?
Or is it not worth the inconvenience you have to create them?
The immutable Eric Lippert has written a whole series of blog posts on the topic. Part one is here.
Quoting from the earlier post that he links to:
ASIDE: Immutable data structures are the way of the future in C#. It is much easier to reason about a data structure if you know that it will never change. Since they cannot be modified, they are automatically threadsafe. Since they cannot be modified, you can maintain a stack of past “snapshots” of the structure, and suddenly undo-redo implementations become trivial. On the down side, they do tend to chew up memory, but hey, that’s what garbage collection was invented for, so don’t sweat it.
This is going to be more of an opinion type answer but...
I find that the ease of understanding a program, i.e. maintaining and debugging said application, is inversly proportional to the amount of stateful transitions that occur during the processing of each component. The less state I need to cart around in my head, the more focus I can pay attention to the logic within the algorithms as it is written.
Immutable objects are the central feature of functional programming; it has its own advantages and disadvantages. (E.g. linked lists are practically impossible to be immutable, but immutable objects make parallelism a piece of cake.) So as a comment on your post noted, the answer is "it depends".
Off the top of my head, I can't think of a reason for immutable objects making thread safe code somehow "better".
If I want an object to be thread safe, I will either put a lock around it or I will make a copy of it and update the reference once I'm done working on it. I typically wouldn't want a new object for every little change.
For me, immutable strings create more headaches for threading than it helps.
I actually went out of my way to make an "in-place" "ToUpper" using unsafe code isntead of the built in String.ToUpper(). It runs about 4 times faster and consumes 1/2 the peak memory.
Another nice benefit of immutable structures is that you can locally cache instances of them and reuse them across multiple threads without fear of unexpected behaviors as would be the case if they were mutable.
For instance, suppose you are using an external caching service such as memcached or Velocity or some other equally simplistic distributed hashtable service. You could just use the C# client library and call it good enough. However, that is being wasteful with resources given a short-lived context like a web request scenario. What you really want is to pull each object from the cache once and only once in your context.
The safest way to get this job done is to place a local hashtable in your process in front of the cache provider. On the first request for the cache key you'd pull down the serialized byte stream that represents the object you wish to use and store that byte stream in your local hashtable. On subsequent requests for the same cache key, just look up the byte stream in the local hashtable and deserialize the object to a new instance for each request. This is to prevent multiple redundant trips to the cache server node for the same information that assumedly has not changed over the lifetime of your context.
With immutable structures, you could deserialize the byte stream only once on the first request and get away with storing the deserialized instance in the hashtable instead of the byte stream and just share that one single immutable instance of your object. This obviously cuts down on deserialization penalties which can add up rather quickly if your consuming code is written in such a fashion that it does not care how many calls it makes to the caching provider, assuming the cache is faster than querying your underlying data store.
Perhaps this is more of a subjective answer, but it's a specific problem that can be solved uniquely by using immutable structures so I thought it was relevant to share.

Categories