Why was constness removed from Java and C#? - c#

I know this has been discussed many times, but I am not sure I really understand why Java and C# designers chose to omit this feature from these languages. I am not interested in how I can make workarounds (using interfaces, cloning, or any other alternative), but rather in the rationale behind the decision.
From a language design perspective, why has this feature been declined?
P.S: I'm using words such as "omitted", which some people may find inadequate, as C# was designed in an additive (rather than subtractive) approach. However, I am using such words because the feature existed in C++ before these languages were designed, so it is omitted in the sense of being removed from a programmer's toolbox.

In this interview, Anders said:
Anders Hejlsberg: Yes. With respect to
const, it's interesting, because we
hear that complaint all the time too:
"Why don't you have const?" Implicit
in the question is, "Why don't you
have const that is enforced by the
runtime?" That's really what people
are asking, although they don't come
out and say it that way.
The reason that const works in C++ is
because you can cast it away. If you
couldn't cast it away, then your world
would suck. If you declare a method
that takes a const Bla, you could pass
it a non-const Bla. But if it's the
other way around you can't. If you
declare a method that takes a
non-const Bla, you can't pass it a
const Bla. So now you're stuck. So you
gradually need a const version of
everything that isn't const, and you
end up with a shadow world. In C++ you
get away with it, because as with
anything in C++ it is purely optional
whether you want this check or not.
You can just whack the constness away
if you don't like it.

I guess primarily because:
it can't properly be enforced, even in C++ (you can cast it)
a single const at the bottom can force a whole chain of const in the call tree
Both can be problematic. But especially the first: if it can't be guaranteed, what use is it? Better options might be:
immutable types (either full immutability, or popsicle immutability)

As to why they did it those involved have said so:
http://blogs.msdn.com/ericgu/archive/2004/04/22/118238.aspx
http://blogs.msdn.com/slippman/archive/2004/01/22/61712.aspx
also mentioned by Raymond Chen
http://blogs.msdn.com/oldnewthing/archive/2004/04/27/121049.aspx
In a multi language system this would have been very complex.

As for Java, how would you have such a property behave? There are already techniques for making objects immutable, which is arguably a better way to achieve this with additional benefits. In fact you can emulate const behaviour by declaring a superclass/superinterface that implements only the methods that don't change state, and then having a subclass/subinterface that implements the mutating methods. By upcasting your mutable class to an instance of class with no write methods, other bits of code cannot modify the object without specifically casting it back to the mutable version (which is equivalent to casting away const).
Even if you don't want the object to be strictly immutable, if you really wanted (which I wouldn't recommend) you could put some kind of 'lock' mode on it so that it could only be mutated when unlocked. Have the lock/unlock methods be private, or protected as appropriate, and you get some level of access control there. Alternatively, if you don't intend for the method taking it as a parameter to modify it at all, pass in a copy of that object, or if copying the entire object is too heavyweight then some other lightweight data object that contains just the necessary information. You could even use dynamic proxies to create a proxy to your object that turn any calls to mutation methods into no-ops.
Basically there are already a whole bunch of ways to prevent a class being mutated, which let you choose one that fits most appropriately into your situation (hint: choose pure immutability wherever possible as it makes the object trivially threadsafe and easier to reason with in general). There are no obvious semantics for how const could be implemented that would be an improvement on these techniques, it would be another thing to learn that would either lack flexibility, or be so flexible as to be useless.
That is, unless I've missed something, which is entirely possible. :-)

Java have its own version of const; final. Joshua Bloch describes in his Effective Java
how you effectively use the final keyword. (btw, const is a reserved keyword in Java, for future discrepancies)

Related

Immutable Data Structures in C#

I was reading some entries in Eric Lippert's blog about immutable data structures and I got to thinking, why doesn't C# have this built into the standard libraries? It seems strange for something with obvious reuse to not be already implemented out of the box.
EDIT: I feel I might be misunderstood on my question. I'm not asking how to implement one in C#, I'm asking why some of the basic data structures (Stack, Queue, etc.) aren't already available as immutable variants.
It does now.
.NET just shipped their first immutable collections, which I suggest you try out.
Any framework, language, or combination thereof that is not a purely experimental exercise has a market. Some purely experimental ones go on to develop a market.
In this sense, "market" does not necessarily refer to market economics, it's as true whether the producers of the framework/language/both are commercially or non-commercially oriented and distributing the framework/language/both (I'm just going to say "framework" for now on) at a cost or for free. Indeed, free-as-in-both-beer-and-speech projects can be even more heavily dependent on their markets than commercial projects in this way, because their producers are a subset of their market. The market is anyone and everyone who uses it.
The nature of this market will affect the framework in several ways both by organic processes (some parts prove more popular than others and get a bigger share of the mindspace within the community that educates its own members about them) and by fiat (someone decides a feature will serve the market and therefore adds it).
When .NET was developed, it was developed to serve its future market. Ideas about what would serve them therefore influenced decisions as to what should and should not be included in the FCL, the runtime, and the languages that work with it.
Someone decided that we'd quite likely need System.Collections.ArrayList. Someone decided we'd quite likely need System.IO.DirectoryInfo to have a Delete() method. Nobody decided we'd be likely to need a System.Collections.ImmutableStack.
Maybe nobody thought of it at all. Maybe someone did and even implemented it and then it was decided not to be of enough general use. Maybe it was debated at length within Microsoft. I've never worked for MS, so I don't have a clue.
What I can consider though, is the question as to what the people who were using the .NET framework in 2002 using in 2001.
Well, COM, ActiveX, ("Classic") ASP, and VB6 & VBScript is now much less used than it was, so it can be said to have been replaced by .NET. Indeed, that could be said to have been an intention.
As well as VB6 & VBScript, a considerable number who were writing in C++ and Java with Windows as a sole or primary target platform are now at least partly using .NET instead. Again, I think that could be said to be an intention, or at the very least I don't think MS were surprised that it went that way.
In COM we had an enumerator-object based foreach approach to iteration that had direct support in some languages (the VB family*), and .NET we have an enumerator-object based foreach approach to iteration that has direct support in some languages (C#, VB.NET, and others)†.
In C++ we had a rich set of collection types from the STL, and in .NET we have a rich set of collection types from the FCL (and typesafe generic types from .NET2.0 on).
In Java we had a strong everything-is-an-object style of OOP with a small set of methods provided by a common base-type and a boxing mechanism to allow for simple efficient primitives while keeping to this style. In .NET we have a strong everything-is-an-object style of OOP with a small set of methods provided by a common base-type and a (different) boxing mechanism to allow for simple efficient primitives while keeping to this style.
These cases show choices that are unsurprising considering who was likely to end up being the market for .NET (though such broad statements above shouldn't be read in a way that underestimates the amount of work and subtlety of issues within each of them). Another thing that relates to this is when .NET differs from COM or classic VB or C++ or Java, there may well be a bit of an explanation given in the documentation. When .NET differs from Haskell or Lisp, nobody feels the need to point it out!
Now, of course there are things done differently in .NET than to any of the above (or there'd have been no point and we could have stayed with COM etc.)
However, my point is that out of the near-infinite range of possible things that could end up in a framework like .NET, there are some complete no-brainers ("they might need some sort of string type..."), some close-to-obvious ("this is really easy to do in COM, so it must be easy to do in .NET"), some harder calls ("this will be more complicated than in VB6, but the benefits are worth it"), some improvements ("locale support could really be made a lot easier for developers, so lets build a new approach to the old issue") and some that were less related to the above.
At the other extreme, we can probably all imagine something that would be so out there as to be bizarre ("hey, all coders like Conway's Life - let's put a Conway's Life right into the framework") and hence there's no surprise at not finding it supported.
So far I've quickly brushed over a lot of hard work and difficult design balances in a way that makes the design they came up with seem simpler than it no doubt was. Most likely, the more "obvious" it seems to an outsider after the fact, the more difficult it was for the designers.
Immutable collection types falls into the large range of possible components to the FCL that while not as bizarre as a built-in-conway-support idea, was not as strongly called for by examining the market as a mutable list or a way to encapsulate locale information nicely. It would have been novel to much of the initial market, and therefore at risk of ending up not being used. In an alternate universe there's a .NET1.0 with immutable collections, but it's not very surprising that there wasn't one here.
*At least for consuming. Producing IEnumVARIANT implementations in VB6 wasn't simple, and could involve writing pointer values straight into v-tables in a rather nasty way that it suddenly occurs to me, is possibly not even allowed by today's DEP.
†With a sometimes impossible to implement .Reset() method. Is there any reason for this other than it was in IEnumVARIANT? Was it even ever much used in IEnumVARIANT?
I'll quote from that Eric Lippert blog that you've been reading:
because no one ever designed, specified, implemented, tested, documented and shipped that feature.
In other words, there is no reason other than it hasn't been high enough value or priority to get done ahead of all the other things they're working on.
Why can't you make an immutable struct? Consider the following:
public struct Coordinate
{
public int X
{
get { return _x; }
}
private int _x;
public int Y
{
get { return _y; }
}
private int _y;
public Coordinate(int x, int y)
{
_x = x;
_y = y;
}
}
It's an immutable value type.
It's hard to work with immutable data structures unless you have some functional programming constructs. Suppose you wanted to create an immutable vector containing every other capital letter. How would you do it unless you
A) had functions that did things like range(65, 91), filter(only even) and map(int -> char) to create the sequence in one shot and then turn it into an array
B) created the vector as mutable, added the characters in a loop, then then "froze" it, making it immutable?
By the way, C# does have the B option to some extent -- ReadOnlyCollection can wrap a mutable collection and prevent people from mutating it. However, it's a pain in the ass to do that all the time (and obviously it's hard to support sharing structure between copies when you don't know if something is going to become immutable or not.) A is a better option.
Remember, when C# 1.0 existed, it didn't have anonymous functions, it didn't have language support for generators or other laziness, it didn't have any functional APIs like LINQ -- not even map or filter -- it didn't have concise array initialization syntax (you couldn't write new int[] { 1, 2, 5 }) and it didn't have generic types; just putting stuff into and getting stuff out of collections normally was a pain. So I don't think it would have been a great choice to spend time on making robust immutable collections with such poor language support for using them.
It would be nice if .net had some really solid support for immutable data holders (classes and structures). One difficulty with adding really good support for such things, though, is that taking maximum advantage of mutable and immutable data structures would require some fundamental changes to the way inheritance works. While I would like to see such support in the next major object-oriented framework, I don't know that it can be efficiently worked into existing frameworks like .net or Java.
To see the problem, imagine that there are two basic data types: basicItem and deluxeItem (which is a basicItem with a few extra fields added). Each can exist in two concrete forms: mutable and immutable. Each can also be described in an abstract form: readable. Thus, there should be six data types; all but ReadableBasicItem should be substitutable for at least one other:
ReadableBasicItem: Not substitutable for anything
MutableBasicItem: ReadableBasicItem
ImmutableBasicItem: ReadableBasicItem
ReadableDeluxeItem: ReadableBasicItem
MutableDeluxeItem: ReadableDeluxeItem, MutableBasicItem (also ReadableBasicItem)
ImmutableDeluxeItem: ReadableDeluxeItem, ImmutableBasicItem (also ReadableBasicItem)
Even thought the underlying data type has just one base and one derived type, there inheritance graph has two "diamonds" since both "MutableDeluxeItem" and "ImmutableDeluxeItem" have two parents (MutableBasicItem and ReadableDeluxeItem), both of which inherit from ReadableBasicItem. Existing class architectures cannot effectively deal with that. Note that it wouldn't be necessary to support generalized multiple inheritance; merely to allow some specific variants such as those above (which, despite having "diamonds" in the inheritance graph, have an internal structure such that both ReadableDeluxeItem and MutableBasicItem would inherit from "the same" ReadableBasicItem).
Also, while support for that style of inheritance of mutable and immutable types might be nice, the biggest payoff wouldn't happen unless the system had a means of distinguishing heap-stored objects that should expose value semantics from those which should expose reference semantics, could distinguish mutable objects from immutable ones, and could allow objects to start out in an "uncommitted" state (neither mutable nor guaranteed immutable). Copying a reference to a heap object with mutable value semantics should perform a memberwise clone on that object and any nested objects with mutable value semantics, except in cases where the original reference would be guaranteed destroyed; the clones should start life as uncommitted, but be CompareExchange'd to mutable or immutable as needed.
Adding framework support for such features would allow copy-on-write value semantics to be implemented much more efficiently than would be possible without framework support, but such support would really have to be built into the framework from the ground up. I don't think it could very well be overlaid onto an existing framework.

C# classes - Why so many static methods?

I'm pretty new to C# so bear with me.
One of the first things I noticed about C# is that many of the classes are static method heavy. For example...
Why is it:
Array.ForEach(arr, proc)
instead of:
arr.ForEach(proc)
And why is it:
Array.Sort(arr)
instead of:
arr.Sort()
Feel free to point me to some FAQ on the net. If a detailed answer is in some book somewhere, I'd welcome a pointer to that as well. I'm looking for the definitive answer on this, but your speculation is welcome.
Because those are utility classes. The class construction is just a way to group them together, considering there are no free functions in C#.
Assuming this answer is correct, instance methods require additional space in a "method table." Making array methods static may have been an early space-saving decision.
This, along with avoiding the this pointer check that Amitd references, could provide significant performance gains for something as ubiquitous as arrays.
Also see this rule from FXCOP
CA1822: Mark members as static
Rule Description
Members that do not access instance data or call instance methods can
be marked as static (Shared in Visual Basic). After you mark the
methods as static, the compiler will emit nonvirtual call sites to
these members. Emitting nonvirtual call sites will prevent a check at
runtime for each call that makes sure that the current object pointer
is non-null. This can achieve a measurable performance gain for
performance-sensitive code. In some cases, the failure to access the
current object instance represents a correctness issue.
Perceived functionality.
"Utility" functions are unlike much of the functionality OO is meant to target.
Think about the case with collections, I/O, math and just about all utility.
With OO you generally model your domain. None of those things really fit in your domain--it's not like you are coding and go "Oh, we need to order a new hashtable, ours is getting full". Utility stuff often just doesn't fit.
We get pretty close, but it's still not very OO to pass around collections (where is your business logic? where do you put the methods that manipulate your collection and that other little piece or two of data you are always passing around with it?)
Same with numbers and math. It's kind of tough to have Integer.sqrt() and Long.sqrt() and Float.sqrt()--it just doesn't make sense, nor does "new Math().sqrt()". There are a lot of areas it just doesn't model well. If you are looking for mathematical modeling then OO may not be your best bet. (I made a pretty complete "Complex" and "Matrix" class in Java and made them fairly OO, but making them really taught me some of the limits of OO and Java--I ended up "Using" the classes from Groovy mostly)
I've never seen anything anywhere NEAR as good as OO for modeling your business logic, being able to demonstrate the connections between code and managing your relationship between data and code though.
So we fall back on a different model when it makes more sense to do so.
The classic motivations against static:
Hard to test
Not thread-safe
Increases code size in memory
1) C# has several tools available that make testing static methods relatively easy. A comparison of C# mocking tools, some of which support static mocking: https://stackoverflow.com/questions/64242/rhino-mocks-typemock-moq-or-nmock-which-one-do-you-use-and-why
2) There are well-known, performant ways to do static object creation/logic without losing thread safety in C#. For example implementing the Singleton pattern with a static class in C# (you can jump to the fifth method if the inadequate options bore you): http://www.yoda.arachsys.com/csharp/singleton.html
3) As #K-ballo mentions, every method contributes to code size in memory in C#, rather than instance methods getting special treatment.
That said, the 2 specific examples you pointed out are just a problem of legacy code support for the static Array class before generics and some other code sugar was introduced back in C# 1.0 days, as #Inerdia said. I tried to answer assuming you had more code you were referring to, possibly including outside libraries.
The Array class isn't generic and can't be made fully generic because this would break backwards compatibility. There's some sort of magic going on where arrays implement IList<T>, but that's only for single-dimension arrays with a lower bound of 0 – "list-ish" arrays.
I'm guessing the static methods are the only way to add generic methods that work over any shape of array regardless of whether it qualifies for the above-mentioned compiler magic.

What are the benefits of proper scoping?

So yeah, the question basically says it all. What do you gain when you ensure that private members / methods / whatever are marked private (or protected, or public, or internal, etc) appropriately?
I mean, of course I could just go and mark all my methods as public and everything should still work fine. Of course, if we'd talk about good programming practice (which I am a solid advocate of, by the way ), I'd mark a method as private if it should be marked as such, no questions asked.
But let's set aside good programming practice, and just look at this in terms of actual quantitative gain. What do I get for proper scoping of my methods, members, classes, etc.?
I'm thinking that this would most generally translate to performance gains, but I'd appreciate it if someone could provide more detail about it.
(For purposes of this question, I'm thinking more along C#.NET, but hey, feel free to provide answers on whatever language / framework you deem fit.)
EDIT: Most pointed out that this doesn't lead to performance gain, and yeah, thinking back, I don't even know why I thought that. Lack of coffee probably.
In any case, any good programmer should know about how proper scopes (1) help your code maintenance / (2) control the proper use of your library / app / package; I was kinda curious as to whether or not there was any other benefit you get from it that's not apparently obvious outright. Based on the answers below, it looks like it basically sums up to just those two things most importantly.
Performance has absolutely nothing to do with the visibility of methods. Virtual methods have some overhead, but that's not why we scope. It has to do with maintenance of code. Your public methods are the API to your class or library. You as a class designer want to provide some guarantee to the outside world that future changes aren't going to break other peoples code. By marking some methods private, you take away the ability for users to depend on certain implementations which allows you freedom to change that implementation at will.
Even languages that don't have visibility modifiers, like python, have conventions for marking methods as internal and subject to change. By prefixing the method with an _underscore(), you're signalling to the outside world that if you use that method, you do so at your own risk, as it can change at any time.
On the other hand, public methods are an explicit entry way into your code. Every effort should go towards making public methods backward compatible to avoid the pitfalls I described above.
By better encapsulation, you provide a better API. Only methods / properties that are of interest of the user of your class are available : visible.
Next to that, you ensure that certain variables that should not be called / modified, cannot be called/modified.
That's the most important thing. Why do you think this would lead to performance gains ?
As I see you gain two important features from proper scoping. You API is reduced in size and clearly focused on the task at hand.
Second, you get a less brittle implementation as you are free to change implementation details without altering the exposed API.
I cannot see how accessibility modifiers would affect performance in any way.
There are mainly two types of methods/properties.
That are helpful to perform a task to whoever consumes it. (Recommended Scope: Public)
That are helpful to the above methods to get their task done. (Recommended Scope: Private or Protected)
Type 1 methods are the only methods that any client code requires and does not need any other method. This avoids confusion, keeps things simple and prevents client code to do something wrong.
Type 2 methods are methods into which Type 1 methods are divided. They help Type 1 methods to complete their task and still allow them to be simple, concise, less complex and more readable. They are not really needed for client code but just the class/module itself.
A fair example would be of a car. What you have is a gas pedal, brakes, gearbox, etc. You don't have an interface to minor details for what is under the hood. That is for the mechanic.
In C# programming, it helps to make sure that your API/classes/methods/members are "easy to use correctly and difficult to use incorrectly".

How does const correctness help write better programs?

This question is from a C# guy asking the C++ people. (I know a fair bit of C but only have some general knowledge of C++).
Allot of C++ developers that come to C# say they miss const correctness, which seems rather strange to me. In .Net in order to disallow changing of things you need to create immutable objects or objects with read only properties/fields, or to pass copies of objects (by default structs are copied).
Is C++ const correctness, a mechanism to create immutable/readonly objects, or is there more to it? If the purpose is to create immutable/readonly objects, the same thing can be done in an environment like .Net.
A whole section is devoted to Const Correctness in the FAQ. Enjoy!
const correctness in C++ is at heart simply a way of saying exactly what you mean:
void foo( const sometype & t ) {
...
}
says "I will not change the object referred to by t, and if I try to, please Mr. Compiler warn me about it".
It is not a security feature or a way of (reliably) creating read-only objects, because it is trivial to subvert using C-style casts.
I bet this is a matter of mindsets. I'm working myself with long-time C++ people and have noticed they seem to think in C++ (of course they do!). It will take time for them to start seeing multiple answers to the ones they've grown familiar to have the 'stock' answers like const correctness. Give them time, and try to educate them gently.
Having said that, I do like the C++ 'const' way of guarantees. It's mainly promising the user of an interface that the object, data behind the pointer or whatever won't be messed with. I would see this as the #1 value of const correctness. #2 value is what it can allow the compiler in terms of optimization.
The problem is, of course, if the whole sw stack hasn't been built with const in mind from the ground up.
I'm sure C# has its own set of paradigms to do the same. This shows why it's so important to "learn one new (computing or natural) language a year". Simply to exercise one's mind of the other ways to see the world, and solve its problems.
With const-correctness, you can expose a mutable object as read-only without making a copy of it and wasting precious memory (especially relevant for collections). I know, there is this CPU-and-memory-are-cheap philosophy among desktop developers, but it still matters in embedded world.
On top of that, if you add complexities of memory ownership in C++, it is almost always better not to copy non-trivial objects that contain or reference other objects, hence there is a method of exposing existing objects as read-only.
Excellent Scott Meyer's book Effective C++ has dedicated an item to this issue:
ITEM 3: Use const whenever possible.
It really enlightens you :-)
The real answer, even if it is really hard to grasp if you have not used it, is that you loose expressiveness. The language has concepts that you cannot possibly express in C#, and they have meaning, they are part of the design, but lost in the translation to code.
This is not an answer, but rather an example:
Consider that you have a container that is sorted on a field of the elements that are stored. Those objects are really big. You need to offer access to the data for readers (consider showing the information in the UI).
Now, in C#/Java, you can go in one of two ways: either you make a copy for the caller to use (guarantees that the data will not change, but inefficient) or you return a reference to your internally held object (just hoping the caller will not change your data through setters).
If the user gets the reference and changes through it the field that serves as index, then your container invariants are broken.
In C++ you can return a constant reference/pointer, and the compiler will disable calling setter methods (mutating methods) on the instance you return. You get both the security that the user will not change (*) and efficiency in the call.
The third not mentioned before option is making the object inmutable, but that disables changes everywhere. Perfectly controlled changes to the object will be disallowed, and the only possibility of change is creating a new element with the changes performed. That amounts to time and space.
C++ provides plenty of ways you can mess up your code. I see const correctness as a first, efficient way of putting constraints to the code, in order to control the "side" effects (unwanted changes).
Marking a parameter as const (usually reference to const object or pointer to const object), will ensure that the passed object can't be changed, so you have no "side" effect of that function/method.
Marking a method as const will guarantee that that method will not change the state of the object it works with. If it does, the compiler will generate an error.
Marking a const data member will ensure that the data member can only be initialized in the initialization list of the constructor and can't be changed.
Also, smart compilers can use the constness as hints for various performance optimizations.
Of course, you can override constness. Constness is compile time checking. You can't break the rules by default, but you can use casting (e.g. const_cast) to make a const object non-const so it can be changed.
Just enlisting the compiler's help when writing code would be enough for me to advocate const-correctness. But today there is an additional advantage: multi-threading code is generally easier to write when you know where can our objects change and where they cannot.
I think it's a pity C# doesn't support const correctness as C++ does. 95% of all parameters I pass to a function are constant values, the const feature guarantees that the data you pass by reference won't be modified after the call. The "const" keyword provides compiler-checked documentation, which is a great help in large programs.

Going functional in C#

I know that in C# 3.0 you can do some functional programming magic with Linq and lambda expression and all that stuff. However, is it really possible to go completely "pure" functional in C#? By "pure" I mean having methods that are pure (always gives the same output for the same input) and completely free of side-effects. How do we get around the fact that we do not even have immutable integer type in C#?
If you want to program in a pure functional way, there is nothing stopping you.
On the other hand, if you have some program, there is no magic flag you can flip to force the program to behave in a pure functional way.
For ints (immutable)
If you use an int as a parameter, it is passed by value. Any changes are not propogated to the caller.
If you use an int declared in one method's scope in a closure within the method, than that int variable is shared. In this case, one must either pledge not to modify the int (programmer enforced), or simply not use an int in this way.
And if you truly need an immutable int, have you seen the readonly keyword?
Have you looked into F#? It seems much more along the lines of what you are talking about. C# just really isn't designed with functional programming in mind, and therefore won't really give you any of the benefits that are normally associated with a functional language.
Unfortunately no - C# is not a pure functional language and it does not intend to be. What has happened is that the C# team has seen that there are benefits to adding certain functionally-styled constructs and syntax to the language.
Functional purity is better found in other places (Lisp derivatives like Common Lisp and Scheme are good places to start).

Categories