How does const correctness help write better programs?

How does const correctness help write better programs? - c#

This question is from a C# guy asking the C++ people. (I know a fair bit of C but only have some general knowledge of C++).
Allot of C++ developers that come to C# say they miss const correctness, which seems rather strange to me. In .Net in order to disallow changing of things you need to create immutable objects or objects with read only properties/fields, or to pass copies of objects (by default structs are copied).
Is C++ const correctness, a mechanism to create immutable/readonly objects, or is there more to it? If the purpose is to create immutable/readonly objects, the same thing can be done in an environment like .Net.

A whole section is devoted to Const Correctness in the FAQ. Enjoy!

const correctness in C++ is at heart simply a way of saying exactly what you mean:
void foo( const sometype & t ) {
...
}
says "I will not change the object referred to by t, and if I try to, please Mr. Compiler warn me about it".
It is not a security feature or a way of (reliably) creating read-only objects, because it is trivial to subvert using C-style casts.

I bet this is a matter of mindsets. I'm working myself with long-time C++ people and have noticed they seem to think in C++ (of course they do!). It will take time for them to start seeing multiple answers to the ones they've grown familiar to have the 'stock' answers like const correctness. Give them time, and try to educate them gently.
Having said that, I do like the C++ 'const' way of guarantees. It's mainly promising the user of an interface that the object, data behind the pointer or whatever won't be messed with. I would see this as the #1 value of const correctness. #2 value is what it can allow the compiler in terms of optimization.
The problem is, of course, if the whole sw stack hasn't been built with const in mind from the ground up.
I'm sure C# has its own set of paradigms to do the same. This shows why it's so important to "learn one new (computing or natural) language a year". Simply to exercise one's mind of the other ways to see the world, and solve its problems.

With const-correctness, you can expose a mutable object as read-only without making a copy of it and wasting precious memory (especially relevant for collections). I know, there is this CPU-and-memory-are-cheap philosophy among desktop developers, but it still matters in embedded world.
On top of that, if you add complexities of memory ownership in C++, it is almost always better not to copy non-trivial objects that contain or reference other objects, hence there is a method of exposing existing objects as read-only.

Excellent Scott Meyer's book Effective C++ has dedicated an item to this issue:
ITEM 3: Use const whenever possible.
It really enlightens you :-)

The real answer, even if it is really hard to grasp if you have not used it, is that you loose expressiveness. The language has concepts that you cannot possibly express in C#, and they have meaning, they are part of the design, but lost in the translation to code.
This is not an answer, but rather an example:
Consider that you have a container that is sorted on a field of the elements that are stored. Those objects are really big. You need to offer access to the data for readers (consider showing the information in the UI).
Now, in C#/Java, you can go in one of two ways: either you make a copy for the caller to use (guarantees that the data will not change, but inefficient) or you return a reference to your internally held object (just hoping the caller will not change your data through setters).
If the user gets the reference and changes through it the field that serves as index, then your container invariants are broken.
In C++ you can return a constant reference/pointer, and the compiler will disable calling setter methods (mutating methods) on the instance you return. You get both the security that the user will not change (*) and efficiency in the call.
The third not mentioned before option is making the object inmutable, but that disables changes everywhere. Perfectly controlled changes to the object will be disallowed, and the only possibility of change is creating a new element with the changes performed. That amounts to time and space.

C++ provides plenty of ways you can mess up your code. I see const correctness as a first, efficient way of putting constraints to the code, in order to control the "side" effects (unwanted changes).
Marking a parameter as const (usually reference to const object or pointer to const object), will ensure that the passed object can't be changed, so you have no "side" effect of that function/method.
Marking a method as const will guarantee that that method will not change the state of the object it works with. If it does, the compiler will generate an error.
Marking a const data member will ensure that the data member can only be initialized in the initialization list of the constructor and can't be changed.
Also, smart compilers can use the constness as hints for various performance optimizations.
Of course, you can override constness. Constness is compile time checking. You can't break the rules by default, but you can use casting (e.g. const_cast) to make a const object non-const so it can be changed.

Just enlisting the compiler's help when writing code would be enough for me to advocate const-correctness. But today there is an additional advantage: multi-threading code is generally easier to write when you know where can our objects change and where they cannot.

I think it's a pity C# doesn't support const correctness as C++ does. 95% of all parameters I pass to a function are constant values, the const feature guarantees that the data you pass by reference won't be modified after the call. The "const" keyword provides compiler-checked documentation, which is a great help in large programs.

Related

Why would one ever use the "in" parameter modifier in C#?

So, I (think I) understand what the in parameter modifier does. But what it does appears to be quite redundant.
Usually, I'd think that the only reason to use a ref would be to modify the calling variable, which is explicitly forbidden by in. So passing by in reference seems logically equivalent to passing by value.
Is there some sort of performance advantage? It was my belief that on the back-end side of things, a ref parameter must at least copy the physical address of the variable, which should be the same size as any typical object reference.
So, then is the advantage just in larger structs, or is there some behind-the-scenes compiler optimization that makes it attractive elsewhere? If the latter, why shouldn't I make every parameter an in?

in was recently introduced to the C# language.
in is actually a ref readonly. Generally speaking, there is only one use case where in can be helpful: high performance apps dealing with lots of large readonly structs.
Assuming you have:
readonly struct VeryLarge
{
public readonly long Value1;
public readonly long Value2;
public long Compute() { }
// etc
}
and
void Process(in VeryLarge value) { }
In that case, the VeryLarge struct will be passed by-reference without creating of defensive copies when using this struct in the Process method (e.g. when calling value.Compute()), and the struct immutability is ensured by the compiler.
Note that passing a not-readonly struct with an in modifier will cause the compiler to create a defensive copy when calling struct's methods and accessing properties in the Process method above, which will negatively affect performance!
There is a really good MSDN blog entry which I recommend to carefully read.
If you would like to get some more historical background of in-introducing, you could read this discussion in the C# language's GitHub repository.
In general, most developers agree that introducing of in could be seen as a mistake. It's a rather exotic language feature and can only be useful in high-perf edge cases.

passing by in reference seems logically equivalent to passing by value.
Correct.
Is there some sort of performance advantage?
Yes.
It was my belief that on the back-end side of things, a ref parameter must at least copy the physical address of the variable, which should be the same size as any typical object reference.
There is not a requirement that a reference to an object and a reference to a variable both be the same size, and there is not a requirement that either is the size of a machine word, but yes, in practice both are 32 bits on 32 bit machines and 64 bits on 64 bit machines.
What you think the "physical address" has to do with it is unclear to me. On Windows we use virtual addresses, not physical addresses in user mode code. Under what possible circumstances would you imagine that a physical address is meaningful in a C# program, I am curious to know.
There is also not a requirement that a reference of any kind be implemented as the virtual address of the storage. References could be opaque handles into GC tables in a conforming implementation of the CLI specification.
is the advantage just in larger structs?
Decreasing the cost of passing larger structs is the motivating scenario for the feature.
Note that there is no guarantee that in makes any program actually faster, and it can make programs slower. All questions about performance must be answered by empirical research. There are very few optimizations that are always wins; this is not an "always win" optimization.
is there some behind-the-scenes compiler optimization that makes it attractive elsewhere?
The compiler and runtime are permitted to make any optimization they choose if doing so does not violate the rules of the C# specification. There is to my knowledge not such an optimization yet for in parameters, but that does not preclude such optimizations in the future.
why shouldn't I make every parameter an in?
Well, suppose you made an int parameter instead an in int parameter. What costs are imposed?
the call site now requires a variable rather than a value
the variable cannot be enregistered. The jitter's carefully-tuned register allocation scheme just got a wrench thrown into it.
the code at the call site is larger because it must take a ref to the variable and put that on the stack, whereas before it could simply push the value onto the call stack
larger code means that some short jump instructions may have now become long jump instructions, so again, the code is now larger. This has knock-on effects on all kinds of things. Caches get filled up sooner, the jitter has more work to do, the jitter may choose to not do certain optimizations on larger code sizes, and so on.
at the callee site, we've turned access to a value on the stack (or register) into an indirection into a pointer. Now, that pointer is highly likely to be in the cache, but still, we've now turned a one-instruction access to the value into a two-instruction access.
And so on.
Suppose it's a double and you change it to an in double. Again, now the variable cannot be enregistered into a high-performance floating point register. This not only has performance implications, it can also change program behaviour! C# is permitted to do float arithmetic in higher-than-64-bit precision and typically does so only if the floats can be enregistered.
This is not a free optimization. You have to measure its performance against the alternatives. Your best bet is to simply not make large structs in the first place, as the design guidelines suggest.

There is. When passing a struct, the in keyword allows an optimization where the compiler only needs to pass a pointer, without the risk of the method changing the content. The last is critical — it avoids a copy operation. On large structs this can make a world of difference.

This is done because of the functional programming approach. One of the major principle is that function should not have side effects, which means it should not change values of the parameters and should return some value. In C# there was no way to pass structs(and value type) without being copied only by reference which allows changing of the value. In swift there is a hacky algorithm which copies struct (their collections are structs BTW) as long as method starts changing its values. People who use swift not all aware of the copy stuff. This is nice c# feature since it's memory efficient and explicit. If you look at what's new you will see that more and more stuff is done around structs and arrays in stack. And in statement is just necessary for these features. There are limitations mentioned in the other answers, but is not that essential for understanding where .net is heading.

in it is readonly reference in c# 7.2
this means you do not pass entire object to function stack similar to ref case you pass only reference to structure
but attempt to change value of object gives compiler error.
And yes this will allow you to optimize code performance if you use big structures.

C# classes - Why so many static methods?

I'm pretty new to C# so bear with me.
One of the first things I noticed about C# is that many of the classes are static method heavy. For example...
Why is it:
Array.ForEach(arr, proc)
instead of:
arr.ForEach(proc)
And why is it:
Array.Sort(arr)
instead of:
arr.Sort()
Feel free to point me to some FAQ on the net. If a detailed answer is in some book somewhere, I'd welcome a pointer to that as well. I'm looking for the definitive answer on this, but your speculation is welcome.

Because those are utility classes. The class construction is just a way to group them together, considering there are no free functions in C#.

Assuming this answer is correct, instance methods require additional space in a "method table." Making array methods static may have been an early space-saving decision.
This, along with avoiding the this pointer check that Amitd references, could provide significant performance gains for something as ubiquitous as arrays.

Also see this rule from FXCOP
CA1822: Mark members as static
Rule Description
Members that do not access instance data or call instance methods can
be marked as static (Shared in Visual Basic). After you mark the
methods as static, the compiler will emit nonvirtual call sites to
these members. Emitting nonvirtual call sites will prevent a check at
runtime for each call that makes sure that the current object pointer
is non-null. This can achieve a measurable performance gain for
performance-sensitive code. In some cases, the failure to access the
current object instance represents a correctness issue.

Perceived functionality.
"Utility" functions are unlike much of the functionality OO is meant to target.
Think about the case with collections, I/O, math and just about all utility.
With OO you generally model your domain. None of those things really fit in your domain--it's not like you are coding and go "Oh, we need to order a new hashtable, ours is getting full". Utility stuff often just doesn't fit.
We get pretty close, but it's still not very OO to pass around collections (where is your business logic? where do you put the methods that manipulate your collection and that other little piece or two of data you are always passing around with it?)
Same with numbers and math. It's kind of tough to have Integer.sqrt() and Long.sqrt() and Float.sqrt()--it just doesn't make sense, nor does "new Math().sqrt()". There are a lot of areas it just doesn't model well. If you are looking for mathematical modeling then OO may not be your best bet. (I made a pretty complete "Complex" and "Matrix" class in Java and made them fairly OO, but making them really taught me some of the limits of OO and Java--I ended up "Using" the classes from Groovy mostly)
I've never seen anything anywhere NEAR as good as OO for modeling your business logic, being able to demonstrate the connections between code and managing your relationship between data and code though.
So we fall back on a different model when it makes more sense to do so.

The classic motivations against static:
Hard to test
Not thread-safe
Increases code size in memory
1) C# has several tools available that make testing static methods relatively easy. A comparison of C# mocking tools, some of which support static mocking: https://stackoverflow.com/questions/64242/rhino-mocks-typemock-moq-or-nmock-which-one-do-you-use-and-why
2) There are well-known, performant ways to do static object creation/logic without losing thread safety in C#. For example implementing the Singleton pattern with a static class in C# (you can jump to the fifth method if the inadequate options bore you): http://www.yoda.arachsys.com/csharp/singleton.html
3) As #K-ballo mentions, every method contributes to code size in memory in C#, rather than instance methods getting special treatment.
That said, the 2 specific examples you pointed out are just a problem of legacy code support for the static Array class before generics and some other code sugar was introduced back in C# 1.0 days, as #Inerdia said. I tried to answer assuming you had more code you were referring to, possibly including outside libraries.

The Array class isn't generic and can't be made fully generic because this would break backwards compatibility. There's some sort of magic going on where arrays implement IList<T>, but that's only for single-dimension arrays with a lower bound of 0 – "list-ish" arrays.
I'm guessing the static methods are the only way to add generic methods that work over any shape of array regardless of whether it qualifies for the above-mentioned compiler magic.

Shoud I use LayoutKind.Auto for my structs if they don't perform in COM Interop?

By default structs in C# are implemented with [StructLayout( LayoutKind.Sequential )] for reasons basically stating that these type of objects are commonly used for COM Interop and their fields must stay in the order they were defined. Classes have LayoutKind.Auto defined.
My question is should I explicitly state my structs as [StructLayout( LayoutKind.Auto )] and would this give me any benefits over the default? I mean that if structs are initialized on stack, will it make any difference - i.e. the GC doesn't have to move them around? Also will it help when structs are initialized on the heap - i.e. are part of some class?

The only possible benefit I can think of is your struct taking up less memory. But if you have such a large struct in the first place you should probably refactor it into a class.
A potential problem is it you want to Marshall your struct into a byte[] using Marshal.PtrToStructure, how can you guarantee the order of the bytes will be as you expect?
Doing this just seems like you're introducing more possible problems than those you're solving... That being said if the order of the fields in never important to you, then do it, but bear in mind that the next person who comes alone might not be expecting it.

It may give you benefits, even though I don't suppose that it will do much. I usually stick to the defaults.
Basically, with an auto layout, the CLR can choose how to align data, therefore maybe making some space tradeoffs for speed (this will depend on the platform also, keeping things aligned can be more important in some than in others). However, since structs are also often used on the stack or as composite helper structures (think of KeyValuePair), sequential does usually make sense as default.

How does one detect mutation in a C# function?

I've been reading articles on how to program in a functional (i.e F#) style in C#, for example, foregoing loops for recursion and always returning a copy of a value/object instead of returning the same variable with a new state.
For example, what sort of code inspection things should I watch out for? Is there any way to tell if a method on a BCL class causes a mutation?

The tool NDepend can tell you where you have side effect. It can also ensure automatically that a class is immutable (i.e no side effect on its object instances) or a method is pure (i.e no side effects during the execution of the method. Disclaimer: I am one of the developers of the tool.
In short, trick is to define an attribute, such as, MyNamespace.ImmutableAttribute
and to tag classes that you wish to be immutable.
[Immutable]class MyImmutableClass {...}
If the class is not immutable, or more likely, if one day a developer modifies it and breaks its immutability, then the following Code Rule over LINQ Query (CQLinq) will suddenly warn:
warnif count > 0
from t in Application.Types
where !t.IsImmutable && t.HasAttribute("MyNamespace.ImmutableAttribute")
select t
On a side note, I wrote an article on immutability/purity/side-effects and NDepend usage:
Immutable Types: Understand Them And Use Them

Here are two things that would help you find variables and fields whose values are getting changed. Mutability is more complex than this, of course (for example these won't find calls to add to collections) but depending on what you're looking for, they may be helpful.
Make all of your fields readonly; then they can only be set from the constructor, and not changed thereafter.
Pick up a copy of ReSharper. It expands on Visual Studio's syntax highlighting, and has an option to set up custom highlighting for mutable local variables. This will let you see at a glance whether locals are being modified.

Unfortunately there's no easy way to do that in C# currently. If you're lucky the documentation will tell you, but generally that is not the case.
Inspecting the code with Reflector (assuming we're talking managed code) can reveal if the current implementation has any side effects, but since this is an implementation detail there's no guarantee that it will not change in the future, so basically you will have to repeat the verification every time you update the code in question.
Tools such as NDepend can help you find out dependencies between types - i.e. where you have too look for side effects.
For your own types, you can implement immutability, by making sure instances never leak references to internals. Make sure to copy the contents of reference type instances used to instantiate the objects as other may otherwise keep references to internal state.

Without testing the method, you won't be able to tell if it has any side effects. It would be handy if the documentation mentioned any side effects of a method or function but unfortunately it doesn't.
Keep in mind that you will have to do some extensive testing to be certain that no side effects can occur. If you want to you can always disassemble the assembly and read the code for side effects.

Why was constness removed from Java and C#?

I know this has been discussed many times, but I am not sure I really understand why Java and C# designers chose to omit this feature from these languages. I am not interested in how I can make workarounds (using interfaces, cloning, or any other alternative), but rather in the rationale behind the decision.
From a language design perspective, why has this feature been declined?
P.S: I'm using words such as "omitted", which some people may find inadequate, as C# was designed in an additive (rather than subtractive) approach. However, I am using such words because the feature existed in C++ before these languages were designed, so it is omitted in the sense of being removed from a programmer's toolbox.

In this interview, Anders said:
Anders Hejlsberg: Yes. With respect to
const, it's interesting, because we
hear that complaint all the time too:
"Why don't you have const?" Implicit
in the question is, "Why don't you
have const that is enforced by the
runtime?" That's really what people
are asking, although they don't come
out and say it that way.
The reason that const works in C++ is
because you can cast it away. If you
couldn't cast it away, then your world
would suck. If you declare a method
that takes a const Bla, you could pass
it a non-const Bla. But if it's the
other way around you can't. If you
declare a method that takes a
non-const Bla, you can't pass it a
const Bla. So now you're stuck. So you
gradually need a const version of
everything that isn't const, and you
end up with a shadow world. In C++ you
get away with it, because as with
anything in C++ it is purely optional
whether you want this check or not.
You can just whack the constness away
if you don't like it.

I guess primarily because:
it can't properly be enforced, even in C++ (you can cast it)
a single const at the bottom can force a whole chain of const in the call tree
Both can be problematic. But especially the first: if it can't be guaranteed, what use is it? Better options might be:
immutable types (either full immutability, or popsicle immutability)

As to why they did it those involved have said so:
http://blogs.msdn.com/ericgu/archive/2004/04/22/118238.aspx
http://blogs.msdn.com/slippman/archive/2004/01/22/61712.aspx
also mentioned by Raymond Chen
http://blogs.msdn.com/oldnewthing/archive/2004/04/27/121049.aspx
In a multi language system this would have been very complex.

As for Java, how would you have such a property behave? There are already techniques for making objects immutable, which is arguably a better way to achieve this with additional benefits. In fact you can emulate const behaviour by declaring a superclass/superinterface that implements only the methods that don't change state, and then having a subclass/subinterface that implements the mutating methods. By upcasting your mutable class to an instance of class with no write methods, other bits of code cannot modify the object without specifically casting it back to the mutable version (which is equivalent to casting away const).
Even if you don't want the object to be strictly immutable, if you really wanted (which I wouldn't recommend) you could put some kind of 'lock' mode on it so that it could only be mutated when unlocked. Have the lock/unlock methods be private, or protected as appropriate, and you get some level of access control there. Alternatively, if you don't intend for the method taking it as a parameter to modify it at all, pass in a copy of that object, or if copying the entire object is too heavyweight then some other lightweight data object that contains just the necessary information. You could even use dynamic proxies to create a proxy to your object that turn any calls to mutation methods into no-ops.
Basically there are already a whole bunch of ways to prevent a class being mutated, which let you choose one that fits most appropriately into your situation (hint: choose pure immutability wherever possible as it makes the object trivially threadsafe and easier to reason with in general). There are no obvious semantics for how const could be implemented that would be an improvement on these techniques, it would be another thing to learn that would either lack flexibility, or be so flexible as to be useless.
That is, unless I've missed something, which is entirely possible. :-)

Java have its own version of const; final. Joshua Bloch describes in his Effective Java
how you effectively use the final keyword. (btw, const is a reserved keyword in Java, for future discrepancies)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.