How long does variable assignment take? - C#

I've frequently come across code that made liberal use of variables like var self = this; so their code would look nicer. While I don't think assignments like these are going to be significant at all in any piece of code, I've always wondered how long an assignment like above would take.
With that said: How long does it take, assuming it isn't optimized away? How do the times compare between different languages, e.g. C#, Java, and C++? Between common value types (including pointers)? On 32- and 64-bit architectures?
EDIT: Erased the part about "noticeable difference". I meant that part as a side-question, but many people have seen that and started downvoting me for premature optimization (in spite of me highlighting the bottleneck part in bold).

The code:
var self = this;
Isn't creating a new instance of the object that this refers to; it simply copies the reference, which at the machine level is a pointer. The compiler and JIT typically optimize this kind of alias away entirely, so the answer to "how long it takes" is effectively zero.
So, why do this? Because it makes the code easier to read.
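For a feel of the scale involved, here's a rough micro-benchmark sketch (mine, and deliberately naive; in a release build the JIT may hoist or eliminate the assignment entirely, which is rather the point):
using System;
using System.Diagnostics;

class AssignmentTiming
{
    object field = new object();

    void Measure()
    {
        const int N = 100000000;
        object self = null;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
            self = field;   // a reference assignment: a single register/stack move
        sw.Stop();
        // report nanoseconds per iteration (loop overhead included)
        Console.WriteLine("{0:F2} ns/assignment ({1})",
            sw.Elapsed.TotalMilliseconds * 1e6 / N, self != null);
    }

    static void Main() => new AssignmentTiming().Measure();
}
On typical hardware this reports a small fraction of a nanosecond per iteration, most of which is loop overhead, consistent with the "effectively zero" answer above.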

Related

Is it possible to reassign a ref local?

C#'s ref locals are implemented using a CLR feature called managed pointers, which come with their own set of restrictions, but luckily being immutable is not one of them. That is, in ILAsm, if you have a local variable of managed-pointer type, it's entirely possible to change this pointer, making it "reference" another location. (C++/CLI also exposes this feature as interior pointers.)
Reading the C# documentation on ref locals it appears to me that C#'s ref locals are, even though based on the managed pointers of CLR, not relocatable; if they are initialized to point to some variable, they cannot be made to point to something else. I've tried using
ref object reference = ref some_var;
ref reference = ref other_var;
and similar constructs, to no avail.
I've even tried to write a small struct wrapping a managed pointer in IL, it works as far as C# is concerned, but the CLR doesn't seem to like having a managed pointer in a struct, even if in my usage it doesn't ever go to the heap.
Does one really have to resort to using IL or tricks with recursion to overcome this? (I'm implementing a data structure that needs to keep track of which of its pointers were followed, a perfect use of managed pointers.)
[edit:] "ref-reassign" is on the schedule for C# 7.3. The 'conditional-ref' workaround, which I discuss below, was deployed in C# 7.2.
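(For reference, the C# 7.3 syntax that eventually shipped drops the leading ref on the left-hand side of the rebinding; a sketch, reusing the variables from the question:)
ref object reference = ref some_var;   // initial binding (C# 7.0)
reference = ref other_var;             // rebinding: legal as of C# 7.3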
I've also long been frustrated by this and just recently stumbled on a workable answer.
Essentially, in C# 7.2 you can now initialize ref locals with a ternary operator, and this can be contrived, somewhat torturously, into a simulation of ref-local reassignment. You "hand off" the ref local assignments downwards through multiple variables as you move down in the lexical scope of your C# code.
This approach requires a great deal of unconventional thinking and a lot of planning ahead. For certain situations or coding scenarios, it may not be possible to anticipate the gamut of runtime configurations such that any conditional assignment scheme might apply. In this case you're out of luck. Or, switch to C++/CLI, which exposes managed tracking references. The tension here is that, for C#, the vast and indisputable gains in concision, elegance, and efficiency which are immediately realized by introducing the conventional use of managed pointers (these points are discussed further below) are frittered away by the degree of contortion required to overcome the reassignment problem.
The syntax that had eluded me for so long is shown next. Or, check the link I cited at the top.
C# 7.2 ref-local conditional assignment via ternary operator ? :
ref int i_node = ref (f ? ref m_head : ref node.next);
This line is from a canonical problem case for the ref-local dilemma that the questioner posed here. It's from code which maintains back-pointers while walking a singly-linked list. The task is trivial in C/C++, as it should be (and is quite beloved by CSE101 instructors, perhaps for that particular reason), but is entirely agonizing using managed pointers in C#.
Such a complaint is entirely legitimate too, thanks to Microsoft's own C++/CLI language showing us how awesome managed pointers can be in the .NET universe. Instead, most C# developers seem to just end up using integer indices into arrays, or of course full blown native pointers with unsafe C#.
Some brief comments on the linked-list walking example, and why one would be interested in going to so much trouble over these managed pointers. We assume all of the nodes are actually structs in an array (ValueType, in-situ) such as m_nodes = new Node[100]; and each next pointer is thus an integer (its index in the array).
struct Node
{
    public int ix, next;
    public char data;
    public override String ToString() =>
        String.Format("{0} next: {1,2} data: {2}", ix, next, data);
};
As shown here, the head of the list will be a standalone integer, stored apart from the records. In the next snippet, I use the new C#7 syntax for ValueTuple to do so. Obviously it's no problem to traverse forward using these integer links, but C# has traditionally lacked an elegant way to maintain a link to the node you came from. It's a problem since one of the integers (the first one) is a special case owing to not being embedded in a Node structure.
static (int head, Node[] nodes) L =
    (3,
     new[]
     {
         new Node { ix = 0, next = -1, data = 'E' },
         new Node { ix = 1, next =  4, data = 'B' },
         new Node { ix = 2, next =  0, data = 'D' },
         new Node { ix = 3, next =  1, data = 'A' },
         new Node { ix = 4, next =  2, data = 'C' },
     });
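To connect the pieces, here's a small traversal sketch of my own (not from the original answer) that uses the conditional-ref syntax shown above to keep a managed pointer to the link just followed; it assumes the Node struct and list L defined above, plus using System:
static void Walk()
{
    int prev = -1; // -1 means "we arrived via the standalone head, not via a node"
    for (int ix = L.head; ix != -1; )
    {
        // 'cameFrom' aliases either the standalone head or the previous node's
        // 'next' field: exactly the back-pointer that is so awkward in plain C#.
        ref int cameFrom = ref (prev == -1 ? ref L.head : ref L.nodes[prev].next);

        Console.Write(L.nodes[ix].data);   // prints A B C D E in list order

        // to unlink the current node we could write: cameFrom = L.nodes[ix].next;
        prev = ix;
        ix = L.nodes[ix].next;
    }
}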
Additionally, there's presumably a decent amount of processing work to do on each node, but you really don't want to pay the (double) performance costs of imaging each (possibly large) ValueType out of its cozy array home, and then having to image each one back when you're done! After all, surely the reason we're using value types here is to maximize performance. As I discuss at length elsewhere on this site, structs can be extremely efficient in .NET, but only if you never accidentally "lift" them out of their storage. It's easy to do and it can immediately destroy your memory bus bandwidth.
The trivial approach to not-lifting the structs just repeats the array indexing like so:
int ix = 1234;
arr[ix].a++;
arr[ix].b ^= arr[ix].c;
arr[ix].d /= (arr[lx].e + arr[ix].f);
Here, each ValueType field access is independently dereferenced on every access. Although this "optimization" does avoid the bandwidth penalties mentioned above, repeating the same array indexing operation over and over again can instead implicate an entirely different set of runtime penalties. The (opportunity) costs now are due to unnecessarily wasted cycles where .NET recomputes provably invariant physical offsets or performs redundant bounds checks on the array.
JIT optimizations in release-mode may mitigate these issues somewhat (or even dramatically) by recognizing and consolidating redundancy in the code you supplied, but maybe not as much as you'd think or hope (or eventually realize you don't want): JIT optimizations are strongly constrained by strict adherence to the .NET Memory Model [1], which requires that whenever a storage location is publicly visible, the CPU must execute the relevant fetch sequence exactly as authored in the code. For the previous example, this means that if ix is shared with other threads in any way prior to the operations on arr, then the JIT must ensure that the CPU actually touches the ix storage location exactly 6 times, no more, no less.
Of course the JIT can do nothing to address the other obvious and widely-acknowledged problem with repetitive source code such as the previous example. In short, it's ugly, bug-prone, and harder to read and maintain. To illustrate this point, ☞ ...did you even notice the bug I intentionally put in the preceding code?
The cleaner version of the code shown next doesn't just make bugs like this "easier to spot"; as a class of error, it precludes them entirely, since there's now no need for an array-indexing variable at all. The variable ix doesn't need to exist in the following, since 1234 is used only once. It follows that the bug I so deviously introduced earlier cannot be propagated to this example, because it has no means of expression: what can't exist can't introduce a bug (as opposed to 'what does not exist...', which most certainly could be one).
ref Node rec = ref arr[1234];
rec.a++;
rec.b ^= rec.c;
rec.d /= (rec.e + rec.f);
Nobody would disagree that this is an improvement. So ideally we want to use managed pointers to directly read and write fields in the structure in situ. One way to do this is to write all of your intensive processing code as instance member functions and properties in the ValueType itself, though for some reason it seems that many people don't like this approach. In any case, the point is now moot with C#7 ref locals...
✹ ✹ ✹
I'm now realizing that fully explaining the type of programming required here is probably too involved to show with a toy example and thus beyond the scope of a StackOverflow article. So I'm going to jump ahead, and in order to wrap up I'll drop in a section of some working code I have showing simulated managed-pointer reassignment. This is taken from a heavily modified snapshot of HashSet<T> in the .NET 4.7.1 reference source [direct link], and I'll just show my version without much explanation:
int v1 = m_freeList;
for (int w = 0; v1 != -1; w++)
{
    ref int v2 = ref (w == 0 ? ref m_freeList : ref m_slots[v1].next);
    ref Slot fs = ref m_slots[v2];
    if (v2 >= i)
    {
        v2 = fs.next;
        fs = default(Slot);
        v1 = v2;
    }
    else
        v1 = fs.next;
}
This is just an arbitrary sample fragment from the working code so I don't expect anyone to follow it, but the gist of it is that the 'ref' variables, designated v1 and v2, are intertwined across scope blocks and the ternary operator is used to coordinate how they flow down. For example, the only purpose of the loop variable w is to handle which variable gets activated for the special case at the start of the linked-list traversal (discussed earlier).
Again, it turns out to be a very bizarre and tortured constraint on the normal ease and fluidity of modern C#. Patience, determination, and—as I mentioned earlier—a lot of planning ahead is required.
[1] If you're not familiar with what's called the .NET Memory Model, I strongly suggest taking a look. I believe .NET's strength in this area is one of its most compelling features, a hidden gem and the one (not-so-)secret superpower that most fatefully embarrasses those ever-strident friends of ours who yet adhere to the 1980's-era ethos of bare-metal coding. Note an epic irony: imposing strict limits on the wild or unbounded aggression of compiler optimization may end up enabling apps with much better performance, because stronger constraints expose reliable guarantees to developers. These, in turn, imply stronger programming abstractions or suggest advanced design paradigms, in this case relevant to concurrent systems. For example, if one agrees that, in the native community, lock-free programming has languished in the margins for decades, perhaps the unruly mob of optimizing compilers is to blame? Progress in this specialty area is easily wrecked without the reliable determinism and consistency provided by a rigorous and well-defined memory model, which, as noted, is somewhat at odds with unfettered compiler optimization. So here, constraints mean that the field can at last innovate and grow. This has been my experience in .NET, where lock-free programming has become a viable, realistic, and eventually mundane, basic daily programming vehicle.
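A tiny taste of the style the footnote alludes to (my sketch, not from the post): Interlocked provides atomic read-modify-write operations whose ordering guarantees come from that same memory model.
using System.Threading;

static class HitCounter
{
    static int s_hits;

    // atomic increment: safe to call from any number of threads, no lock needed
    public static void Record() => Interlocked.Increment(ref s_hits);

    // atomically reads the current count and resets it to zero in one step
    public static int DrainCount() => Interlocked.Exchange(ref s_hits, 0);
}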

What's the practical difference between a variable being mutable vs non-mutable

I'm just learning C# and working with some examples of strings and StringBuilder. From my reading, I understand that if I do this:
string greeting = "Hello";
greeting += " my good friends";
that I get a new string called greeting with the concatenated value. I understand that the run-time (or compiler, or whatever) is actually getting rid of the reference to the original string greeting and replacing it with a new concatenated one of the same name.
I was just wondering what practical application/ramification this has. Why does it matter to me how C# shuffles strings around in the background when the effect to me is simply that my initial variable changed value.
I was wondering if someone could give me a scenario where a programmer would need to know the difference. * a simple example would be nice, as I'm a relative beginner to this.
Thanks in advance.
Strings, again, are a good example. A very common error is:
string greeting = "Hello Foo!";
greeting.Replace("Foo", "World");
Instead of the proper:
string greeting = "Hello Foo!";
greeting = greeting.Replace("Foo", "World");
Unless you knew that string was an immutable type, you might well assume the first version would work.
Why does it matter to me how C# shuffles strings around in the background when the effect to me is simply that my initial variable changed value.
The other major place where this has huge advantages is when concurrency is introduced. Immutable types are much easier to deal with in a concurrent situation, as you don't have to worry about whether another thread is modifying the same value within the same reference. Using an immutable type often allows you to avoid the potentially significant cost of synchronization (ie: locking).
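A minimal sketch of that advantage (my example, not from the original answer): once you've copied a reference to an immutable string, no other thread can change the object it points to, so no lock is needed.
using System;
using System.Threading;

class ImmutableShare
{
    static void Main()
    {
        string message = "Hello";
        string snapshot = message;          // copy the reference; no lock needed,
                                            // because the string object can never change
        var worker = new Thread(() => Console.WriteLine(snapshot));
        worker.Start();

        message += " my good friends";      // rebinds 'message' to a brand-new string;
                                            // 'snapshot' still refers to "Hello"
        worker.Join();
    }
}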
I understand that the run-time(or compiler, or whatever) is actually getting rid of the reference to the original string greeting and replacing it with a new concatenated one of the same name.
Pedantic intro: No. Objects do not have names -- variables do. It is storing a new object in the same variable. Thus, the name (variable) used to access the object is the same, even though it (the variable) now refers to another object. An object may also be stored in multiple variables and have multiple "names" at the same time or it might not be accessible directly by any variable.
The other parts of the question have already been succinctly answered for the case of strings -- however, the mutable/immutable ramifications are much larger. Here are some questions which may widen the scope of the issue in context.
What happens if you set a property of an object passed into a method? (There are these pesky "value-types" in C#, so it depends...)
What happens if a sequence of actions leaves an object in an inconsistent state? (E.g. property A was set and an error occurred before property B was set?)
What happens if multiple parts of code expect to be modifying the same object, but are not because the object was cloned/duplicated somewhere?
What happens if multiple parts of code do not expect the object to be modified elsewhere, but it is? (This applies in both threading and non-threading situations)
In general, the contract of an object (API and usage patterns/scope/limitations) must be known and correctly adhered to in order to ensure program validity. I generally find that immutable objects make life easier (as then only one of the above "issues" -- a meager 25% -- even applies).
Happy coding.
C# isn't doing any "shuffling", you are! Your statement assigns a new value to the variable, the referenced object itself did not change, you just dropped the reference.
The major reason immutability is useful is this:
String greeting = "Hello";
// who knows what foo does
foo(greeting);
// always prints "Hello" since String is immutable
System.Console.WriteLine(greeting);
You can share references to immutable objects without worrying about other code changing the object--it can't happen. Therefore immutable objects are easier to reason about.
Most of the time, very little effect. However, in the situation of concatenating many strings, the performance hit of garbage collecting all those strings becomes problematic. Do too many string manipulations with just a string, and the performance of your application can take a nosedive.
This is the reason why StringBuilder is more effective when you have a lot of string manipulation to do; leaving all those 'orphaned' strings out there makes a bigger problem for the Garbage Collector than simply modifying an in memory buffer.
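A sketch of the contrast (hypothetical example): the first method leaves roughly 10,000 orphaned intermediate strings for the collector, while the second reuses one growing buffer.
using System.Text;

static class ConcatDemo
{
    // each += allocates a brand-new string and orphans the previous one
    static string Concatenate()
    {
        string s = "";
        for (int i = 0; i < 10000; i++)
            s += i + ",";
        return s;
    }

    // appends land in the same internal buffer, which only grows occasionally
    static string Build()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < 10000; i++)
            sb.Append(i).Append(',');
        return sb.ToString(); // one final string allocation
    }
}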
I think the main benefit of immutable strings lies in making memory management easier.
C# allocates a contiguous block of memory for each object. If you create the string "Tom", it takes up three characters' worth of storage (plus object overhead). You may then allocate an integer right after it. If you then tried to grow the string "Tom" into "Tomas" in place, the runtime would have to move the surrounding memory to make room for the two new characters a and s.
To eliminate this pain, it's easier (and quicker) to just allocate five new bytes for the string "Tomas".
Does that help?
In performance terms, the advantage of immutability is that copying an object is cheap in terms of both CPU and memory, since it only involves copying a pointer. The downside is that writing to the object becomes more expensive, since it must make a copy of the object in the process.

Why is String.Concat not optimized to StringBuilder.Append?

I found concatenations of constant string expressions are optimized by the compiler into one string.
Now with string concatenation of strings only known at run-time, why does the compiler not optimize string concatenation in loops and concatenations of say more than 10 strings to use StringBuilder.Append instead? I mean, it's possible, right? Instantiate a StringBuilder and take each concatenation and turn it into an Append() call.
Is there any reason why this should or could not be optimized? What am I missing?
The definitive answer will have to come from the compiler design team. But let me take a stab here...
If your question is, why the compiler doesn't turn this:
string s = "";
for (int i = 0; i < 100; i++)
    s = string.Concat(s, i.ToString());
into this:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++)
    sb.Append(i.ToString());
string s = sb.ToString();
The most likely answer is that this is not an optimization. This is a rewrite of the code that introduces new constructs based on knowledge and intent that the developer has - not the compiler.
This type of change would require the compiler to have more knowledge of the BCL than is appropriate. What if tomorrow, some more optimal string assembly service becomes available? Should the compiler use that?
What if your loop conditions were more complicated, should the compiler attempt to perform some static analysis to decide whether the result of such a rewrite would still be functionally equivalent? In many ways, this would be like solving the halting problem.
Finally, I'm not sure that in all cases this would result in faster performing code. There is a cost to instantiating a StringBuilder and resizing its internal buffer as text is appended. In fact, the cost of appending is strongly tied to the size of the strings being concatenated, how many there are, and what memory pressure looks like. These are things that the compiler cannot predict in advance.
It's your job as a developer to write well-performing code. The compiler can only help by making certain safe, invariant-preserving optimizations. Not rewriting your code for you.
LBuskin's answer is excellent; I have just a couple of things to add.
First, JScript.NET does do this optimization. JScript is frequently used by less-experienced programmers for tasks that involve construction of large strings in loops, like building up JSON objects, HTML data, and so on.
Since those programmers might not be aware of the n-squared cost of naive string allocation, might not be aware of the existence of string builders, and frequently write code using this pattern, we felt that it was reasonable to put this optimization into JScript.NET.
C# programmers tend to be more aware of the underlying costs of the code they write and more aware of the existence of off-the-shelf parts like StringBuilder, so they need this optimization less. And more fundamentally, the design philosophy of C# is that it is a "do what I said" language with a minimum of "magic"; JScript is a "do what I mean" language that does its best to figure out how to best serve you, even if that means sometimes guessing wrong. Both philosophies are valid and useful.
Sometimes it does "go the other way". Compare this choice to the choice we make for switches on strings. Switches on strings are actually compiled as a creation of a dictionary containing the strings, rather than as a series of string comparisons. That optimization could be bad; it might be faster to simply do the string comparisons. But here we make a guess that you "meant" the switch to be a table lookup rather than a series of "if" statements -- if you'd meant the series of if statements, you could easily write that yourself.
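To make the dictionary idea concrete, here's a conceptual sketch (my own illustration, not the actual emitted code; the labels are hypothetical): the case strings become keys in a dictionary built once, turning a chain of string comparisons into a single hash lookup followed by an integer jump.
using System.Collections.Generic;

static class StringSwitch
{
    static readonly Dictionary<string, int> Cases = new Dictionary<string, int>
    {
        ["red"] = 0, ["green"] = 1, ["blue"] = 2,
    };

    public static string Describe(string color)
    {
        int target;
        if (Cases.TryGetValue(color, out target))   // one hash lookup...
            switch (target)                          // ...then an integer jump table
            {
                case 0: return "warm";
                case 1: return "cool";
                case 2: return "cool";
            }
        return "default";
    }
}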
For a single concatenation of multiple strings (e.g. a + b + c + d + e + f + g + h + i + j) you really want to be using String.Concat IMO. It has the overhead of building an array for each call, but it has the benefit that the method can work out the exact length of the resulting string before it needs to allocate any memory. StringBuilder.Append(a).Append(b)... only gives a single value at a time, so the builder doesn't know how much memory to allocate.
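Concretely, a sketch of the two shapes being compared (method names are my own):
using System.Text;

static class ConcatVsBuilder
{
    // Concat can compute the final length up front and allocate exactly once
    static string WithConcat(string a, string b, string c, string d) =>
        string.Concat(a, b, c, d);

    // the builder sees one piece at a time, so its buffer may have to grow
    static string WithBuilder(string a, string b, string c, string d) =>
        new StringBuilder().Append(a).Append(b).Append(c).Append(d).ToString();
}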
As for doing it in loops - at that point you've added a new local variable, and you've got to add code to write back to the string variable at exactly the right time (calling StringBuilder.ToString()). What happens when you're running in the debugger? Wouldn't it be pretty confusing not to see the value building up, only becoming visible at the end of the loop? Oh, and of course you've got to perform appropriate validation that the value isn't used at any point before the end of the loop...
Two reasons:
You can't programmatically identify places where it would be strictly higher performing.
The "optimization" will slow things down if performed incorrectly.
You can suggest people use the correct calls for their application, but at some point it's the developer's responsibility to get it right.
Edit: Regarding the cutoff, we have another couple of problems:
The only way to know for sure that the cutoff is reached is complicated flow analysis. The number of places where this would be able to find sections that could be converted is extremely small.
Flow analysis is expensive. If you do it at runtime, the whole program will run slower for the rare chance that one piece of poorly written code will be faster. If you do it at compile time, it's not an error according to language syntax, but you can issue a warning, and that's exactly what FxCop does (a slow but available flow-analysis tool). Just think how many hours people would spend waiting if FxCop always had to run with the compiler. And if it were done at runtime, well, welcome to JVM startup times...
Because it's the compiler's job to generate semantically-correct code. Changing invocations of String.Concat to invocations of StringBuilder.Append would be changing the semantics of the code.
I believe it would be a little too complex for the compiler writers. And when you are referencing the intermediate strings inside the loops besides the concatenation (for example passing them to some other methods or so), this optimization would not be possible.
Probably because it's complicated to match such a pattern in the code, and in case the compiler can't do the match for some reason, the performance of the code is suddenly terrible. Optimising code like that would encourage writing code like that, which would even further increase the negative impact in the cases where the compiler can no longer do the optimisation.
For concatenating a known set of strings, StringBuilder is not faster than String.Concat.
A String is an immutable type, hence repeatedly concatenating strings is slower than using StringBuilder.Append.
Edit: To clarify my point a bit more: when you ask why String.Concat is not optimized to StringBuilder.Append, note that a StringBuilder has completely different semantics from the immutable String type. Why should you expect the compiler to optimize one into the other when they are clearly two different things? Furthermore, a StringBuilder is a mutable type that can change its length dynamically; why should a compiler optimize an immutable type into a mutable one? That is the design and semantics ingrained into the ECMA spec for the .NET Framework, regardless of the language.
It's a bit like asking the compiler (and perhaps expecting too much) to take a char and optimize it into an int because the int works on 32 bits instead of 16 and would be deemed faster!

C# to C++ 'Gotchas'

I have been developing a project that I absolutely must develop part-way in C++. I need develop a wrapper and expose some C++ functionality into my C# app. I have been a C# engineer since the near-beginning of .NET, and have had very little experience in C++. It still looks very foreign to me when attempting to understand the syntax.
Is there anything that is going to knock me off my feet that would prevent me from just picking up C++ and going for it?
C++ has so many gotchas that I can't enumerate them all. Do a search for "C# vs C++". A few basic things to know:
In C++:
struct and a class are basically the same thing (Default visibility for a struct is public, it's private for a class).
Both struct and class can be created either on the heap or the stack.
You have to manage the heap yourself. If you create something with "new", you have to delete it manually at some point.
If performance isn't an issue and you have very little data to move around, you can avoid the memory management issue by having everything on the stack and using references (& operator).
Learn to deal with .h and .cpp files. Unresolved externals can be your worst nightmare.
You shouldn't call a virtual method from a constructor. The compiler will never tell you, so I'm telling you now.
A switch case doesn't enforce "break" and falls through by default.
There is no such thing as an interface. Instead, you have classes with pure virtual methods.
C++ aficionados are dangerous people living in cave and surviving on the fresh blood of C#/java programmers. Talk with them about their favorite language carefully.
Garbage collection!
Remember that every time you new an object, you are responsible for calling delete on it.
There are a lot of differences, but the biggest one I can think of that programmers coming from Java/C# always get wrong, and which they never realize they've got wrong, is C++'s value semantics.
In C#, you're used to using new any time you wish to create an object. And whenever we talk about a class instance, we really mean "a reference to the class instance". Foo x = y doesn't copy the object y, it simply creates another reference to whatever object y references.
In C++, there's a clear distinction between local objects, allocated without new (Foo f or Foo f(x, y)), and dynamically allocated ones (Foo* f = new Foo() or Foo* f = new Foo(x, y)). And in C# terms, everything is a value type. Foo x = y actually creates a copy of the Foo object itself.
If you want reference semantics, you can use pointers or references: Foo& x = y creates a reference to the object y. Foo* x = &y creates a pointer to the address at which y is located. And copying a pointer does just that: it creates another pointer, which points to whatever the original pointer pointed to. So this is similar to C#'s reference semantics.
Local objects have automatic storage duration -- that is, a local object is automatically destroyed when it goes out of scope. If it is a class member, then it is destroyed when the owning object is destroyed. If it is a local variable inside a function, it is destroyed when execution leaves the scope in which it was declared.
Dynamically allocated objects are not destroyed until you call delete.
So far, you're probably with me. Newcomers to C++ are taught this pretty soon.
The tricky part is in what this means, how it affects your programming style:
In C++, the default should be to create local objects. Don't allocate with new unless you absolutely have to.
If you do need dynamically allocated data, make it the responsibility of a class. A (very) simplified example:
class IntArrayWrapper {
public:
    explicit IntArrayWrapper(int size) : arr(new int[size]) {} // allocate memory in the constructor, and set arr to point to it
    ~IntArrayWrapper() { delete[] arr; }                       // deallocate memory in the destructor
    int* arr;                                                  // hold the pointer to the dynamically allocated array
};
this class can now be created as a local variable, and it will internally do the necessary dynamic allocations. And when it goes out of scope, it'll automatically delete the allocated array again.
So say we needed an array of x integers, instead of doing this:
void foo(int x){
int* arr = new int[x];
... use the array ...
delete[] arr; // if the middle of the function throws an exception, delete will never be called, so technically, we should add a try/catch as well, and also call delete there. Messy and error-prone.
}
you can do this:
void foo(int x){
IntArrayWrapper arr(x);
... use the array ...
// no delete necessary
}
Of course, this use of local variables instead of pointers or references means that objects are copied around quite a bit:
Bar Foo(){
Bar bar;
... do something with bar ...
return bar;
}
in the above, what we return is a copy of the bar object. We could return a pointer or a reference, but as the instance created inside the function goes out of scope and is destroyed the moment the function returns, we couldn't point to that. We could use new to allocate an instance that outlives the function, and return a pointer to it -- and then we get all the memory management headaches of figuring out whose responsibility it is to delete the object, and when that should happen. That's not a good idea.
Instead, the Bar class should simply be designed so that copying it does what we need. Perhaps it should internally call new to allocate an object that can live as long as we need it to. We could then make copying or assignment "steal" that pointer. Or we could implement some kind of reference-counting scheme where copying the object simply increments a reference counter and copies the pointer -- which should then be deleted not when the individual object is destroyed, but when the last object is destroyed and the reference counter reaches 0.
But often, we can just perform a deep copy, and clone the object in its entirety. If the object includes dynamically allocated memory, we allocate more memory for the copy.
It may sound expensive, but the C++ compiler is good at eliminating unnecessary copies (and is in fact in most cases allowed to eliminate copy operations even if they have side effects).
If you want to avoid copying even more, and you're prepared to put up with a little more clunky usage, you can enable "move semantics" in your classes as well as (or instead of) "copy semantics". It's worth getting into this habit because (a) some objects can't easily be copied, but they can be moved (e.g. a Socket class), (b) it's a pattern established in the standard library and (c) it's getting language support in the next version.
With move semantics, you can use objects as a kind of "transferable" container. It's the contents that move. In the current approach, it's done by calling swap, which swaps the contents of two objects of the same type. When an object goes out of scope, it is destructed, but if you swap its contents into a reference parameter first, the contents escape being destroyed when the scope ends. Therefore, you don't necessarily need to go all the way and use reference counted smart pointers just to allow complex objects to be returned from functions. The clunkiness comes from the fact that you can't really return them - you have to swap them into a reference parameter (somewhat similar to a ref parameter in C#). But the language support in the next version of C++ will address that.
So the biggest C# to C++ gotcha I can think of: don't make pointers the default. Use value semantics, and instead tailor your classes to behave the way you want when they're copied, created and destroyed.
A few months ago, I attempted to write a series of blog posts for people in your situation:
Part 1
Part 2
Part 3
I'm not 100% happy with how they turned out, but you may still find them useful.
And when you feel that you're never going to get a grip on pointers, this post may help.
No run-time checks
One C++ pitfall is the behaviour when you try to do something that might be invalid, but which can only be checked at runtime - for example, dereferencing a pointer that could be null, or accessing an array with an index that might be out of range.
The C# philosophy emphasises correctness; all behaviour should be well-defined and, in cases like this, it performs a run-time check of the preconditions and throws well-defined exceptions if they fail.
The C++ philosophy emphasises efficiency, and the idea that you shouldn't pay for anything you might not need. In cases like this, nothing will be checked for you, so you must either check the preconditions yourself or design your logic so that they must be true. Otherwise, the code will have undefined behaviour, which means it might (more or less) do what you want, it might crash, or it might corrupt completely unrelated data and cause errors that are horrendously difficult to track down.
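To illustrate the C# half of that contrast, a minimal sketch (mine): both failures below are guaranteed to surface as well-defined exceptions at the exact point of the bad access.
using System;

class Checks
{
    static void Main()
    {
        int[] arr = new int[3];
        try { Console.WriteLine(arr[5]); }            // bounds check fails
        catch (IndexOutOfRangeException) { Console.WriteLine("caught bad index"); }

        string s = null;
        try { Console.WriteLine(s.Length); }          // null check fails
        catch (NullReferenceException) { Console.WriteLine("caught null deref"); }
    }
}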
Just to throw in some others that haven't been mentioned yet by other answers:
const: C# has a limited idea of const. In C++ 'const-correctness' is important. Methods that don't modify their reference parameters should take const references, e.g.
void func(const MyClass& x)
{
    // x cannot be modified, and you can't call non-const methods on x
}
Member functions that don't modify the object should be marked const, ie.
int MyClass::GetSomething() const // <-- here
{
    // Doesn't modify the instance of the class
    return some_member;
}
This might seem unnecessary, but is actually very useful (see the next point on temporaries), and sometimes required, since libraries like the STL are fully const-correct, and you can't cast const things to non-const things (don't use const_cast! Ever!). It's also useful for callers to know something won't be changed. It is best to think about it in this way: if you omit const, you are saying the object will be modified.
Temporary objects: As another answer mentioned, C++ is much more about value-semantics. Temporary objects can be created and destroyed in expressions, for example:
std::string str = std::string("hello") + " world" + "!";
Here, the first + creates a temporary string with "hello world". The second + combines the temporary with "!", giving a temporary containing "hello world!", which is then copied to str. After the statement is complete, the temporaries are immediately destroyed. To further complicate things, C++0x adds rvalue references to solve this, but that's way out of the scope of this answer!
You can also bind temporary objects to const references (another useful part of const). Consider the previous function again:
void func(const MyClass& x)
This can be called explicitly with a temporary MyClass:
func(MyClass()); // create temporary MyClass - NOT the same as 'new MyClass()'!
A MyClass instance is created on the stack, func accesses it, and then the temporary MyClass is destroyed automatically after func returns. This is convenient and also usually very fast, since the heap is not involved. Note that 'new' returns a pointer - not a reference - and requires a corresponding 'delete'. You can also directly assign temporaries to const references:
const int& blah = 5; // 5 is a temporary
const MyClass& myClass = MyClass(); // creating temporary MyClass instance
// The temporary MyClass is destroyed when the const reference goes out of scope
Const references and temporaries are frequent in good C++ style, and the way these work is very different to C#.
RAII, exception safety, and deterministic destructors. This is actually a useful feature of C++, possibly even an advantage over C#, and it's worth reading up on since it's also good C++ style. I won't cover it here.
Finally, I'll just throw in that this is a pointer, not a reference :)
The traditional stumbling blocks for people coming to C++ from C# or Java are memory management and polymorphic behavior:
While objects always live on the heap and are garbage collected in C#/Java, you can have objects in static storage, stack or the heap ('free store' in standard speak) in C++. You have to cleanup the stuff you allocate from the heap (new/delete). An invaluable technique for dealing with that is RAII.
Inheritance/polymorphism work only through pointer or reference in C++.
There are many others, but these will probably get you first.
Virtual destructors.
Header files! You'll find yourself asking, "so why do I need to write method declarations twice every time?"
Pointers and Memory Allocation
...I'm a C# guy too and I'm still trying to wrap my head around proper memory practices in C/C++.
Here is a brief overview of Managed C++. There's an article about writing an unmanaged wrapper using Managed C++ here, and another article about mixing unmanaged with managed C++ code here.
Using Managed C++ would IMHO make it easier to use as a bridge to the C# world and vice versa.
Hope this helps,
Best regards,
Tom.
The biggest difference is C#'s reference semantics (for most types) vs. C++'s value semantics. This means that objects are copied far more often than they are in C#, so it's important to ensure that objects are copied correctly. This means implementing a copy constructor and operator= for any class that has a destructor.
Raw memory twiddling. Unions, memsets, and other direct memory writes. Anytime someone writes to memory as a sequence of bytes (as opposed to as objects), you lose much of the ability to reason about the code.
Linking
Linking with external libraries is not as forgiving as it is in .Net, $DEITY help you if you mix something compiled with different flavors of the same msvcrt (debug, multithread, unicode...)
Strings
And you'll have to deal with Unicode vs. ANSI strings; they are not exactly the same.
Have fun :)
The following isn't meant to dissuade in any way :D
C++ is a minefield of gotchas. It's relatively tame if you don't use templates and the STL, and just use object orientation, but even then it's a monster. In that case, object-based programming (rather than object-oriented programming) makes it even tamer; often this form of C++ is enforced in certain projects (i.e., don't use any features that have even a chance of being naively used).
However, you should learn all those things, as it's a very powerful language if you do manage to traverse the minefield. If you want to learn about the gotchas, you'd better get the books by Herb Sutter, Scott Meyers, and Bjarne Stroustrup. Also, systematically going over the C++ FAQ Lite will help you realize that it does indeed take ten or so books to turn into a good C++ programmer.

Immutability of structs [duplicate]

Possible Duplicate:
Why are mutable structs evil?
I read it in lots of places including here that it's better to make structs as immutable.
What's the reason behind this? I see lots of Microsoft-created structs that are mutable, like the ones in XNA. Probably there are many more in the BCL.
What are the pros and cons of not following this guideline?
Structs should represent values. Values do not change. The number 12 is eternal.
However, consider:
Foo foo = new Foo(); // a mutable struct
foo.Bar = 27;
Foo foo2 = foo;
foo2.Bar = 55;
Now foo.Bar and foo2.Bar are different, which is often unexpected. This bites especially in scenarios like properties (fortunately the compiler detects this), but also with collections etc.; how do you ever mutate them sensibly?
Data loss is far too easy with mutable structs.
The big con is that things don't behave how you expect them to - particularly if the mutability comes from a mixture of the direct value and a reference type within it.
To be honest, I can't remember off the top of my head all the weird problems I've seen people come up with in newsgroups when they've used mutable structs - but those reasons certainly exist. Mutable structs cause problems. Stay away.
EDIT: I've just found an email I wrote a while ago on this topic. It elaborates just a little bit:
It's philosophically wrong: a struct should represent some sort of fundamental value. Those are basically immutable. You don't get to change the number 5. You can change a variable's value from 5 to 6, but you don't logically make a change to the value itself.
It's practically a problem: it creates lots of weird situations. It's particularly bad if it's mutable via an interface. Then you can start changing boxed values. Ick. I've seen a lot of newsgroup posts which are due to people trying to use mutable structs and running into issues. I saw a very strange LINQ example which was failing because List<T>.Enumerator is a struct, for example.
I use mutable structs often in my (performance critical) project - and I don't run into problems, because I understand the implications of copy semantics. As far as I can tell, the main reason people advocate immutable structs is so that people who don't understand the implications can't get themselves in trouble.
That's not such a terrible thing - but we're in danger now of it becoming "the gospel truth", when in fact there are times where it is legitimately the best option to make a struct mutable. Like with all things, there are exceptions to the rule.
There is nothing cheaper to manipulate than a mutable struct, which is why you often see it in high performance code like the graphics processing routines.
Unfortunately mutable structs don't play well with objects and properties; it is way too easy to modify a copy of a struct instead of the struct itself. Thus they aren't appropriate for most of your code.
P.S. To avoid the cost of copying mutable structs, they are usually stored and passed in arrays.
The technical reason is that mutable structs appear to be able to do things that they don't actually do. Since the design-time semantics are the same as reference types, this becomes confusing for developers. This code:
public void DoSomething(MySomething something)
{
something.Property = 10;
}
Behaves quite differently depending on whether MySomething is a struct or a class. To me, this is a compelling, though not the most compelling, reason. If you look at DDD's Value Object, you can see the connection to how structs should be treated. A value object in DDD can be best represented as a value type in .Net (and therefore a struct). Because it has no identity, it can't change.
Think of this in terms of something like your address. You can "change" your address, but the address itself hasn't changed. In fact, you have a new address assigned to you. Conceptually, this works, because if you actually changed your address, your roommates would have to move too.
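A sketch of that idea in code, using C# 7.2's readonly struct (the Address type and its members are my own illustration): "changing" the address produces a new value rather than mutating the old one.
// no setter anywhere; a "change" constructs a new value
public readonly struct Address
{
    public string Street { get; }
    public string City   { get; }

    public Address(string street, string city)
    {
        Street = street;
        City = city;
    }

    public Address WithStreet(string street) => new Address(street, City);
}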
You have asked for the pros and cons of not following the guideline that structs should be immutable.
Cons: The cons are well covered in existing answers, and most problems described are due to the same cause - unexpected behaviour due to structs' value semantics.
Pros: The main pro of using mutable structs can be performance. Obviously, this advice comes with all the usual caveats about optimisations: make sure that part of your code needs to be optimised and make sure any changes do actually optimise your code's performance via profiling.
For an excellent article discussing when you might want to use mutable structs, see Rico Mariani's Performance Quiz on Value-Based Programming (or more specifically, the answers).
A struct should generally represent a single unity of some kind. As such it doesn't make much sense to change one of the properties of the value, it makes more sense to create a completely new value if you want a value that is different somehow.
The semantics gets simpler when using immutable structs, and you avoid pitfalls like this:
// a struct
struct Interval {
    public int From { get; set; }
    public int To { get; set; }
}
// create a list of structs
List<Interval> intervals = new List<Interval>();
// add a struct to the list
intervals.Add(new Interval());
// try to set the values of the struct
intervals[0].From = 10;
intervals[0].To = 20;
The result is that the struct in the list is not changed at all. The expression intervals[0] copies the value of the struct out of the list; you would then change a property of that temporary copy, and the value is never put back into the list. (In fact, the C# compiler detects this particular case and refuses to compile it, with error CS1612, precisely because the write would be silently lost.)
Edit: Changed the example to use a list instead of an array.
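For contrast, a sketch of why the array version behaves differently: an array indexer yields a reference to the element in place, so the same assignments compile and really do mutate the stored struct.
Interval[] intervals = new Interval[1];
intervals[0].From = 10;   // legal: modifies the struct stored inside the array
intervals[0].To = 20;     // intervals[0] now holds { From = 10, To = 20 }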
When you copy structs around you copy their contents, so if you modify a copied version the "original" will not be updated.
This is a source of errors, since even if you know about it, it's easy to fall into the trap of copying a struct (just by passing it to a method) and modifying the copy.
It happened to me again just last week, and kept me searching for a bug for an hour.
Keeping structs immutable prevents that ...
Other than that you need to make sure you have a really good reason to use structs in the first place - "optimization" or "I want something that allocates quickly on the stack" does not count as an answer. Marshaling or things where you depend on layout - ok, but you should not typically keep those structs around for very long, they are values not objects.
The reason you should make structs immutable is that they're value types, meaning that they are copied every time you pass them to a method.
So if, for example, you had a property that returned a struct, modifying the value of a field on that struct would be worthless, because the getter would return a copy of the struct, rather than a reference to the struct. I've seen this done in code, and it's often difficult to catch.
If you design your structs for immutability, you help the programmer avoid these mistakes.
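A minimal sketch of the trap described above (the types are hypothetical):
struct Point { public int X, Y; }

class Shape
{
    public Point Origin { get; set; }    // the getter returns a COPY of the struct
}

class Demo
{
    static void Main()
    {
        var shape = new Shape();
        // shape.Origin.X = 5;           // compile error CS1612: would modify a throwaway copy
        var p = shape.Origin;            // copy the value out explicitly...
        p.X = 5;                         // ...modify the copy...
        shape.Origin = p;                // ...and write the whole value back
    }
}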
