Why would one ever use the "in" parameter modifier in C#?

So, I (think I) understand what the in parameter modifier does. But what it does appears to be quite redundant.
Usually, I'd think that the only reason to use a ref would be to modify the calling variable, which is explicitly forbidden by in. So passing by in reference seems logically equivalent to passing by value.
Is there some sort of performance advantage? It was my belief that on the back-end side of things, a ref parameter must at least copy the physical address of the variable, which should be the same size as any typical object reference.
So, then is the advantage just in larger structs, or is there some behind-the-scenes compiler optimization that makes it attractive elsewhere? If the latter, why shouldn't I make every parameter an in?

in was introduced in C# 7.2.
in is essentially a ref readonly. Generally speaking, there is only one use case where in can be helpful: high-performance apps dealing with lots of large readonly structs.
Assuming you have:
readonly struct VeryLarge
{
    public readonly long Value1;
    public readonly long Value2;

    public long Compute() => Value1 + Value2;   // some computation over the fields
    // etc.
}
and
void Process(in VeryLarge value) { }
In that case, the VeryLarge struct will be passed by reference without creating defensive copies when using this struct in the Process method (e.g. when calling value.Compute()), and the struct's immutability is ensured by the compiler.
Note that passing a non-readonly struct with an in modifier will cause the compiler to create a defensive copy every time the struct's methods are called or its properties are accessed in the Process method above, which will negatively affect performance!
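A minimal sketch of that difference (MutableLarge is a hypothetical non-readonly counterpart of VeryLarge):

struct MutableLarge
{
    public long Value1;
    public long Value2;

    public long Compute() => Value1 + Value2;
}

static long ProcessReadonly(in VeryLarge value) =>
    value.Compute();   // no copy: the compiler knows a readonly struct cannot mutate itself

static long ProcessMutable(in MutableLarge value) =>
    value.Compute();   // defensive copy of the whole struct before the call, to protect the 'in' guarantee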
There is a really good MSDN blog entry which I recommend reading carefully.
If you would like some more historical background on the introduction of in, you could read this discussion in the C# language's GitHub repository.
In general, most developers agree that introducing in could be seen as a mistake: it's a rather exotic language feature that is only useful in high-performance edge cases.

passing by in reference seems logically equivalent to passing by value.
Correct.
Is there some sort of performance advantage?
Yes.
It was my belief that on the back-end side of things, a ref parameter must at least copy the physical address of the variable, which should be the same size as any typical object reference.
There is not a requirement that a reference to an object and a reference to a variable both be the same size, and there is not a requirement that either is the size of a machine word, but yes, in practice both are 32 bits on 32 bit machines and 64 bits on 64 bit machines.
What you think the "physical address" has to do with it is unclear to me. On Windows we use virtual addresses, not physical addresses in user mode code. Under what possible circumstances would you imagine that a physical address is meaningful in a C# program, I am curious to know.
There is also not a requirement that a reference of any kind be implemented as the virtual address of the storage. References could be opaque handles into GC tables in a conforming implementation of the CLI specification.
is the advantage just in larger structs?
Decreasing the cost of passing larger structs is the motivating scenario for the feature.
Note that there is no guarantee that in makes any program actually faster, and it can make programs slower. All questions about performance must be answered by empirical research. There are very few optimizations that are always wins; this is not an "always win" optimization.
is there some behind-the-scenes compiler optimization that makes it attractive elsewhere?
The compiler and runtime are permitted to make any optimization they choose if doing so does not violate the rules of the C# specification. There is to my knowledge not such an optimization yet for in parameters, but that does not preclude such optimizations in the future.
why shouldn't I make every parameter an in?
Well, suppose you made an int parameter instead an in int parameter. What costs are imposed?
the call site now requires a variable rather than a value
the variable cannot be enregistered. The jitter's carefully-tuned register allocation scheme just got a wrench thrown into it.
the code at the call site is larger because it must take a ref to the variable and put that on the stack, whereas before it could simply push the value onto the call stack
larger code means that some short jump instructions may have now become long jump instructions, so again, the code is now larger. This has knock-on effects on all kinds of things. Caches get filled up sooner, the jitter has more work to do, the jitter may choose to not do certain optimizations on larger code sizes, and so on.
at the callee site, we've turned access to a value on the stack (or register) into an indirection into a pointer. Now, that pointer is highly likely to be in the cache, but still, we've now turned a one-instruction access to the value into a two-instruction access.
And so on.
Suppose it's a double and you change it to an in double. Again, now the variable cannot be enregistered into a high-performance floating point register. This not only has performance implications, it can also change program behaviour! C# is permitted to do float arithmetic in higher-than-64-bit precision and typically does so only if the floats can be enregistered.
This is not a free optimization. You have to measure its performance against the alternatives. Your best bet is to simply not make large structs in the first place, as the design guidelines suggest.
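As a small illustration of the call-site point (the method names are made up): even when you pass a literal to an in parameter, the compiler has to materialize a hidden temporary variable and pass its address.

static int AddByValue(int x, int y) => x + y;     // arguments can be enregistered
static int AddByIn(in int x, in int y) => x + y;  // arguments are reads through references

// int a = AddByValue(2, 3);   // the values themselves are passed
// int b = AddByIn(2, 3);      // the compiler creates hidden temporaries and passes their addresses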

There is. When passing a struct, the in keyword allows an optimization where the compiler only needs to pass a pointer, without the risk of the method changing the content. The latter is critical: it is what makes it safe to avoid the copy operation. On large structs this can make a world of difference.

This comes from the functional programming approach. One of its major principles is that a function should not have side effects, which means it should not change the values of its parameters and should return some value. In C#, there was previously no way to pass structs (and other value types) without copying them, except by ref, which allows the value to be changed. Swift uses a copy-on-write mechanism that copies a struct (their collections are structs, by the way) as soon as a method starts changing its values; not everyone who uses Swift is aware of that copying. This is a nice C# feature since it is memory-efficient and explicit. If you look at what's new, you will see that more and more is being done around structs and stack-allocated arrays, and the in modifier is simply necessary for those features. There are limitations, mentioned in the other answers, but they are not essential for understanding where .NET is heading.

in is a readonly reference, introduced in C# 7.2.
This means you do not pass the entire object onto the call stack; similar to the ref case, you pass only a reference to the structure,
but any attempt to change the value of the object gives a compiler error.
And yes, this will allow you to optimize code performance if you use big structures.
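A short sketch of that guarantee (the struct and method are invented for illustration):

struct Vector3
{
    public double X, Y, Z;
}

static double SumComponents(in Vector3 v)
{
    // v.X = 0;   // compile-time error: cannot assign to a member of 'v'
    //            // because it is a readonly variable (an 'in' parameter)
    return v.X + v.Y + v.Z;   // reading through the reference is fine
}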

Related

When is it more efficient to pass structs by value and when by ref in C#?

I've researched a bit and it seems that the common wisdom says that structs should be under 16 bytes because otherwise they incur a performance penalty for copying. With C# 7 and ref returns it became quite easy to avoid copying structs altogether. I assume that as the struct size gets smaller, passing by ref has more overhead than just copying the value.
Is there a rule of thumb about when passing structs by value becomes faster than by ref? What factors affect this? (Struct size, process bitness, etc.)
More context
I'm working on a game with the vast majority of data represented as contiguous arrays of structs for maximum cache-friendliness. As you might imagine, passing structs around is quite common in such a scenario. I'm aware that profiling is the only real way of determining the performance implications of something. However, I'd like to understand the theoretical concepts behind it and hopefully write code with that understanding in mind and profile only the edge cases.
Also, please note that I'm not asking about best practices or the sanity of passing everything by ref. I'm aware of "best practices" and implications and I deliberately choose not to follow them.
Addressing the "duplicate" tag
Performance of pass by value vs. pass by reference in C# .NET - This question discusses passing a reference type by ref which is completely different to what I'm asking.
In .Net, when if ever should I pass structs by reference for performance reasons? - The second question touches the subject a bit, but it's about a specific size of the struct.
To answer the questions from Eric Lippert's article:
Do you really need to answer that question? Yes I do. Because it'll affect how I write a lot of code.
Is that really the bottleneck? Probably not. But I'd still like to know since that's the data access pattern for 99% of the program. In my mind this is similar to choosing the correct data structure.
Is the difference relevant? It is. Passing large structs by ref is faster. I'm just trying to understand the limits of this.
What is this “faster” you speak of? As in giving less work to the CPU for the same task.
Are you looking at the big picture? Yes. As previously stated, it affects how I write the whole thing.
I know I could measure a lot of different combinations. And what does that tell me? That X is faster than Y on my combination of [.NET version, process bitness, OS, CPU]. What about Linux? What about Android? What about iOS? Should I benchmark all permutations on all possible hardware/software combinations?
I don't think that's a viable strategy. Therefore I ask here where hopefully someone who knows a lot about CLR/JIT/ASM/CPU can tell me how that works so I can make informed decisions when writing code.
The answer I'm looking for is similar to the aforementioned 16 byte guideline for struct sizes with the explanation why.
Generally, passing by reference should be faster.
When you pass a struct by reference, you only pass a pointer to the struct, which is a 32/64-bit integer.
When you pass a struct by value, the entire struct has to be copied for the callee.
Unless the struct is very small (for example, a single int), passing by reference is faster.
Passing large structs by value also consumes more stack space and more CPU time per call for the copies themselves.
If you pass structs around by reference, they can be of any size; you are still dealing with an 8-byte pointer (x64 assumed). For the highest performance you need a CPU-cache-friendly design, which is called data-oriented design.
Games often use a particular data-oriented design called an Entity Component System. See the book Pro .NET Memory Management by Konrad Kokosa, Chapter 14.
The basic idea is that your game entities (e.g. Movable, Car, Plane, ...) share common properties, such as a position, which is stored for all entities in one contiguous array. If you need to increment the position of 1K entities, you just look up their indices in the position array and update them in place. This gives the best possible data locality. If everything were stored in classes, the CPU prefetcher would be defeated by chasing a separate object reference for each class instance.
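A rough sketch of that update loop (the component type and array size are invented for illustration):

struct Position          // small component, stored contiguously for all entities
{
    public float X, Y;
}

sealed class MovementSystem
{
    private readonly Position[] _positions = new Position[1024];

    public void Update(float dx, float dy)
    {
        for (int i = 0; i < _positions.Length; i++)
        {
            ref Position p = ref _positions[i];   // direct access into the array, no copy
            p.X += dx;
            p.Y += dy;
        }
    }
}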
See this Intel post about some reference architecture: https://software.intel.com/en-us/articles/get-started-with-the-unity-entity-component-system-ecs-c-sharp-job-system-and-burst-compiler
There are plenty of Entity Component Systems out there, but so far I have seen none using ref structs as their main working data structure. The reason is that all the popular ones have existed for much longer than C# 7.2, where ref structs were introduced.
I finally found the answer. The breaking point is System.IntPtr.Size. In Microsoft's own words from Write safe and efficient C# code:
Add the in modifier to pass an argument by reference and declare your design intent to pass arguments by reference to avoid unnecessary copying. You don't intend to modify the object used as that argument.
This practice often improves performance for readonly value types that are larger than IntPtr.Size. For simple types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal and bool, and enum types), any potential performance gains are minimal. In fact, performance may degrade by using pass-by-reference for types smaller than IntPtr.Size.
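As an illustration of that guideline (the types below are invented; the actual break-even point must come from measuring on your own hardware):

// Larger than IntPtr.Size (32 bytes here): 'in' avoids copying four longs per call.
readonly struct Big
{
    public readonly long A, B, C, D;
    public Big(long a, long b, long c, long d) { A = a; B = b; C = c; D = d; }
}

static long SumByValue(Big b) => b.A + b.B + b.C + b.D;   // copies 32 bytes per call
static long SumByIn(in Big b) => b.A + b.B + b.C + b.D;   // passes a single reference

// At or below IntPtr.Size: 'in' buys nothing and may add an indirection.
static int Square(int x) => x * x;          // prefer this
static int SquareByIn(in int x) => x * x;   // no benefit for a 4-byte value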

"Overhead" of Class vs Structure in C#?

I'm doing course 3354 (Implementing System Types and Interfaces in the .NET Framework 2.0) and it is said that for simple classes, with member variables and functions, it is better to use a struct than a class because of overhead.
I have never heard of such a thing; what is the validity of this claim?
I recommend never using a struct unless you have a very specific use case in mind and know exactly how the struct will benefit the system.
While C# structs do allow members, they work a good bit differently than classes (they can't be subtyped, there is no virtual dispatch, and they may live entirely on the stack), and their behavior changes depending on boxing, etc. (Boxing is the process of promoting a value type to the heap -- surprise!)
So, to answer the question: I think one of the biggest misconceptions in C# is using structs "for performance". The reason for this is that 'overhead' can't be truly measured without seeing how it interacts with the rest of the system and what role, if anything of note, it plays. That requires profiling and can't be summed up with such a trivial statement as "less overhead".
There are some good cases for struct value types -- one example is a composite RGB value stored in an array for an image. This is because the RGB type is small, there can be very many of them in an image, value types pack well in arrays, and this can help maintain better memory locality.
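A quick sketch of that RGB case (the struct name and image size are illustrative):

struct Rgb                  // three byte-sized fields per pixel, packed contiguously in the array
{
    public byte R, G, B;
}

// Rgb[] pixels = new Rgb[1920 * 1080];               // one allocation for the whole image
// pixels[0] = new Rgb { R = 255, G = 128, B = 0 };   // no per-pixel objects for the GC to track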
You should in general not change a type that should be a class into a struct for performance reasons. Structs and classes have different semantics and are not exactly interchangeable. It is true that structs can sometimes be faster, but they can also sometimes be slower. They behave differently and you should recognize when to use one or the other.
MSDN has a page on choosing between classes and structures.
The summary is:
Do not define a structure unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
It will not have to be boxed frequently.
There may occasionally be a case where a type really should be a class but you need it to be a struct for performance reasons. This is rare and you should make sure you know what you are doing before taking this route. It can be especially dangerous if you break the immutability guideline for structs. Mutable structs can behave in ways that most programmers don't expect.
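A classic example of that surprise (the types here are hypothetical): exposing a mutable struct through a property means callers mutate a copy, not the stored value.

struct MutablePoint
{
    public int X;
    public int Y;
}

class Shape
{
    public MutablePoint Origin { get; set; }   // the getter returns a copy of the struct
}

// var shape = new Shape();
// shape.Origin.X = 10;      // does not compile: you would be modifying a temporary copy
// var p = shape.Origin;     // copies the struct out,
// p.X = 10;                 // mutates only the local copy;
//                           // shape.Origin.X is still 0 unless you assign p back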
When C# was in beta, I was treated to a demonstration of the difference between class and struct. The developers had built a Mandelbrot set viewer, which grinds through a lot of complex numbers to accomplish its result. In the first instance, they ran the code with complex numbers represented as a class with a Real and an Imaginary field. Then, they changed the word class to struct and recompiled. The difference in performance was huge.
Not having to allocate heap objects nor having to garbage collect them, in this scenario, was what made the difference. Depending on how many objects you have, the difference may be more or less spectacular. Before you start tweaking, measure your performance and only then make a decision as to go with the value type.
You're going to use a class about 95% of the time. Value type semantics end up biting you in the ass more often than not down the road.
If however, you're going with a struct, make sure that it's immutable.
In C#, classes and structs are completely different beasts. The syntax is nearly identical for the two but the behavior is completely different (reference vs. value). An individual instance of a struct will use less memory than the equivalent class but once you start building collections or passing them around, the value type semantics will kill you in terms of performance (and possibly memory consumption depending on what you're doing) unless the data in the struct is very small. I don't know what the hard number is but we're talking about on the order of 8 bytes or less.
It can also hurt in terms of maintainability and readability because the two look so similar but behave so differently.
One rule of thumb you can follow is to ask yourself whether or not what you are trying to do is like System.DateTime. If it's not, then you should really, really question your decision to use structs.
A struct is a value type and a class is a reference type; this could be one of the reasons.
For simple types where you will have many of them, don't require inheritance, and are a good candidate for being immutable, structs are a good choice. Examples of this are Point, Rectangle, Decimal, and DateTime.
If you will not have many of them, the overhead is irrelevant and it shouldn't factor into your decision. If you will ever need to derive from it (or make it a derived type), it has to be a class. If the item cannot be immutable, struct is not a good candidate because mutating one instance won't change any copies of it.
I believe the most commonly quoted break-even point is around 12 bytes of data in your struct. So for example, 3 integers. The difference between value type and reference type is much more fundamental though, and you should make your choice based on that for types with few members.
If something is big, then class :)
Structs are value types and classes are reference types. So, if you're going to pass around references, classes can be better performance-wise, since you are copying an address rather than the whole structure. However, if you are going to instantiate a large number of them, structs can perform much better because they avoid separate heap allocations (locals live on the stack, and struct array elements are stored inline in the array). So, structs can be better for numerous small objects where performance is important, or where value-type semantics are desirable. Structs have no user-defined default constructor; the implicit one simply zero-initializes the fields. They do inherit from Object, but otherwise they are inheritance-free.
There's no need to heap-allocate or garbage-collect structs.

System.Drawing.Point is a value type. Why?

I read that System.Drawing.Point is a value type. I do not understand. Why?
There are rules that Microsoft tries to follow about this; they explain them very well in MSDN, see Choosing Between Classes and Structures (the book is even better, as it has a lot of interesting comments).
Even if Point isn't such a good example of this:
A struct should logically represent a single value (in this case a position; it has two components, but so do complex numbers, and they are prime candidates for being structs).
A struct should have an instance size smaller than 16 bytes. (OK, 2x4=8.)
A struct should not have to be boxed frequently. (OK, this one is right.)
BUT, a struct should be immutable. (Here is the part where they don't follow their own rules, but I guess micro-optimization won out over guidelines that were, in any case, written later.)
As I said, I guess the fact that they haven't respected the "immutable" part is both because there were no such rules when System.Drawing was written, and for speed, as graphic operations can be quite sensitive to this.
I don't know if they were right or not to do it; maybe they measured some common algorithms and found that they lost too much performance allocating temporary objects and copying them over. Anyway, such optimizations should only be done after carefully measuring real-world usage of the class/struct.
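That mutability is exactly what produces the classic surprise when Point is exposed from a property. A sketch, assuming a Windows Forms project and using Control.Location as the familiar example:

using System.Drawing;
using System.Windows.Forms;

static class PointExample
{
    static void MoveButton(Button button)
    {
        // button.Location.X = 10;   // does not compile: Location returns a copy of the Point,
        //                           // so assigning to X would only change that temporary copy

        Point p = button.Location;   // copy the struct out,
        p.X = 10;                    // mutate the copy,
        button.Location = p;         // and assign the whole value back
    }
}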
It's a Structure. Just like DateTime. And structures are value-types.
The reason for this is almost certainly that the System.Drawing.Point (and PointF) types are used for drawing through the .NET GDI(+) wrappers, which requires marshalling. Marshalling value types (i.e. structs) so that the native libraries can use them is faster than marshalling heap-allocated objects (i.e. classes).
From the MSDN (Performance Considerations for Run-Time Technologies in the .NET Framework):
One extremely important thing to note is that ValueTypes require no marshalling in interop scenarios. Since marshalling is one of the biggest performance hits when interoperating with native code, using ValueTypes as arguments to native functions is perhaps the single biggest performance tweak you can do.
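A sketch of the kind of interop this is talking about (GetCursorPos is a real Win32 function; the POINT declaration below is the usual way to mirror its native struct):

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct POINT                 // blittable value type: the marshaller can pass its address directly
{
    public int X;
    public int Y;
}

static class NativeMethods
{
    [DllImport("user32.dll")]
    public static extern bool GetCursorPos(out POINT lpPoint);
}

// if (NativeMethods.GetCursorPos(out POINT cursor)) { /* use cursor.X, cursor.Y */ }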
Well, I don't specifically know Microsoft's reasons, but it makes sense. It is a fixed-size structure containing a small amount of immutable data. I would rather have such a thing allocated on the stack, where it is easy to allocate and easy to free. Making it a class and putting it on the heap means it has to be managed by the GC, which creates a significant amount of overhead for such a trivial thing.
In C#, struct types are treated as value types in order to allow for user-defined value types. That is the case for Drawing.Point.

Should I use LayoutKind.Auto for my structs if they aren't used in COM Interop?

By default, structs in C# are implemented with [StructLayout( LayoutKind.Sequential )], basically for the reason that these types of objects are commonly used for COM interop and their fields must stay in the order they were defined. Classes have LayoutKind.Auto.
My question is: should I explicitly mark my structs with [StructLayout( LayoutKind.Auto )], and would this give me any benefit over the default? I mean, if structs are initialized on the stack, will it make any difference, i.e. the GC doesn't have to move them around? Also, will it help when structs are initialized on the heap, i.e. when they are part of some class?
The only possible benefit I can think of is your struct taking up less memory. But if you have such a large struct in the first place you should probably refactor it into a class.
A potential problem is if you want to marshal your struct into a byte[] using Marshal.PtrToStructure: how can you guarantee the order of the bytes will be as you expect?
Doing this just seems like you're introducing more problems than you're solving... That being said, if the order of the fields is never important to you, then do it, but bear in mind that the next person who comes along might not be expecting it.
It may give you benefits, even though I don't suppose that it will do much. I usually stick to the defaults.
Basically, with an auto layout the CLR can choose how to align data, possibly trading some space for speed (this also depends on the platform; keeping things aligned can matter more on some than on others). However, since structs are also often used on the stack or as composite helper structures (think of KeyValuePair), sequential usually makes sense as the default.
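A small sketch of the trade-off (the field sizes are chosen to show the packing effect; the exact layout the CLR picks for auto is an implementation detail):

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]   // fields keep declaration order; padding may be inserted
struct SequentialSample
{
    public byte A;
    public long B;
    public byte C;
}

[StructLayout(LayoutKind.Auto)]         // the CLR may reorder fields to pack them more tightly
struct AutoSample
{
    public byte A;
    public long B;
    public byte C;
}

// Marshal.SizeOf(typeof(SequentialSample)) works: the unmanaged layout is well defined.
// Marshal.SizeOf(typeof(AutoSample)) throws ArgumentException, because an auto-layout type
// has no meaningful unmanaged representation -- that loss is the interop cost of LayoutKind.Auto.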

How does const correctness help write better programs?

This question is from a C# guy asking the C++ people. (I know a fair bit of C but only have some general knowledge of C++).
A lot of C++ developers who come to C# say they miss const correctness, which seems rather strange to me. In .NET, in order to disallow changing things you need to create immutable objects, or objects with read-only properties/fields, or pass copies of objects (by default, structs are copied).
Is C++ const correctness a mechanism to create immutable/read-only objects, or is there more to it? If the purpose is to create immutable/read-only objects, the same thing can be done in an environment like .NET.
A whole section is devoted to Const Correctness in the FAQ. Enjoy!
const correctness in C++ is at heart simply a way of saying exactly what you mean:
void foo( const sometype & t ) {
...
}
says "I will not change the object referred to by t, and if I try to, please Mr. Compiler warn me about it".
It is not a security feature or a way of (reliably) creating read-only objects, because it is trivial to subvert using C-style casts.
I bet this is a matter of mindsets. I work with long-time C++ people myself and have noticed they seem to think in C++ (of course they do!). It will take time for them to start seeing multiple answers to questions for which they have grown used to having 'stock' answers, like const correctness. Give them time, and try to educate them gently.
Having said that, I do like the C++ 'const' way of guarantees. It's mainly promising the user of an interface that the object, data behind the pointer or whatever won't be messed with. I would see this as the #1 value of const correctness. #2 value is what it can allow the compiler in terms of optimization.
The problem is, of course, if the whole software stack hasn't been built with const in mind from the ground up.
I'm sure C# has its own set of paradigms to do the same. This shows why it's so important to "learn one new (computing or natural) language a year". Simply to exercise one's mind of the other ways to see the world, and solve its problems.
With const-correctness, you can expose a mutable object as read-only without making a copy of it and wasting precious memory (especially relevant for collections). I know, there is this CPU-and-memory-are-cheap philosophy among desktop developers, but it still matters in the embedded world.
On top of that, if you add complexities of memory ownership in C++, it is almost always better not to copy non-trivial objects that contain or reference other objects, hence there is a method of exposing existing objects as read-only.
Scott Meyers' excellent book Effective C++ dedicates an item to this issue:
ITEM 3: Use const whenever possible.
It really enlightens you :-)
The real answer, even if it is really hard to grasp if you have not used it, is that you lose expressiveness. The language has concepts that you cannot possibly express in C#; they have meaning and are part of the design, but are lost in the translation to code.
This is not an answer, but rather an example:
Consider that you have a container that is sorted on a field of the elements that are stored. Those objects are really big. You need to offer access to the data for readers (consider showing the information in the UI).
Now, in C#/Java, you can go in one of two ways: either you make a copy for the caller to use (guarantees that the data will not change, but inefficient) or you return a reference to your internally held object (just hoping the caller will not change your data through setters).
If the user gets the reference and changes through it the field that serves as index, then your container invariants are broken.
In C++ you can return a constant reference/pointer, and the compiler will disable calling setter methods (mutating methods) on the instance you return. You get both the security that the user will not change (*) and efficiency in the call.
The third option, not mentioned before, is making the object immutable, but that disables changes everywhere. Perfectly controlled changes to the object will be disallowed, and the only possibility of change is creating a new element with the changes applied. That costs both time and space.
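For comparison, the closest everyday C# idiom to the container scenario above is exposing a read-only view of the internal collection rather than the collection itself (a sketch with invented names). It prevents callers from adding or removing elements, though unlike C++ const it cannot stop them from calling setters on the elements themselves:

using System.Collections.Generic;
using System.Collections.ObjectModel;

class BigElement
{
    public int SortKey { get; set; }    // the field the container is sorted on
    // ... lots of other data
}

class SortedContainer
{
    private readonly List<BigElement> _items = new List<BigElement>();

    // A read-only wrapper over the same list: no copying, but Add/Remove are unavailable.
    public ReadOnlyCollection<BigElement> Items => _items.AsReadOnly();
}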
C++ provides plenty of ways you can mess up your code. I see const correctness as a first, efficient way of putting constraints on the code in order to control "side" effects (unwanted changes).
Marking a parameter as const (usually a reference to a const object or a pointer to a const object) ensures that the passed object can't be changed, so the function/method has no "side" effects on it.
Marking a method as const guarantees that the method will not change the state of the object it works on. If it does, the compiler will generate an error.
Marking a data member as const ensures that the data member can only be initialized in the constructor's initialization list and can't be changed afterwards.
Also, smart compilers can use the constness as hints for various performance optimizations.
Of course, you can override constness. Constness is compile time checking. You can't break the rules by default, but you can use casting (e.g. const_cast) to make a const object non-const so it can be changed.
Just enlisting the compiler's help when writing code would be enough for me to advocate const-correctness. But today there is an additional advantage: multi-threaded code is generally easier to write when you know where our objects can change and where they cannot.
I think it's a pity C# doesn't support const correctness as C++ does. 95% of all parameters I pass to a function are constant values; the const feature guarantees that the data you pass by reference won't be modified after the call. The "const" keyword provides compiler-checked documentation, which is a great help in large programs.
