I've researched a bit and it seems that the common wisdom says that structs should be under 16 bytes because otherwise they incur a performance penalty for copying. With C#7 and ref return it became quite easy to completely avoid copying structs altogether. I assume that as the struct size gets smaller, passing by ref has more overhead that just copying the value.
Is there a rule of thumb about when passing structs by value becomes faster than by ref? What factors affect this? (Struct size, process bitness, etc.)
More context
I'm working on a game with the vast majority of data represented as contiguous arrays of structs for maximum cache-friendliness. As you might imagine, passing structs around is quite common in such a scenario. I'm aware that profiling is the only real way of determining the performance implications of something. However, I'd like to understand the theoretical concepts behind it and hopefully write code with that understanding in mind and profile only the edge cases.
Also, please note that I'm not asking about best practices or the sanity of passing everything by ref. I'm aware of "best practices" and implications and I deliberately choose not to follow them.
Addressing the "duplicate" tag
Performance of pass by value vs. pass by reference in C# .NET - This question discusses passing a reference type by ref which is completely different to what I'm asking.
In .Net, when if ever should I pass structs by reference for performance reasons? - The second question touches the subject a bit, but it's about a specific size of the struct.
To answer the questions from Eric Lippert's article:
Do you really need to answer that question? Yes I do. Because it'll affect how I write a lot of code.
Is that really the bottleneck? Probably not. But I'd still like to know since that's the data access pattern for 99% of the program. In my mind this is similar to choosing the correct data structure.
Is the difference relevant? It is. Passing large structs by ref is faster. I'm just trying to understand the limits of this.
What is this “faster” you speak of? As in giving less work to the CPU for the same task.
Are you looking at the big picture? Yes. As previously stated, it affects how I write the whole thing.
I know I could measure a lot of different combinations. And what does that tell me? That X is faster thatn Y on my combination of [.NET Version, process bitness, OS, CPU]. What about Linux? What about Android? What about iOS? Should I benchmark all permutations on all possible hardware/software combinations?
I don't think that's a viable strategy. Therefore I ask here where hopefully someone who knows a lot about CLR/JIT/ASM/CPU can tell me how that works so I can make informed decisions when writing code.
The answer I'm looking for is similar to the aforementioned 16 byte guideline for struct sizes with the explanation why.
generally, passing by reference should be faster.
when you pass a struct by reference, you are only passing a pointer to the struct, which is only a 32/64 bit integer.
when you pass a struct by value, you need to copy the entire struct and then pass a pointer to the new copy.
unless the struct is very small, for example, an int, passing by reference is faster.
also, passing by value would increase the number of calls to the os for memory allocation and de-allocation, these calls are time-consuming as the os has to check a registry for available space.
If you pass around structs by reference then they can be of any size. You are still dealing with a 8 (x64 assumed) byte pointer. For highest performance you need a CPU cache friendly design which is is called Data Driven Design.
Games often use a special Data Driven Design called Entity Component System. See the book Pro .NET Memory Management by Konrad Kokosa Chapter 14.
The basic idea is that you can update your game entities which are e.g. Movable, Car, Plane, ... share common properties like a position which is for all entities stored in a contigous array. If you need to increment the position of 1K entities you just need to lookup the array index of the position array of all entities and update them there. This provides the best possible data locality. If all would be stored in classes the CPU prefetcher would be lost by the many this pointers for each class instance.
See this Intel post about some reference architecture: https://software.intel.com/en-us/articles/get-started-with-the-unity-entity-component-system-ecs-c-sharp-job-system-and-burst-compiler
There are plenty of Entity Component Systems out there but so far I have seen none using ref structs as their main working data structure. The reason is that all popular ones are existing much longer than C# 7.2 where ref structs were introduced.
I finally found the answer. The breaking point is System.IntPtr.Size. In Microsoft's own words from Write safe and efficient C# code:
Add the in modifier to pass an argument by reference and declare your design intent to pass arguments by reference to avoid unnecessary copying. You don't intend to modify the object used as that argument.
This practice often improves performance for readonly value types that are larger than IntPtr.Size. For simple types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal and bool, and enum types), any potential performance gains are minimal. In fact, performance may degrade by using pass-by-reference for types smaller than IntPtr.Size.
I'm kinda new to C++ (coming from C#).
I'd like to pass an array of data to a function (as a pointer).
void someFunc(byte *data)
{
// add this data to a hashmap.
Hashtable.put(key, data)
}
This data will be added into a hashmap (some key-value based object).
In C#, i could just add the passed reference to a dictionary and be done with it.
Can the same be done in C++ ? or must i create a COPY of the data, and only add that to the data structure for storing it ?
I have seen this pattern in some code examples, but i am not 100% sure why it is needed, or whether it can be avoided at certain times.
Not sure where your key is coming from... but the std::map and the std::unordered_map are probably what you are looking for.
Now the underlying data structure of the std::map is a binary tree, while the std::unordered_map is a hash.
Furthermore, the std::unordered_map is an addition in the C++ 11 standard.
It all depends how you pass the data and how it is created. If the data is created on the heap(by using new) you can just put the pointer or reference you have to it in your table. On the other hand if the function takes the argument by value you will need to make a copy at some point, because if you store the address of a temp bad things will happen :).
As for what data structure to use, and how they work, I've found one of the best references is cppreference
Heap allocation should be reserved for special cases. stack allocation is faster and easier to manage, you should read up on RAII(very important). As for other reading try finding out stuff on dynamic vs. automatic memory allocation.
Just found this read specifically saying C# to C++ figured it'd be perfect for you, good luck c++ can be one of the more difficult languages to learn so don't assume anything will work the same as it does in C#. MSDN has a nice C# vs. C++ thing yet
I read that System.Drawing.Point is a value type. I do not understand. Why?
There are rules that microsoft tries to follow about this, they explain them very well in the MSDN, see Choosing Between Classes and Structures (The book is even better as it had lot of interesting comments)
Even if Point isn't a so good sample of this :
Struct should logically represents a single value (In this case a position, even if it have 2 components, but Complex numbers could also be separated in 2 parts and they are prime candidates for being structs)
Struct should have an instance size smaller than 16 bytes. (Ok, 2x4=8)
Struct should not have to be boxed frequently. (Ok this one is right)
BUT, Struct should be immutable (Here is the part where they don't follow their own rules, but i guess that micro-optimization gained over the rules, that anyway were written later)
As i said i guess that the fact that they haven't respected the "immutable" part is both because there weren't rules when System.Drawing was written and for speed as graphic operations could be quite sensitive to this.
I don't know if they were right or not to do it, maybe they measured some common algorithms and found that they lost too much performance in allocating temporary object and copying them over. Anyway such optimizations should only be done after carefully measuring real-world usage of the class/struc.
It's a Structure. Just like DateTime. And structures are value-types.
The reason for this is almost certainly that the System.Drawing.Point (and PointF) types are used for drawing through the .NET GDI(+) Wrappers, which requires marshalling. Marshalling value types (ie. structs) so that the Native libraries can use them is faster than marshalling heap allocated objects (ie. Classes).
From the MSDN (Performance Considerations for Run-Time Technologies in the .NET Framework
):
One extremely important thing to note is that ValueTypes require no marshalling in interop scenarios. Since marshalling is one of the biggest performance hits when interoperating with native code, using ValueTypes as arguments to native functions is perhaps the single biggest performance tweak you can do.
Well, I don't specifically know Microsofts reasons, but it makes sense. It is a fixed-size structure containing a small amount of immutable data. I would rather have such a thing allocated on the stack, where it is easy to allocate and easy to free. Making it a class and putting it on the heap means it has to be managed by the GC, which creates a significant amount of overhead for such a trivial thing.
In C#, struct types are considered as value types, to allow for user-defined value types. It is the case for Drawing.Point.
A friend and I have written an encryption module and we want to port it to multiple languages so that it's not platform specific encryption. Originally written in C#, I've ported it into C++ and Java. C# and Java will both encrypt at about 40 MB/s, but C++ will only encrypt at about 20 MB/s. Why is C++ running this much slower? Is it because I'm using Visual C++?
What can I do to speed up my code? Is there a different compiler that will optimize C++ better?
I've already tried optimizing the code itself, such as using x >> 3 instead of x / 8 (integer division), or y & 63 instead of y % 64 and other techniques. How can I build the project differently so that it is more performant in C++ ?
EDIT:
I must admit that I have not looked into how the compiler optimizes code. I have classes that I will be taking here in College that are dedicated to learning about compilers and interpreters.
As for my code in C++, it's not very complicated. There are NO includes, there is "basic" math along with something we call "state jumping" to produce pseudo random results. The most complicated things we do are bitwise operations that actually do the encryption and unchecked multiplication during an initial hashing phase. There are dynamically allocated 2D arrays which stay alive through the lifetime of the Encryption object (and properly released in a destructor). There's only 180 lines in this. Ok, so my micro-optimizations aren't necessary, but I should believe that they aren't the problem, it's about time. To really drill the point in, here is the most complicated line of code in the program:
input[L + offset] ^= state[state[SIndex ^ 255] & 63];
I'm not moving arrays, or working with objects.
Syntactically the entire set of code runs perfect and it'll work seamlessly if I were to encrypt something with C# and decrypt it with C++, or Java, all 3 languages interact as you'd expect they would.
I don't necessarily expect C++ to run faster then C# or Java (which are within 1 MB/s of each other), but I'm sure there's a way to make C++ run just as fast, or at least faster then it is now. I admit I'm not a C++ expert, I'm certainly not as seasoned in it as many of you seem to be, but if I can cut and paste 99% of the code from C# to C++ and get it to work in 5 mins, then I'm a little put out that it takes twice as long to execute.
RE-EDIT:
I found an optimization in Visual Studio I forgot to set before. Now C++ is running 50% faster then C#. Thanks for all the tips, I've learned a lot about compilers in my research.
Without source code it's difficult to say anything about the performance of your encryption algorithm/program.
I reckon though that you made a "mistake" while porting it to C++, meaning that you used it in a inefficient way (e.g. lots of copying of objects happens). Maybe you also used VC 6, whereas VC 9 would/could produce much better code.
As for the "x >> 3" optimization... modern compilers do convert integer division to bitshifts by themselves. Needless to say that this optimization may not be the bottleneck of your program at all. You should profile it first to find out where you're spending most of your time :)
The question is extreamly broad. Something that's efficient in C# may not be efficient in C++ and vice-versa.
You're making micro-optimisations, but you need to examine the overall design of your solution to make sure that it makes sense in C++. It may be a good idea to re-design large parts of your solution so that it works better in C++.
As with all things performance related, profile the code first, then modify, then profile again. Repeat until you've got to an acceptable level of performance.
Things that are 'relatively' fast in C# may be extremely slow in C++.
You can write 'faster' code in C++, but you can also write much slower code. Especially debug builds may be extremely slow in C++. So look at the type of optimizations by your compiler.
Mostly when porting applications, C# programmers tend to use the 'create a million newed objects' approach, which really makes C++ programs slow. You would rewrite these algorithm to use pre-allocated arrays and run with tight loops over these.
With pre-allocated memory you leverage the strengths of C++ in using pointers to memory by casting these to the right pod structured data.
But it really depends on what you have written in your code.
So measure your code an see where the implementations burn the most cpu, and then structure your code to use the right algorithms.
Your timing results are definitely not what I'd expect with well-written C++ and well-written C#. You're almost certainly writing inefficient C++. (Either that, or you're not compiling with the same sort of options. Make sure you're testing the release build, and check the optimization options.
However, micro-optimizations, like you mention, are going to do effectively nothing to improve the performance. You're wasting your time doing things that the compiler will do for you.
Usually you start by looking at the algorithm, but in this case we know the algorithm isn't causing the performance issue. I'd advise using a profiler to see if you can find a big time sink, but it may not find anything different from in C# or Java.
I'd suggest looking at how C++ differs from Java and C#. One big thing is objects. In Java and C#, objects are represented in the same way as C++ pointers to objects, although it isn't obvious from the syntax.
If you're moving objects about in Java and C++, you're moving pointers in Java, which is quick, and objects in C++, which can be slow. Look for where you use medium or large objects. Are you putting them in container classes? Those classes move objects around. Change those to pointers (preferably smart pointers, like std::tr1::shared_ptr<>).
If you're not experienced in C++ (and an experienced and competent C++ programmer would be highly unlikely to be microoptimizing), try to find somebody who is. C++ is not a really simple language, having a lot more legacy baggage than Java or C#, and you could be missing quite a few things.
Free C++ profilers:
What's the best free C++ profiler for Windows?
"Porting" performance-critical code from one language to another is usually a bad idea. You tend not to use the target language (C++ in this case) to its full potential.
Some of the worst C++ code I've seen was ported from Java. There was "new" for almost everything - normal for Java, but a sure performance killer for C++.
You're usually better off not porting, but reimplementing the critical parts.
The main reason C#/Java programs do not translate well (assuming everything else is correct). Is that C#/Java developers have not grokked the concept of objects and references correctly. Note in C#/Java all objects are passed by (the equivalent of) a pointer.
Class Message
{
char buffer[10000];
}
Message Encrypt(Message message) // Here you are making a copy of message
{
for(int loop =0;loop < 10000;++loop)
{
plop(message.buffer[loop]);
}
return message; // Here you are making another copy of message
}
To re-write this in a (more) C++ style you should probably be using references:
Message& Encrypt(Message& message) // pass a reference to the message
{
...
return message; // return the same reference.
}
The second thing that C#/Java programers have a hard time with is the lack of Garbage collection. If you are not releasing any memory correctly, you could start running low on memory and the C++ version is thrashing. In C++ we generally allocate objects on the stack (ie no new). If the lifetime of the object is beyond the current scope of the method/function then we use new but we always wrap the returned variable in a smart pointer (so that it will be correctly deleted).
void myFunc()
{
Message m;
// read message into m
Encrypt(m);
}
void alternative()
{
boost::shared_pointer<Message> m(new Message);
EncryptUsingPointer(m);
}
Show your code. We can't tell you how to optimize your code if we don't know what it looks like.
You're absolutely wasting your time converting divisions by constants into shift operations. Those kinds of braindead transformations can be made even by the dumbest compiler.
Where you can gain performance is in optimizations that require information the compiler doesn't have. The compiler knows that division by a power of two is equivalent to a right-shift.
Apart from this, there is little reason to expect C++ to be faster. C++ is much more dependent on you writing good code. C# and Java will produce pretty efficient code almost no matter what you do. But in C++, just one or two missteps will cripple performance.
And honestly, if you expected C++ to be faster because it's "native" or "closer to the metal", you're about a decade too late. JIT'ed languages can be very efficient, and with one or two exceptions, there's no reason why they must be slower than a native language.
You might find these posts enlightening.
They show, in short, that yes, ultimately, C++ has the potential to be faster, but for the most part, unless you go to extremes to optimize your code, C# will be just as fast, or faster.
If you want your C++ code to compete with the C# version, then a few suggestions:
Enable optimizations (you've hopefully already done this)
Think carefully about how you do disk I/O (IOStremas isn't exactly an ideal library to use)
Profile your code to see what needs optimizing.
Understand your code. Study the assembler output, and see what can be done more efficiently.
Many common operations in C++ are surprisingly slow. Dynamic memory allocation is a prime example. It is almost free in C# or Java, but very costly in C++. Stack-allocation is your friend.
Understand your code's cache behavior. Is your data scattered all over the place? It shouldn't be a surprise then that your code is inefficient.
Totally of topic but...
I found some info on the encryption module on the homepage you link to from your profile http://www.coreyogburn.com/bigproject.html
(quote)
Put together by my buddy Karl Wessels and I, we believe we have quite a powerful new algorithm.
What separates our encryption from the many existing encryptions is that ours is both fast AND secure. Currently, it takes 5 seconds to encrypt 100 MB. It is estimated that it would take 4.25 * 10^143 years to decrypt it!
[...]
We're also looking into getting a copyright and eventual commercial release.
I don't want to discourage you, but getting encryption right is hard. Very hard.
I'm not saying it's impossible for a twenty year old webdeveloper to develop an encryption algorithm that outshines all existing algorithms, but it's extremely unlikely, and I'm very sceptic, I think most people would be.
Nobody who cares about encryption would use an algorithm that's unpublished. I'm not saying you have to open up your sourcecode, but the workings of the algorithm must be public, and scrutinized, if you want to be taken seriously...
There are areas where a language running on a VM outperforms C/C++, for example heap allocation of new objects. You can find more details here.
There is a somwhat old article in Doctor Dobbs Journal named Microbenchmarking C++, C#, and Java where you can see some actual benchmarks, and you will find that C# sometimes is faster than C++. One of the more extreme examples is the single hash map benchmark. .NET 1.1 is a clear winner at 126 and VC++ is far behind at 537.
Some people will not believe you if you claim that a language like C# can be faster than C++, but it actually can. However, using a profiler and the very high level of fine-grained control that C++ offers should enable you to rewrite your application to be very performant.
When serious about performance you might want to be serious about profiling.
Separately, the "string" object implementation used in C# Java and C++, is noticeably slower in C++.
There are some cases where VM based languages as C# or Java can be faster than a C++ version. At least if you don't put much work into optimization and have a good knowledge of what is going on in the background. One reason is that the VMs can optimize byte-code at runtime and figure out which parts of the program are used often and changes its optimization strategy. On the other hand an old fashioned compiler has to decide how to optimize the program on compile-time and may not find the best solution.
The C# JIT probably noticed at run-time that the CPU is capable of running some advanced instructions, and is compiling to something better than what the C++ was compiled.
You can probably (surely with enough efforts) outperform this by compiling using the most sophisticated instructions available to the designated C.P.U and using knowledge of the algorithm to tell the compiler to use SIMD instructions at specific stages.
But before any fancy changes to your code, make sure are you C++ compiling to your C.P.U, not something much more primitive (Pentium ?).
Edit:
If your C++ program does a lot of unwise allocations and deallocations this will also explain it.
In another thread, I pointed out that doing a direct translation from one language to another will almost always end up in the version in the new language running more poorly.
Different languages take different techniques.
Try the intel compiler. Its much better the VC or gcc. As for the original question, I would be skeptical. Try to avoid using any containers and minimize the memory allocations in the offending function.
[Joke]There is an error in line 13[/Joke]
Now, seriously, no one can answer the question without the source code.
But as a rule of the thumb, the fact that C++ is that much slower than managed one most likely points to the difference of memory management and object ownership issues.
For instance, if your algorithm is doing any dynamic memory allocations inside the processing loop, this will affect the performance. If you pass heavy structures by the value, this will affect the performance. If you do unnecessary copies of objects, this will affect the performance. Exception abuse will cause performance to go south. And still counting.
I know the cases when forgotten "&" after the parameter name resulted in weeks of profiling/debugging:
void DoSomething(const HeavyStructure param); // Heavy structure will be copied
void DoSomething(const HeavyStructure& param); // No copy here
So, check your code to find possible bottlenecks.
C++ is not a language where you must use classes. In my opinion its not logical to use OOP methodologies where it doesnt really help. For a encrypter / decrypter its best not use classes; use arrays, pointers, use as few functions / classes / files possible. Best encryption system consists of a single file containing few functions. After your function works nice you can wrap it into classes if you wish. Also check the release build. There is huge speed difference
Nothing is faster than good machine/assembly code, so my goal when writing C/C++ is to write my code in such a way that the compiler understands my intentions to generate good machine code. Inlining is my favorite way to do this.
First, here's an aside. Good machine code:
uses registers more often than memory
rarely branches (if/else, for, and while)
uses memory more often than functions calls
rarely dynamically allocates any more memory (from the heap) than it already has
If you have a small class with very little code, then implement its methods in the body of the class definition and declare it locally (on the stack) when you use it. If the class is simple enough, then the compiler will often only generate a few instructions to effect its behavior, without any function calls or memory allocation to slow things down, just as if you had written the code all verbose and non-object oriented. I usually have assembly output turned on (/FAs /Fa with Visual C++) so I can check the output.
It's nice to have a language that allows you to write high-level, encapsulated object-oriented code and still translate into simple, pure, lightning fast machine code.
Here's my 2 cents.
I wrote a BlowFish cipher in C (and C#). The C# was almost 'identical' to the C.
How I compiled (i cant remember the numbers now, so just recalled ratios):
C native: 50
C managed: 15
C#: 10
As you can see, the native compilation out performs any managed version. Why?
I am not 100% sure, but my C version compiled to very optimised assembly code, the assembler output almost looked the same as a hand written assembler one I found.
For me , the Pointer was one of the hardest concept in programming languages in C++. When I was learning C++, I spent tremendous amount of time learning it. However, Now I primarily work in projects that are entirely written in languages like C#, and VB.NET etc. As a matter fact, I have NOT touched C++ for almost 4 years. Even though, C# has pointer, but I have not encouter the situation where I must use pointer in C#. So my question is , what kinds of productivity can we obtain in C# by using pointer ? what are the situation where the uses of the pointer is must?
You're already using lots of pointers in C#, except that they don't look like pointers. Every time you do something with an instance of a class, that's a pointer. You're getting almost all the potential benefit already, without the hassle.
It is possible to use pointers more explicitly in C#, which is what most people mean by C# pointers, but I would think the occasions would be very rare. They may be useful to link to C libraries and the like, but other than that I don't see much use for them.
Personally, I've never had a need for using pointers in .NET, but if you're dealing with absolute performance critical code, you'd use pointers. If you look at the System.String class, you'll see that a lot of the methods that handle the string manipulation, use pointers. Also, when dealing with image processing, very often it's useful to use pointers. Now, one can definitely argue whether those sort of applications should be written in .NET in the first place (I think they should), but at least if you need to squeeze out that extra bit of speed, you can.
I use pointers in C# only in rare circumstances that mostly have to do with sending/receiving data, where you have to convert a byte array to a struct and vice-versa. Though even then, you don't have to deal with pointers directly typically.
In some cases, you can use pointers to improve performance, because with the Marshaller, sometimes you have to copy memory to access data, while with pointers, you can access it directly (think Bitmap.Lock()).
Personally, I've never needed to use a pointer in C#. If I need that kind of functionality, I write that code in C++/CLI, and call it from C#. If I need to pass pointers from C# to C++/CLI or vice-versa, I pass them around as an IntPtr and cast it to the type I need in C++/CLI.
In my opinion - if you're using pointers in C#, in 99% of cases, you're using the language wrong.
Edit:
The nice thing about C++/CLI is that you can mark individual classes for native-only compilation. I do a lot of image processing work which needs to happen very quickly; it uses a lot of pointer-based code. I generally have a managed C++/CLI object forward calls to a native C++ object where my processing takes place. I turn on optimizations for that native code and viola, I get a nice performance gain.
Granted, this only matters if the performance gain you get by executing native, optimized code can offset the overhead of managed to unmanaged transitions. In my case, it always does.