XobotOS: Why does the C# binary tree benchmark use a struct?

Curious about the reputed performance gains in XobotOS, I checked out the binary tree benchmark code.
The Java version of the binary tree node is:
private static class TreeNode
{
    private TreeNode left, right;
    private int item;
}
The C# version is:
struct TreeNode
{
    class Next
    {
        public TreeNode left, right;
    }
    private Next next;
    private int item;
}
I'm wondering what the benefit of using a struct here is, since the left and right pointers are still encapsulated in a class.
Well, there is one - leaf nodes are pure value types since they don't need left and right pointers. In a typical binary tree where half the nodes are leaves, that means a 50% reduction in the number of objects. Still, the performance gains listed seem far greater.
Question: Is there more to this?
Also, since I wouldn't have thought of defining tree nodes this way in C# (thanks Xamarin!) what other data structures can benefit from using structs in a non-obvious way? (Even though that's a bit off-topic and open ended.)

I just ran across this odd code and had the same question. If you change the code to match the Java version, it runs just slightly slower. I believe most of the TreeNode structs will get boxed and allocated anyway, except for the bottom row. However, each node then results in two allocations: the boxed TreeNode and the Next class instance. The allocation savings quickly disappear. IMO, this is not an appropriate use of a struct.

This groups two nodes into one allocation, ideally halving the number of total allocations.
IMO this makes the comparison pretty meaningless.
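To make the allocation arithmetic concrete, here is a sketch of how construction could look with this layout (the constructors are illustrative, not the benchmark's actual code):
struct TreeNode
{
    class Next
    {
        public TreeNode left, right;
    }
    private Next next;
    private int item;

    // Leaf node: a pure value type, no heap allocation at all.
    public TreeNode(int item)
    {
        this.item = item;
        this.next = null;
    }

    // Interior node: a single Next allocation holds both children inline.
    public TreeNode(int item, TreeNode left, TreeNode right)
    {
        this.item = item;
        this.next = new Next { left = left, right = right };
    }
}
With the Java layout, every node is its own object; here only interior nodes cost an allocation, which is where the halving comes from.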

Structures can be allocated on the stack instead of the heap (though not in every case), which means they are deallocated as soon as they go out of scope - the garbage collector does not get involved in that scenario. That can result in lower memory pressure and fewer garbage collections. Also, the stack is a (mostly) contiguous area of memory, so access to it has better locality, which can (again, possibly) improve the cache hit ratio at the CPU level.
Microsoft guidelines on choosing between classes and structures:
structures should be small (generally 16 bytes or less) and short-lived
they should be immutable
If those are met, then using a structure over a class will result in a performance gain.
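For illustration, a type meeting both guidelines might look like this (a minimal example, not taken from the original answer; readonly struct requires C# 7.2 or later):
public readonly struct Point2D
{
    public int X { get; }  // 4 bytes
    public int Y { get; }  // 4 bytes: 8 bytes total, within the 16-byte guideline

    public Point2D(int x, int y)
    {
        X = x;
        Y = y;
    }
}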

I don't think that using a struct here makes any difference at all, especially after looking at the source code of TreeNode, where instances of TreeNode are always copied in the constructor and in the recursive bottomUpTree call.

Related

How to use Span<T> and stackalloc to create a temporary small list

I was reading a description of some code written in C that gains speed by allocating temporary arrays on the stack instead of the heap for use in very hot loops (it was described as being similar to small-buffer optimization, SBO). The object in question is similar to a List<T> in that it's just an array with some basic convenience functionality on top. It allocates a small section of memory to use, and if the list is expanded past the size of the array, it allocates a new array on the heap, copies the data, and updates the pointer.
I would like to do the same thing in C#, but I'm not sure how to accomplish it, as I want to keep this in a safe context: I can't use a pointer to update the data reference if it's expanded, and Span<int> doesn't have an implicit cast to int[]. Specifically:
stackalloc memory is released on method exit, so I'm not sure if there's a simpler way to use a struct like this than giving it a Span field and assigning that field after creating the struct within the method that uses it.
How do I neatly switch between using backing fields of different types (Span and int[]) without changing the public-facing interface?
I managed to come up with a solution, not sure if it's the best implementation, but it seems to work. I also have a couple of alternatives.
Note: This is useful for increasing speed only when you have a function that needs to create a temporary array and is called very frequently. The ability to switch to a heap allocated object is just a fallback in case you overrun the buffer.
Option 1 - Using Span and stackalloc
If you're building to .NET Core 2.1 or later, .NET Standard 2.1 or later, or can use NuGet to use the System.Memory package, the solution is really simple.
Instead of a class, use a ref struct (this is necessary to have a Span<T> field; neither a ref struct nor a Span<T> can leave the method where it's declared. If you need a long-lived class, then there's no reason to try to allocate on the stack, since you'd just have to move the data to the heap anyway.)
public ref struct SmallList
{
    private Span<int> data;
    private int count;
    //...
}
Then add in all your list functionality. Add(), Remove(), etc. In Add or any functions that might expand the list, add a check to make sure you don't overrun the span.
if (count == data.Length)
{
    int[] newArray = new int[data.Length * 2]; // double the capacity
    data.CopyTo(newArray);                     // copy the existing elements into the new array
    data = newArray;                           // implicit cast from int[] to Span<int>! Easy peasy!
}
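For context, a complete Add might look roughly like this (a sketch assuming the data and count fields above, not the poster's exact code):
public void Add(int value)
{
    if (count == data.Length)
    {
        int[] newArray = new int[data.Length * 2]; // grow: fall back to the heap
        data.CopyTo(newArray);
        data = newArray;
    }
    data[count] = value;
    count++;
}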
Span<T> can be used to work with stack allocated memory, but it can also point to heap allocated memory. So if you can't guarantee your list will always be small enough to fit in the stack, the snippet above gives you a nice fallback that shouldn't happen frequently enough to cause noticeable problems. If it is, either increase the initial stack allocation size (within reason, don't overflow!), or use another solution like an array pool.
Using the struct just requires an extra line and a constructor that takes a span to assign to the data field. Not sure if there's a way to do it all in one shot, but it's easy enough:
Span<int> span = stackalloc int[32];
SmallList list = new SmallList(span);
And if you need to use it in a nested function (which was part of my issue) you just pass it in as a parameter instead of having the nested function return a list.
void DoStuff(SmallList results) { /* do stuff */ }
DoStuff(list);
//use results...
Option 2: ArrayPool
The System.Memory package also includes the ArrayPool<T> class, which lets you maintain a pool of small arrays that your class/struct can rent without bothering the garbage collector. This has comparable speed depending on the use case. It also has the benefit of working for classes that have to live beyond a single method. It's also fairly easy to write your own if you can't use System.Memory.
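As a rough illustration of the pattern (ArrayPool<T> lives in the System.Buffers namespace; Rent and Return are the real API, the surrounding code is just a sketch):
using System.Buffers;

class PooledScratch
{
    static void Process()
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(32); // may return a larger array than requested
        try
        {
            for (int i = 0; i < 32; i++)
                buffer[i] = i * i;
            // ... use buffer ...
        }
        finally
        {
            ArrayPool<int>.Shared.Return(buffer); // hand it back to the pool instead of leaving it to the GC
        }
    }
}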
Option 3: Pointers
You can do something like this with pointers and other unsafe code, but the question was technically asking about safe code. I just like my lists to be thorough.
Option 4: Without System.Memory
If, like me, you're using Unity / Mono, you can't use System.Memory and related features until at least 2021. Which leaves you to roll your own solution. An array pool is fairly straightforward to implement, and does the job of avoiding garbage allocations. A stack allocated array is a bit more complicated.
Luckily, someone has already done it, specifically with Unity in mind. The page linked is quite long, but includes both sample code demonstrating the concept and a code generation tool that can make a SmallBuffer class specific to your exact use case. The basic idea is to just create a struct with individual variables that you index as if they were an array.
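The idea in miniature is something like this hedged four-slot sketch (the generated classes on that page are bigger and more complete; this just shows the shape):
struct SmallBuffer4
{
    private int item0, item1, item2, item3; // the "array" is just individual fields
    public int Count;

    public int this[int index]
    {
        get
        {
            switch (index)
            {
                case 0: return item0;
                case 1: return item1;
                case 2: return item2;
                case 3: return item3;
                default: throw new System.IndexOutOfRangeException();
            }
        }
        set
        {
            switch (index)
            {
                case 0: item0 = value; break;
                case 1: item1 = value; break;
                case 2: item2 = value; break;
                case 3: item3 = value; break;
                default: throw new System.IndexOutOfRangeException();
            }
        }
    }
}
Because the struct lives wherever it is declared (including the stack), indexing it this way avoids any heap allocation.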
Update: I tried both these solutions and the array pool was slightly faster (and a lot easier) than the SmallBuffer in my case, so remember to profile!
In C and C++, the developer decides in which memory an object is instantiated: the stack or the heap.
In C#, it is determined by the author of the data type.
You can achieve your goal using Span and pointers. https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=netcore-3.1.
But I would not recommend doing that, because such code is not safe: the CLR gives you all the responsibility for managing it, including cleaning up the memory when you no longer need the object. C# developers usually resort to such tricks when they want to optimise really big data collections that allocate a lot of memory on the heap.
If that is still what you are looking for, then C# is probably not the best option to use.
Even more, if you have a big collection and somehow find a way to put it in stack memory, you can easily hit a StackOverflowException.

.NET small class with many instances optimization scenario?

I have millions of instances of the class Data, and I'm seeking optimization advice.
Is there a way to optimize it in any way - for example, save memory by serializing it somehow, although that would hurt retrieval speed, which is important too? Maybe turn the class into a struct - but the class seems pretty large for a struct.
Queries over these objects can touch hundreds of millions of them at a time. They sit in a list and are queried by DateTime. The results are aggregated in different ways, and many calculations can be applied.
[Serializable]
[DataContract]
public abstract class BaseData {}

[Serializable]
public class Data : BaseData {
    public byte member1;
    public int member2;
    public long member3;
    public double member4;
    public DateTime member5;
}
Unfortunately, while you did specify that you want to "optimize", you did not specify the exact problem you mean to tackle, so I cannot really give you more than general advice.
Serialization will not help you. Your Data objects are already stored as bytes in memory. Nor will turning it into a struct help. The difference between a struct and a class lies in their addressing and referencing behaviour, not in their memory footprint.
The only way I can think of to reduce the memory footprint of a collection with "hundreds-millions" of these objects would be to serialize and compress the entire thing. But that is not feasible. You would always have to decompress the entire thing before accessing it, which would shoot your performance to hell AND actually almost double the memory consumption on access (compressed and decompressed data both lying in memory at that point).
The best general advice I can give you is not to try to optimize this scenario yourself, but use specialized software for that. By specialized software, I mean a (in-memory) database. One example of a database you can use in-memory, and for which you already have everything you need on-board in the .NET framework, is SQLite.
I assume, as you seem to imply, that you have a class with many members, have a large number of instances, and need to keep them all in memory at the same time to perform calculations.
I ran a few tests to see if you could actually get different sizes for the classes you described.
I used this simple method for finding the in-memory size of an object:
private static void MeasureMemory()
{
    int size = 10000000;
    object[] array = new object[size];
    long before = GC.GetTotalMemory(true);
    for (int i = 0; i < size; i++)
    {
        array[i] = new Data();
    }
    long after = GC.GetTotalMemory(true);
    double diff = after - before;
    Console.WriteLine("Total bytes: " + diff);
    Console.WriteLine("Bytes per object: " + diff / size);
}
It may be primitive, but I find that it works fine for situations like this.
As expected, almost nothing you can do to that class (turning it to a struct, removing the inheritance, or the method attributes) influences the memory being used by a single instance. As far as memory usage goes, they are all equivalent. However, do try to fiddle with your actual classes and run them through the given code.
The only way you could actually reduce the memory footprint of an instance would be to use smaller structures for keeping your data (int instead of long, for example). If you have a large number of booleans, you could group them into a byte or integer and use simple property wrappers to work with them (a boolean takes a byte of memory). These may be insignificant things in most situations, but for a hundred million objects, removing a boolean could make a difference of a hundred MB of memory. Also, be aware that the platform target you choose for your application can have an impact on the memory footprint of an object (x64 builds take up more memory than x86 ones).
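The boolean-packing idea might look roughly like this (a sketch; the flag names are hypothetical):
public class PackedFlags
{
    private byte flags; // eight booleans in one byte instead of eight separate bytes

    public bool IsActive
    {
        get { return (flags & 0x01) != 0; }
        set { flags = value ? (byte)(flags | 0x01) : (byte)(flags & ~0x01); }
    }

    public bool IsDirty
    {
        get { return (flags & 0x02) != 0; }
        set { flags = value ? (byte)(flags | 0x02) : (byte)(flags & ~0x02); }
    }
}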
Serializing the data is very unlikely to help. An in-memory database has its upsides, especially if you are doing complex queries. However, it is unlikely to actually reduce the memory usage of your data. Unfortunately, there just aren't many ways to reduce the footprint of basic data types. At some point, you will have to move to a file-based database.
However, here are some ideas. Please be aware that they are hacky and highly conditional, decrease computation performance, and will make the code harder to maintain.
It is often the case in large data structures that objects in different states have only some properties filled, with the others set to null or a default value. If you can identify such groups of properties, perhaps you could move them into a sub-class and keep one reference that can be null, instead of having several properties take up space. Then you only instantiate the sub-class once it is needed. You could write property wrappers that hide this from the rest of the code. Keep in mind that the worst-case scenario here would have you keeping all the properties in memory, plus several object headers and pointers.
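A hedged sketch of that idea (the class and property names are hypothetical):
public class Order
{
    public int Id;
    public System.DateTime Created;

    // The rarely-filled group lives in its own class; this reference stays
    // null (a single pointer) until one of the group's properties is set.
    private ShippingDetails shipping;

    public string Carrier
    {
        get { return shipping == null ? null : shipping.Carrier; }
        set
        {
            if (shipping == null) shipping = new ShippingDetails();
            shipping.Carrier = value;
        }
    }
}

public class ShippingDetails
{
    public string Carrier;
    public string TrackingNumber;
}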
You could perhaps turn members that are likely to hold a default value into binary representations and pack them into a byte array. You would know which byte positions represent which data member, and could write properties that read them. Position the properties most likely to hold a default value at the end of the byte array (a few longs that are often 0, for example). Then, when creating the object, trim the byte array to exclude the properties holding default values, starting from the end, until you hit the first member with a non-default value. When outside code requests a property, you check whether the byte array is large enough to hold it, and if not, return the default value. This way, you could potentially save some space. Best case, you have a null pointer to a byte array instead of several data members. Worst case, you have full byte arrays taking as much space as the original data, plus some overhead for the array. The usefulness depends on the actual data, and assumes relatively few writes, as recomputing the array is expensive.
Hope any of this helps :)

Is List<T> really an undercover Array in C#?

I have been looking at .NET libraries using ILSpy and have come across List<T> class definition in System.Collections.Generic namespace. I see that the class uses methods like this one:
// System.Collections.Generic.List<T>
/// <summary>Removes all elements from the <see cref="T:System.Collections.Generic.List`1" />.</summary>
public void Clear()
{
    if (this._size > 0)
    {
        Array.Clear(this._items, 0, this._size);
        this._size = 0;
    }
    this._version++;
}
So, the Clear() method of the List<T> class actually uses Array.Clear method. I have seen many other List<T> methods that use Array stuff in the body.
Does this mean that List<T> is actually an undercover Array, or does List only use some of the Array methods?
I know lists are type safe and don't require boxing/unboxing but this has confused me a bit.
The list class is not itself an array. In other words, it does not derive from an array. Instead it encapsulates an array that is used by the implementation to hold the list's member elements.
Since List<T> offers random access to its elements, and those elements are indexed 0..Count-1, using an array to store the elements is the obvious implementation.
This tends to surprise C++ programmers who know std::list, a linked list. .NET covers that as well, with the LinkedList<T> class, and it has the same perf characteristics: O(1) inserts and deletes.
You should, however, generally avoid it. Linked lists do not perform well on modern processors, which depend heavily on CPU caches to get reasonable performance out of memory that is many times slower than the execution core. A simple array is by far the data structure that takes most advantage of the cache: accessing an element gives very high odds that subsequent elements are present in the cache as well. That is not the case for a linked list; its elements tend to be scattered throughout the address space, making cache misses likely. Those can be very expensive, as much as 200 cycles with the CPU doing nothing but waiting on the memory sub-system to supply the data.
But do keep the perf characteristics in mind: adding or removing an element that is not at the end of the List costs O(n), just like with an array. And a large List can generate a lot of garbage as the array needs to be expanded; setting the Capacity property up front can help a lot to avoid that. More about that in this answer. Otherwise, the exact same concerns as for std::vector<> apply.
Yes, List<T> uses an array internally to store the items, although in most cases the array is actually larger than the number of elements in the collection -- it has some extra "padding" at the end so that you can add new items without it having to reallocate memory every time. It keeps track of the actual size of the collection with a separate field (you can see this._size in your generated code). When you add more elements than the current array has room for, it will automatically allocate a new larger array -- twice as big, I think -- and copy over all the existing elements.
If you're concerned about a List<T> using more memory than necessary, you can set the size of the array explicitly with the constructor overload that accepts a capacity parameter, if you know the size in advance, or call the TrimExcess() method to make sure the array is (close to) the actual size of the collection.
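For example (a small illustration of both options):
// Capacity known up front: the backing array is allocated once, and no
// reallocation or copying happens while adding the first 1000 items.
List<int> list = new List<int>(1000);
for (int i = 0; i < 1000; i++)
    list.Add(i);

// Or, after the fact, shrink the backing array close to Count:
list.TrimExcess();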
Random-access memory is an array, so in that sense all data structures, from linked lists to heaps and beyond, that rely on random access to memory for their performance behaviour are built on the array that is system memory. It is more a question of how many levels of abstraction are in between.
Of course in a modern virtual memory machine, the random-access system memory is itself an abstraction built on a complicated virtual-memory model of multi-tier pipelined caches, non-cached RAM, and disk.

In memory representation of large data

Currently, I am working on a project where I need to bring GBs of data onto a client machine to do some work. The task needs the whole data set, as it does some analysis on the data and helps in a decision-making process.
So the question is: what are the best practices and suitable approaches for managing that amount of data in memory without hampering the performance of the client machine and the application?
Note: at application-loading time, we can spend time bringing data from the database to the client machine; that's totally acceptable in our case. But once the data is loaded into the application at start-up, performance is very important.
This is a little hard to answer without a problem statement, i.e. what problems you are currently facing, but the following is just some thoughts, based on some recent experiences we had in a similar scenario. It is, however, a lot of work to change to this type of model - so it also depends how much you can invest trying to "fix" it, and I can make no promise that "your problems" are the same as "our problems", if you see what I mean. So don't get cross if the following approach doesn't work for you!
Loading that much data into memory is always going to have some impact, however, I think I see what you are doing...
When loading that much data naively, you are going to have many (millions?) of objects and a similar-or-greater number of references. You're obviously going to want to be using x64, so the references will add up - but in terms of performance, the biggest problem is going to be garbage collection. You have a lot of objects that can never be collected, but the GC knows you're using a ton of memory and is going to try anyway, periodically. This is something I looked at in more detail here, but the graph there shows the impact - in particular, those "spikes" are all GC killing performance.
For this scenario (a huge amount of data loaded, never released), we switched to using structs, i.e. loading the data into:
struct Foo {
    private readonly int id;
    private readonly double value;
    public Foo(int id, double value) {
        this.id = id;
        this.value = value;
    }
    public int Id { get { return id; } }
    public double Value { get { return value; } }
}
and stored those directly in arrays (not lists):
Foo[] foos = ...
The significance of the array is that some of these structs are quite big, and we didn't want them being copied around on the stack lots of times; with an array you can do:
private void SomeMethod(ref Foo foo) {
    if (foo.Value == ...) { blah blah blah }
}
// call ^^^
int index = 17;
SomeMethod(ref foos[index]);
Note that we've passed the element directly - it was never copied; foo.Value is actually looking directly inside the array. The tricky bit starts when you need relationships between objects. You can't store a reference here, as Foo is a struct and there is nothing to reference. What you can do, though, is store the index into the array. For example:
struct Customer {
    ... more not shown
    public int FooIndex { get { return fooIndex; } }
}
Not quite as convenient as customer.Foo, but the following works nicely:
Foo foo = foos[customer.FooIndex];
// or, when passing to a method, SomeMethod(ref foos[customer.FooIndex]);
Key points:
we're now using half the size for "references" (an int is 4 bytes; a reference on x64 is 8 bytes)
we don't have several-million object headers in memory
we don't have a huge object graph for GC to look at; only a small number of arrays that GC can look at incredibly quickly
but it is a little less convenient to work with, and needs some initial processing when loading
additional notes:
strings are a killer; if you have millions of strings, that is problematic; at a minimum, if you have strings that are repeated, make sure you do some custom interning (not string.Intern, that would be bad) to ensure you only have one instance of each repeated value, rather than 800,000 strings with the same contents (a sketch of such an interner follows these notes)
if you have repeated data of finite length, rather than sub-lists/arrays, you might consider a fixed array; this requires unsafe code, but avoids another myriad of objects and references
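A minimal sketch of the custom interning mentioned in the first note (assuming a single-threaded loading phase; use a concurrent dictionary otherwise):
using System.Collections.Generic;

class StringPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (value == null)
            return null;
        string existing;
        if (pool.TryGetValue(value, out existing))
            return existing;        // reuse the first instance we saw
        pool.Add(value, value);     // remember this instance for future duplicates
        return value;
    }
}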
As an additional footnote, with that volume of data, you should think very seriously about your serialization protocols, i.e. how you're sending the data down the wire. I would strongly suggest staying far away from things like XmlSerializer, DataContractSerializer or BinaryFormatter. If you want pointers on this subject, let me know.

Fastest way to iterate over a stack in c#

I feel that using GetEnumerator() and casting IEnumerator.Current is expensive. Any better suggestions?
I'm open to using a different data structure if it offers similar capabilities with better performance.
After thought:
Would a generic stack be a better idea so that the cast isn't necessary?
Stack<T> (with foreach) would indeed save the cast, but actually boxing isn't all that bad in the grand scheme of things. If you have performance issues, I doubt this is the area where you can add much value. Use a profiler, and focus on real problems - otherwise this is premature.
Note that if you only want to read the data once (i.e. you are happy to consume the stack), then this may be quicker (avoids the overhead of an enumerator); YMMV.
Stack<T> stack = GetStack(); // however you obtain the populated stack (GetStack is hypothetical)
while (stack.Count > 0)
{
    T value = stack.Pop();
    // process value (note: this consumes the stack)
}
Have you done any benchmarks, or are they just gut feelings?
If you think that the majority of the processing time is spent looping through stacks you should benchmark it and make sure that that is the case. If it is, you have a few options.
Redesign the code so that the looping isn't necessary
Find a faster looping construct. (I would recommend generics even though it wouldn't matter that much. Again, do benchmarks).
EDIT:
Examples of looping that might not be necessary are when you try to do lookups in a list or match two lists or similar. If the looping takes a long time, see if it make sense to put the lists into binary trees or hash maps. There could be an initial cost of creating them, but if the code is redesigned you might get that back by having O(1) lookups later on.
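As a rough example of that redesign (the Item type, its Id property, and GetItems are hypothetical; requires System.Collections.Generic):
// Build the map once: O(n).
List<Item> items = GetItems(); // your existing list
Dictionary<int, Item> byId = new Dictionary<int, Item>();
foreach (Item item in items)
    byId[item.Id] = item;

// Later lookups are O(1) instead of a scan through the list.
Item found;
if (byId.TryGetValue(42, out found))
{
    // use found
}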
If you need the functionality of a Stack (as opposed to a List or some other collection type), then yes, use a generic stack. This will speed things up a bit, as the compiler will skip the casting at runtime (because it's guaranteed at compile time).
Stack<MyClass> stacky = new Stack<MyClass>();
foreach (MyClass item in stacky)
{
    // this is as fast as you're going to get.
}
Yes, using a generic stack will spare the cast.
Enumerating over a generic IEnumerable<T> or IEnumerator<T> doesn't require a cast if the iterating variable is of type T, so yes, using the generic is going to be faster in most cases, but generics have some very subtle issues, especially when used with value types.
Rico Mariani (Microsoft performance architect) has some posts detailing the differences and the underpinnings
Six Questions about Generics and Performance
Performance Quiz #7 -- Generics Improvements and Costs
Performance Quiz #7 -- Generics Improvements and Costs -- Solution
As far as speed is concerned, there are multiple variables and it depends on the context. For example, in an auto-memory-managed language like C#, you can get allocation spikes which can affect frame rate in something like, say, a game. A nice optimization you can make here, instead of a foreach, is an enumerator with a while loop:
var enumerator = stack.GetEnumerator();
while (enumerator.MoveNext())
{
    // do stuff with the current value via enumerator.Current
    var value = enumerator.Current;
}
As far as CPU benchmarks, this probably isn't any faster than a foreach, but foreach can have unintended allocation spikes, which can ultimately "slow down" the performance of your application.
An alternative to creating an enumerator is to use the ToArray method, and then iterate over the array. The stack iterator causes some slight overhead for checking whether the stack has been modified, whereas iteration over the array would be fast. However, there is of course the overhead of creating the array in the first place. As mats says, you should benchmark the alternatives.
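That alternative would look something like this (a sketch in the same generic style as the snippet above):
// One up-front allocation; the loop afterwards skips the enumerator's
// version checks entirely.
T[] items = stack.ToArray();
for (int i = 0; i < items.Length; i++)
{
    // process items[i]
}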
